This model has four main components:

A feature encoder takes the raw audio waveform, normalizes it to zero mean and unit variance, and converts it into a sequence of feature vectors, each representing about 20 ms of audio.
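
To make the normalization and the 20 ms step concrete, here is a minimal sketch in plain Python. It is not the model's actual encoder: the 16 kHz sample rate, the function names `normalize` and `frame_count`, and the small epsilon are assumptions made for illustration, and simple integer division stands in for the learned layers that produce the feature vectors.

```python
import math


def normalize(waveform):
    """Rescale samples to zero mean and unit variance."""
    n = len(waveform)
    mean = sum(waveform) / n
    var = sum((x - mean) ** 2 for x in waveform) / n
    # Small epsilon (an assumption) avoids division by zero on silent input.
    std = math.sqrt(var + 1e-7)
    return [(x - mean) / std for x in waveform]


def frame_count(num_samples, sample_rate=16000, frame_ms=20):
    """Count how many whole 20 ms steps fit in the waveform.

    The 16 kHz default sample rate is an assumption, not taken from the text.
    """
    samples_per_frame = sample_rate * frame_ms // 1000
    return num_samples // samples_per_frame


# One second of a 440 Hz sine wave as stand-in audio.
waveform = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
normalized = normalize(waveform)
num_frames = frame_count(len(normalized))
```

Under these assumptions, one second of audio maps to about 50 feature vectors. A real encoder may produce slightly fewer, since its layers cannot emit a full vector at the very edges of the signal.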