This model has four main components:
A feature encoder takes the raw audio waveform, normalizes it to zero mean and unit variance, and converts it into a sequence of feature vectors, each representing 20 ms of audio.
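To make the normalization and the 20 ms framing concrete, here is a minimal sketch in plain Python. It is not the model's actual feature encoder, which is learned. It only illustrates zero-mean, unit-variance scaling and a rough frame count. The 16 kHz sample rate, the epsilon value, and the helper names normalize and approx_num_frames are assumptions made for illustration.

```python
import math


def normalize(waveform):
    """Scale samples to zero mean and unit variance."""
    n = len(waveform)
    mean = sum(waveform) / n
    var = sum((x - mean) ** 2 for x in waveform) / n
    # Small epsilon (an assumption) avoids division by zero on silent audio.
    std = math.sqrt(var + 1e-7)
    return [(x - mean) / std for x in waveform]


def approx_num_frames(num_samples, sample_rate=16_000, frame_ms=20):
    """Rough count of feature vectors, assuming one vector per 20 ms hop."""
    hop = sample_rate * frame_ms // 1000  # 320 samples at the assumed 16 kHz
    return num_samples // hop


# One second of a synthetic 440 Hz tone with a DC offset (hypothetical input).
SAMPLE_RATE = 16_000
waveform = [
    0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) + 0.1
    for t in range(SAMPLE_RATE)
]

normalized = normalize(waveform)
frames = approx_num_frames(len(waveform))
```

Under these assumptions, one second of audio maps to roughly 50 feature vectors. A real encoder may produce slightly fewer, because its first frame needs a full receptive field of samples before it can emit an output.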