As you can see, the model needs only two inputs to compute a loss: input_values (the speech inputs) and
labels (the input_ids of the encoded target sequence).
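
One reason two inputs can be enough is that the decoder inputs can be derived from labels: the label sequence is shifted one position to the right, a start token is prepended, and ignored positions (marked with -100) are replaced by the padding token. The plain-Python sketch below mimics that step under those assumptions. The token ids, `pad_token_id=0`, and `decoder_start_token_id=1` are made-up values for illustration, not taken from any real tokenizer or checkpoint, and the helper is a simplified stand-in rather than the library's own implementation.

```python
# Illustrative sketch: derive decoder inputs from labels by shifting right.
# All ids below are hypothetical values chosen for the example.

IGNORE_INDEX = -100  # label value that is skipped when computing the loss


def shift_tokens_right(labels, pad_token_id, decoder_start_token_id):
    """Return decoder inputs built from a batch of label sequences."""
    shifted = []
    for row in labels:
        # Prepend the start token and drop the last label.
        new_row = [decoder_start_token_id] + list(row[:-1])
        # Ignored label positions become padding on the input side.
        shifted.append(
            [pad_token_id if token == IGNORE_INDEX else token for token in new_row]
        )
    return shifted


# Two target sequences; the second one is padded with the ignore index.
labels = [
    [15, 27, 8, 2],
    [31, 2, IGNORE_INDEX, IGNORE_INDEX],
]

decoder_input_ids = shift_tokens_right(
    labels, pad_token_id=0, decoder_start_token_id=1
)

for label_row, input_row in zip(labels, decoder_input_ids):
    print("labels:", label_row, "-> decoder inputs:", input_row)
```

At each position the decoder sees the previous target token and is trained to predict the current one, so passing labels alone carries the information needed for both the decoder inputs and the loss targets.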