The actual objective is a combination of three terms:
matching the output probabilities of the teacher model
predicting the masked tokens correctly (there is no next-sentence prediction objective)
maximizing the cosine similarity between the hidden states of the student and the teacher model
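The three terms above can be sketched in a few lines of framework-free Python. This is only an illustration under stated assumptions, not the training code used for DistilBERT: the function names, the use of KL divergence for the probability-matching term, the softmax temperature, and the equal default weights are all hypothetical choices made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Term 1: push the student's probabilities toward the teacher's.

    Assumption: measured as KL(teacher || student) on softened distributions.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def masked_lm_loss(student_logits, target_index):
    """Term 2: cross-entropy of the student on the true masked token."""
    return -math.log(softmax(student_logits)[target_index])


def cosine_loss(student_hidden, teacher_hidden):
    """Term 3: 1 - cosine similarity, so aligned hidden states give 0."""
    dot = sum(s * t for s, t in zip(student_hidden, teacher_hidden))
    norm_s = math.sqrt(sum(s * s for s in student_hidden))
    norm_t = math.sqrt(sum(t * t for t in teacher_hidden))
    return 1.0 - dot / (norm_s * norm_t)


def combined_loss(student_logits, teacher_logits, target_index,
                  student_hidden, teacher_hidden,
                  weights=(1.0, 1.0, 1.0), temperature=2.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    w_distill, w_mlm, w_cos = weights
    return (w_distill * distillation_loss(student_logits, teacher_logits, temperature)
            + w_mlm * masked_lm_loss(student_logits, target_index)
            + w_cos * cosine_loss(student_hidden, teacher_hidden))


# Toy example for a single masked position with a 3-token vocabulary.
loss = combined_loss(
    student_logits=[0.5, 1.5, 0.2],
    teacher_logits=[0.1, 2.0, 0.3],
    target_index=1,
    student_hidden=[0.2, -0.4, 0.9],
    teacher_hidden=[0.1, -0.5, 1.0],
)
print(loss)
```

In this sketch each term is zero or near zero when the student agrees with the teacher (identical probabilities, a confident correct prediction, parallel hidden states), so a lower combined value means closer agreement.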
Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DistilBERT.