While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pretraining phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster.
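To make the idea concrete, the sketch below shows a standard temperature-scaled soft-target distillation loss of the kind used when a smaller student is trained to match a larger teacher's output distribution. It is a minimal illustration, not the paper's exact training objective: the function name, temperature value, and batch shapes are assumptions for the example.

```python
# Minimal sketch of a soft-target knowledge distillation loss (assumed
# standard temperature-scaled formulation; names and hyperparameters are
# illustrative, not taken from the paper).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student distributions."""
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable as T varies.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (temperature ** 2)

# Example with random logits standing in for teacher and student outputs.
if __name__ == "__main__":
    teacher_logits = torch.randn(8, 30522)  # batch of 8, BERT-sized vocabulary
    student_logits = torch.randn(8, 30522)
    print(distillation_loss(student_logits, teacher_logits).item())
```

In practice this soft-target term would be combined with the usual pretraining objective so that the student learns from both the teacher's distribution and the training data.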