While most prior work investigated the use of distillation for building task-specific models, we leverage
knowledge distillation during the pretraining phase and show that it is possible to reduce the size of a BERT model by
40%, while retaining 97% of its language understanding capabilities and being 60% faster.
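As a rough illustration of what knowledge distillation means in practice, the sketch below computes a generic soft-target distillation loss: the KL divergence between temperature-softened teacher and student output distributions. It is a minimal sketch under stated assumptions, not the training objective used in this work. The function names, the temperature value, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    The T**2 factor is a common convention that keeps gradient magnitudes
    comparable across temperatures; it is an assumption here, not something
    stated in the text above.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


# Hypothetical logits over a three-token vocabulary.
teacher = [1.0, 2.0, 3.0]
matching_student = [1.0, 2.0, 3.0]
uninformed_student = [0.0, 0.0, 0.0]

print(distillation_loss(matching_student, teacher))    # zero: distributions agree
print(distillation_loss(uninformed_student, teacher))  # positive: student is off
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal a smaller student model would be trained to minimize during pretraining.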