A common culprit is dropout being falsely activated during the forward pass: make sure `self.training` is passed to PyTorch's functional dropout so that dropout is disabled when the model is in evaluation mode.
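A minimal sketch of the correct pattern, using a hypothetical module (the class and its layers are placeholders for illustration, not part of any specific model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionOutput(nn.Module):
    """Hypothetical sub-module that applies functional dropout after a projection."""

    def __init__(self, hidden_size: int, dropout_prob: float = 0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.dropout_prob = dropout_prob

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        hidden_states = self.dense(hidden_states)
        # Pass self.training explicitly: F.dropout defaults to training=True,
        # so omitting it would keep dropout active even after model.eval().
        hidden_states = F.dropout(hidden_states, p=self.dropout_prob, training=self.training)
        return hidden_states
```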
The best way to fix the problem is usually to look at the forward pass of the original implementation and the 🤗
Transformers implementation side-by-side and check if there are any differences.
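A minimal sketch of such a comparison, assuming both implementations have been loaded with the same weights; the names `original_model`, `hf_model`, and `input_ids` are placeholders, and the `last_hidden_state` attribute should be adapted to whatever output the models actually return:

```python
import torch


def compare_forward_passes(original_model, hf_model, input_ids, atol: float = 1e-3) -> bool:
    """Run both implementations on the same input and report the largest difference."""
    # Disable dropout and other training-only behavior in both implementations.
    original_model.eval()
    hf_model.eval()

    with torch.no_grad():
        original_output = original_model(input_ids)
        # Assumes the 🤗 Transformers model returns an output object with
        # `last_hidden_state`; pick the matching tensor for your model.
        hf_output = hf_model(input_ids).last_hidden_state

    max_diff = (original_output - hf_output).abs().max().item()
    print(f"Max absolute difference: {max_diff:.3e}")
    return torch.allclose(original_output, hf_output, atol=atol)
```

Checking intermediate activations the same way, layer by layer, narrows down exactly where the two forward passes start to diverge.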