This is because, for very large models, it isn't possible to load all of the weights onto one GPU and then distribute them across the other GPUs: the full set of weights does not fit in a single GPU's memory.
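The memory argument can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the helper names, the 70-billion-parameter model, the 80 GiB GPUs, and the 2-bytes-per-parameter (half-precision) assumption are all hypothetical and are not taken from the original text. It counts the weights alone, ignoring activations, gradients, and optimizer state.

```python
GIB = 2**30  # bytes in one gibibyte


def weight_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Memory needed to hold the weights alone, in GiB.

    bytes_per_param=2 assumes half precision (fp16/bf16); use 4 for fp32.
    """
    return num_params * bytes_per_param / GIB


def fits_on_one_gpu(num_params: int, gpu_gib: float, bytes_per_param: int = 2) -> bool:
    """True if the full set of weights fits in a single GPU's memory."""
    return weight_gib(num_params, bytes_per_param) <= gpu_gib


def per_gpu_shard_gib(num_params: int, num_gpus: int, bytes_per_param: int = 2) -> float:
    """Weight memory per GPU when the weights are split evenly across GPUs."""
    return weight_gib(num_params, bytes_per_param) / num_gpus


if __name__ == "__main__":
    # Hypothetical setup: 70B parameters in half precision, 8 GPUs with 80 GiB each.
    params, gpus, gpu_mem = 70_000_000_000, 8, 80.0
    print(f"full weights: {weight_gib(params):.1f} GiB")
    print(f"fits on one GPU: {fits_on_one_gpu(params, gpu_mem)}")
    print(f"per-GPU shard: {per_gpu_shard_gib(params, gpus):.1f} GiB")
```

Under these assumed numbers the full weights are larger than one GPU's memory, while an even shard per GPU fits comfortably, which is why the weights have to be partitioned as they are loaded rather than loaded whole and distributed afterwards.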