The model is first created on the meta device (with empty weights), and the state dict is then loaded into it (shard by shard in the case of a sharded checkpoint).