It is available in several ZeRO stages, each of which saves progressively more GPU memory: stage 1 partitions the optimizer state, stage 2 also partitions the gradients, and stage 3 also partitions the parameters. Offloading to a CPU or NVMe can reduce GPU memory use further.
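To get a feel for how much each stage saves, the sketch below estimates per-GPU memory for model states only. It assumes mixed-precision training with Adam, using the commonly cited accounting of 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for fp32 optimizer state. It ignores activations, temporary buffers, fragmentation, and offloading. The function name `zero_model_state_bytes` is hypothetical and is not part of any library.

```python
def zero_model_state_bytes(num_params: int, num_gpus: int, stage: int) -> float:
    """Rough per-GPU bytes for model states under a given ZeRO stage.

    Assumes mixed-precision Adam: 2 bytes (fp16 weights) + 2 bytes
    (fp16 gradients) + 12 bytes (fp32 weights, momentum, variance)
    per parameter. Stage 0 means no partitioning.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    params = 2.0 * num_params   # fp16 weights
    grads = 2.0 * num_params    # fp16 gradients
    optim = 12.0 * num_params   # fp32 optimizer state

    if stage >= 1:
        optim /= num_gpus       # stage 1: partition optimizer state
    if stage >= 2:
        grads /= num_gpus       # stage 2: also partition gradients
    if stage >= 3:
        params /= num_gpus      # stage 3: also partition parameters

    return params + grads + optim


if __name__ == "__main__":
    # Illustrative values only: a 7.5B-parameter model on 64 GPUs.
    for stage in range(4):
        gb = zero_model_state_bytes(7_500_000_000, 64, stage) / 1e9
        print(f"stage {stage}: about {gb:.1f} GB per GPU for model states")
```

These numbers are a lower bound on real usage, since activations and framework overhead are not counted, but they show why higher stages let larger models fit on the same hardware.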