ZeRO Inference
ZeRO Inference keeps the model weights in CPU or NVMe memory instead of GPU memory, which makes it possible to run inference with huge models on a GPU that could not otherwise hold them.
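As a rough illustration, the offloading described above is usually expressed through a DeepSpeed configuration that enables ZeRO stage 3 with parameter offload. The sketch below builds such a configuration as a plain Python dict. The helper name `build_zero_inference_config`, the example NVMe path, and the batch size of 1 are assumptions made for illustration; they are not taken from the original text.

```python
import json


def build_zero_inference_config(offload_device="cpu", nvme_path=None):
    """Return a DeepSpeed-style config dict for ZeRO stage 3 with parameter offload.

    Hypothetical helper for illustration only. `offload_device` is "cpu" or
    "nvme"; `nvme_path` is required when offloading to NVMe.
    """
    if offload_device not in ("cpu", "nvme"):
        raise ValueError("offload_device must be 'cpu' or 'nvme'")
    if offload_device == "nvme" and nvme_path is None:
        raise ValueError("nvme_path is required when offload_device is 'nvme'")

    # Where the model parameters live while they are not needed on the GPU.
    offload_param = {"device": offload_device, "pin_memory": True}
    if offload_device == "nvme":
        offload_param["nvme_path"] = nvme_path

    return {
        "zero_optimization": {
            "stage": 3,  # parameter offload is a ZeRO stage 3 feature
            "offload_param": offload_param,
        },
        # Assumed value: one sample per GPU per forward pass.
        "train_micro_batch_size_per_gpu": 1,
    }


if __name__ == "__main__":
    # Print the CPU-offload variant as JSON, e.g. to save as a config file.
    print(json.dumps(build_zero_inference_config(), indent=2))
```

The resulting dict (or the JSON file written from it) would then be handed to DeepSpeed when the model is set up. Choosing `"nvme"` trades slower weight access for a larger offload capacity than CPU memory alone.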