	ZeRO works in several stages:

	ZeRO-1, optimizer state partitioning across GPUs
	ZeRO-2, gradient partitioning across GPUs
	ZeRO-3, parameter partitioning across GPUs
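
To make the effect of each stage concrete, here is a rough per-GPU memory estimate for the model states. It is a sketch under assumptions that are not stated in the text above: mixed-precision training with Adam, costing about 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for optimizer states (the accounting used in the ZeRO paper), and each stage also applying the partitioning of the stages before it. The function name and its arguments are hypothetical, and activations and buffers are ignored.

```python
def zero_model_state_bytes(num_params, num_gpus, stage, optim_bytes_per_param=12):
    """Approximate per-GPU bytes for model states (weights, grads, optimizer).

    Assumes fp16 weights and gradients (2 bytes each per parameter) and
    Adam-style optimizer states (12 bytes per parameter by default).
    Stage 0 means plain data parallelism with no partitioning.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    params = 2 * num_params                     # fp16 weights
    grads = 2 * num_params                      # fp16 gradients
    optim = optim_bytes_per_param * num_params  # optimizer states

    if stage >= 1:
        optim /= num_gpus   # ZeRO-1: optimizer states are sharded
    if stage >= 2:
        grads /= num_gpus   # ZeRO-2: gradients are sharded as well
    if stage >= 3:
        params /= num_gpus  # ZeRO-3: parameters are sharded as well
    return params + grads + optim


if __name__ == "__main__":
    # Illustrative numbers only: a 7.5B-parameter model on 64 GPUs.
    for stage in range(4):
        gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB per GPU")
```

Under these assumptions, the estimate for the illustrative 7.5B-parameter model drops from roughly 120 GB per GPU without partitioning to under 2 GB per GPU at stage 3. Real usage will be higher once activations and temporary buffers are included.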

	When GPU memory is limited, ZeRO can also offload optimizer memory and computation from the GPU to the CPU, which makes it possible to fit and train very large models on a single GPU.
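
As an illustration of what enabling offloading can look like, the sketch below builds a configuration dictionary in the style of DeepSpeed, one library that implements ZeRO. Using DeepSpeed is an assumption here, since the text does not name a specific library. The helper `build_zero_offload_config` and the batch size are hypothetical, and the settings are a starting point rather than tuned values.

```python
import json


def build_zero_offload_config(stage=2, micro_batch_size=1):
    """Return a DeepSpeed-style config dict with optimizer offload to CPU.

    Hypothetical helper for illustration; the values are placeholders.
    """
    if stage not in (1, 2, 3):
        raise ValueError("optimizer offload needs ZeRO stage 1, 2, or 3")
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": stage,
            # Keep optimizer states in CPU memory and run the optimizer
            # step on the CPU instead of the GPU.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
        },
    }


if __name__ == "__main__":
    # The JSON printed here can be saved as a config file for training.
    print(json.dumps(build_zero_offload_config(), indent=2))
```

Offloading trades GPU memory for extra traffic between CPU and GPU, so it tends to slow each training step in exchange for fitting a larger model.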