Can DeepSpeed ZeRO-3 be used for sharding with whisper-large-v3?
#204, opened by PhoenixAxis
I am building a large model that accepts speech input, using the encoder of whisper-large-v3 as the audio encoder. My base LLM was originally 7B parameters; after switching to a 32B model, GPU memory pressure became very high, so I want to shard the model with DeepSpeed ZeRO-3. However, I ran into many errors when applying ZeRO-3. Can whisper-large-v3 be used with ZeRO-3?
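For context, here is a minimal sketch of the kind of ZeRO-3 configuration in question. The helper name `build_zero3_config` is hypothetical and every value is an illustrative assumption, not the exact settings from the failing run; only the key names follow the DeepSpeed config schema.

```python
import json


def build_zero3_config(micro_batch_size=1, grad_accum_steps=1):
    """Return an illustrative DeepSpeed config dict with ZeRO stage 3 enabled.

    All values here are assumptions for discussion, not tuned settings.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        # bf16 is assumed; fp16 would use an "fp16" section instead.
        "bf16": {"enabled": True},
        "zero_optimization": {
            # Stage 3 shards parameters, gradients, and optimizer states.
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather full 16-bit weights when saving, so the checkpoint
            # can be loaded without DeepSpeed.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
    }


if __name__ == "__main__":
    # Print the config as JSON, e.g. to save as a DeepSpeed config file.
    print(json.dumps(build_zero3_config(), indent=2))
```

The printed JSON can be saved to a file and passed to the training launcher as the DeepSpeed config. Both the 32B LLM and the Whisper encoder would be sharded under a config like this.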