Can DeepSpeed ZeRO-3 be used for sharding with whisper-large-v3?

#204
by PhoenixAxis - opened

I am building a large model that accepts speech input, using the encoder of whisper-large-v3 as the audio encoder. My base LLM was originally 7B parameters; after switching to a 32B model, GPU memory pressure became very high, so I want to use DeepSpeed ZeRO-3. However, I ran into many errors when applying ZeRO-3. Can whisper-large-v3 be used with ZeRO-3?
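
For concreteness, here is a minimal sketch of the kind of ZeRO-3 configuration this question is about. It is an illustration under assumptions, not the exact config from the failing run: the helper name `build_zero3_config` and the file name `ds_zero3.json` are hypothetical, the values are placeholders, and the `"auto"` entries assume the Hugging Face Trainer integration fills them in.

```python
import json


def build_zero3_config(offload_to_cpu: bool = False) -> dict:
    """Return a DeepSpeed config dict with ZeRO stage 3 enabled (illustrative values)."""
    zero = {
        "stage": 3,  # shard parameters, gradients, and optimizer states
        "overlap_comm": True,
        "contiguous_gradients": True,
        # gather full 16-bit weights when saving, so the checkpoint is not sharded
        "stage3_gather_16bit_weights_on_model_save": True,
    }
    if offload_to_cpu:
        # optional: trade speed for GPU memory by offloading to host RAM
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
        zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return {
        "bf16": {"enabled": True},
        "zero_optimization": zero,
        # "auto" is resolved by the Hugging Face Trainer integration (assumption)
        "gradient_accumulation_steps": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }


if __name__ == "__main__":
    # Write the config to a (hypothetical) file that a launcher could point to.
    config = build_zero3_config(offload_to_cpu=True)
    with open("ds_zero3.json", "w") as f:
        json.dump(config, f, indent=2)
```

With a config like this, both the Whisper encoder and the 32B LLM would be partitioned across GPUs, which is where the errors appear.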
