Question on converting this to DeepSeekV2ForCausalLM
#2, opened by michaelfeil
I want to use this model to run tests for the larger DeepSeek-V3 architectures, primarily for research on faster kernels for FlashMLA et al. It looks like a good fit, as it takes much less time to load and still produces coherent output.
Is there any way to load it into a non-LlamaForCausalLM architecture, e.g. DeepSeekV2ForCausalLM?