Question on converting this to DeepSeekV2ForCausalLM

#2
by michaelfeil - opened

I want to use this model to run tests for the larger deepseek-v3 architectures, primarily doing research on faster kernels for FlashMLA et al. This seems like a good model for that, since it takes much less time to load and still produces coherent output.

Is there any way this can be loaded into a non-LlamaForCausalLM architecture, e.g. DeepSeekV2ForCausalLM?

@michaelfeil feel free to check out this updated version of the model: llama3_2-1B-deepseek.
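For anyone wondering how the dispatch works: transformers picks the model class from the `architectures` and `model_type` fields in the checkpoint's `config.json`, so converting a checkpoint to load under DeepseekV2ForCausalLM involves (among other things, like renaming weights to match the MLA layout) rewriting those fields. A minimal sketch of that config edit, assuming the DeepSeek-V2 remote-code class name `DeepseekV2ForCausalLM` and model type `deepseek_v2`:

```python
import json

# Sketch: rewrite the architecture fields a Llama checkpoint declares
# so transformers dispatches to the DeepSeek-V2 modeling code instead.
# The remaining config keys (MLA dims, rope settings, etc.) would also
# need to match what DeepseekV2Config expects; this only shows the dispatch part.
config = {"architectures": ["LlamaForCausalLM"], "model_type": "llama"}

config["architectures"] = ["DeepseekV2ForCausalLM"]
config["model_type"] = "deepseek_v2"

print(json.dumps(config, indent=2))
```

Loading a repo converted this way would then typically require `trust_remote_code=True`, since the DeepSeek-V2 modeling code ships with the checkpoint rather than in the transformers library itself.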
