A modification of the cross-entropy loss function designed specifically for training LLMs. It puts a twist on standard cross-entropy by emphasizing outlier prediction errors and dynamically normalizing token-level variance, with the goal of more stable and efficient training and models that generalize better.
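To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of what such a loss could look like; the function name, the `alpha` weighting parameter, and the exact reweighting scheme are my own illustrative assumptions, not the project's actual implementation:

```python
import torch
import torch.nn.functional as F

def outlier_weighted_cross_entropy(logits, targets, alpha=1.0, eps=1e-8):
    """Illustrative sketch: cross-entropy with outlier emphasis and
    token-level variance normalization (assumed formulation)."""
    # Standard per-token cross-entropy, kept unreduced: shape (batch * seq_len,)
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        targets.view(-1),
        reduction="none",
    )

    # Normalize token losses by their batch statistics (variance normalization)
    mean = per_token.mean()
    std = per_token.std().clamp_min(eps)
    normalized = (per_token - mean) / std

    # Emphasize outliers: tokens whose loss sits well above the mean get extra weight
    weights = 1.0 + alpha * normalized.clamp_min(0.0)

    # Detach weights so they act as importance factors, not an extra gradient path
    return (weights.detach() * per_token).mean()
```

Usage would mirror a standard loss call, e.g. `loss = outlier_weighted_cross_entropy(model_logits, target_ids)`, followed by `loss.backward()`.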
Check it out, give it a spin, and let me know what you think!
Licensed under the Apache 2.0 license and ready to use. Happy training!