---
license: apache-2.0
datasets:
- liuhaotian/LLaVA-Instruct-150K
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
pipeline_tag: text-generation
---

# CompeteSMoE-5.1B

CompeteSMoE-5.1B is a lightweight, integrated variant of the Mixture-of-Experts (MoE) architecture, built on the Phi-3.5 Mini and SigLIP baselines and incorporating the latest CompeteSMoE algorithm enhancements. It performs strongly across a range of MoE routing strategies, from standard to state-of-the-art routing methods, and achieves competitive results against recent DeepSeek-inspired MoE architectures such as SharedE-V2 and SharedE-V3. Despite the architectural innovations of these models, especially their use of shared experts, CompeteSMoE-5.1B consistently delivers superior or comparable results.

Note: This version of CompeteSMoE-5.1B was trained on a small-scale dataset. We are actively working on a stronger, more robust release, coming soon! Stay tuned for updates.
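
### Usage

The snippet below is a minimal loading sketch, not an official example. The repository id is a placeholder, and the `trust_remote_code` path assumes the checkpoint ships its custom MoE + SigLIP modeling code on the Hugging Face Hub; adjust both to the actual release, and use the checkpoint's own processor or conversation template for image inputs.

```python
# Minimal sketch, assuming a Hugging Face checkpoint with custom modeling code.
# The repo id below is hypothetical; replace it with the actual CompeteSMoE-5.1B repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/CompeteSMoE-5.1B"  # placeholder, not the confirmed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 precision; adjust dtype to your hardware
    device_map="auto",
    trust_remote_code=True,       # loads the repo's custom MoE + SigLIP modules, if provided
)

# Text-only generation, matching the text-generation pipeline tag.
prompt = "Briefly explain competition-based expert routing in Mixture-of-Experts models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```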

### Hardware Resources

| Stage                           | MoE Method  | Hardware |
|---------------------------------|-------------|----------|
| Pre-Training                    |             | 4xH100   |
| Pre-FineTuning                  |             | 4xH100   |
| VIT (Visual Instruction Tuning) | CompeteSMoE | 4xH100   |

---

### Citation Information

More details can be found in our paper.

If you use CompeteSMoE, please cite it using this BibTeX:

```
@misc{nguyen2025competesmoe,
      title={CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition},
      author={Nam V. Nguyen and Huy Nguyen and Quang Pham and Van Nguyen and Savitha Ramasamy and Nhat Ho},
      year={2025},
      eprint={2505.13380},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```
|