---
license: apache-2.0
datasets:
- liuhaotian/LLaVA-Instruct-150K
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
pipeline_tag: text-generation
---

# CompeteSMoE-5.1B

CompeteSMoE-5.1B is a lightweight, integrated variant of the Mixture-of-Experts (MoE) architecture, built on the Phi-3.5 Mini and SigLIP baselines and incorporating the latest CompeteSMoE algorithm enhancements. It performs strongly across a range of MoE routing strategies, from standard to state-of-the-art routing methods, and achieves competitive results against recent DeepSeek-inspired MoE architectures such as SharedE-V2 and SharedE-V3. Despite the architectural innovations of these models, especially their use of shared experts, CompeteSMoE-5.1B consistently delivers superior or comparable results.

Note: This version of CompeteSMoE-5.1B was trained on a small-scale dataset. We are actively working on a stronger, more robust release, coming soon! Stay tuned for updates.
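
### Usage

The snippet below is a minimal loading sketch, not an official example. The repository id is a placeholder, and the `trust_remote_code` path assumes the checkpoint ships its custom MoE + SigLIP modeling code on the Hugging Face Hub; adjust both to the actual release, and use the checkpoint's own processor or conversation template for image inputs.

```python
# Minimal sketch, assuming a Hugging Face checkpoint with custom modeling code.
# The repo id below is hypothetical; replace it with the actual CompeteSMoE-5.1B repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/CompeteSMoE-5.1B"  # placeholder, not the confirmed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 precision; adjust dtype to your hardware
    device_map="auto",
    trust_remote_code=True,       # loads the repo's custom MoE + SigLIP modules, if provided
)

# Text-only generation, matching the text-generation pipeline tag.
prompt = "Briefly explain competition-based expert routing in Mixture-of-Experts models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```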

### Hardware Resources

| Stage                           | MoE Method  | Hardware |
|---------------------------------|-------------|----------|
| Pre-Training                    |             | 4xH100   |
| Pre-FineTuning                  |             | 4xH100   |
| VIT (Visual Instruction Tuning) | CompeteSMoE | 4xH100   |

---

### Citation Information

More details can be found in our paper.

If you use CompeteSMoE, please cite it using this BibTeX:

```
@misc{nguyen2025competesmoe,
      title={CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition},
      author={Nam V. Nguyen and Huy Nguyen and Quang Pham and Van Nguyen and Savitha Ramasamy and Nhat Ho},
      year={2025},
      eprint={2505.13380},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```
|