|
--- |
|
datasets: |
|
- davzoku/moecule-stock-market-outlook |
|
- davzoku/moecule-kyc |
|
base_model: |
|
- unsloth/Llama-3.2-1B-Instruct |
|
pipeline_tag: question-answering |
|
--- |
|
|
|
# Moecule 2x1B M9 KS
|
|
|
<p align="center"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/63c51d0e72db0f638ff1eb82/8BNZvdKBuSComBepbH-QW.png" width="150" height="150" alt="logo"> <br> |
|
</p> |
|
|
|
## Model Details |
|
|
|
This model is a mixture of experts (MoE) built from task-specific expert models using the [RhuiDih/moetify](https://github.com/RhuiDih/moetify) library. All relevant expert models, LoRA adapters, and datasets are available at [Moecule Ingredients](https://huggingface.co/collections/davzoku/moecule-ingredients-67dac0e6210eb1d95abc6411).
|
|
|
## Key Features |
|
|
|
- **Zero Additional Training:** Combine existing domain-specific / task-specific experts into a powerful MoE model without additional training! |
|
|
|
## System Requirements |
|
|
|
| Steps | System Requirements | |
|
| ---------------- | ---------------------- | |
|
| MoE Creation | > 22.5 GB System RAM | |
|
| Inference (fp16) | GPU with > 5.4 GB VRAM | |
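
The fp16 inference figure lines up with the parameter count logged under Model Parameters below: at 2 bytes per parameter, the weights alone occupy roughly 4.4 GB, with activations and the KV cache accounting for the rest. A quick back-of-envelope check:

```python
# Rough fp16 memory estimate from the creation log's parameter count.
total_params = 2_371_028_992               # "MOE total parameters"
weights_gb = total_params * 2 / 1024**3    # fp16 = 2 bytes per parameter
print(f"~{weights_gb:.1f} GB of weights")  # ~4.4 GB; runtime overhead fills out the 5.4 GB
```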
|
|
|
## MoE Creation |
|
|
|
To reproduce this model, run the following commands:
|
|
|
```shell
# Clone the moetify fork that fixes a dependency issue
git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
pip install -e ./moetify

python -m moetify.mix \
    --output_dir ./moecule-2x1b-m9-ks \
    --model_path unsloth/Llama-3.2-1B-Instruct \
    --modules mlp q_proj \
    --ingredients \
        davzoku/kyc_expert_1b \
        davzoku/stock_market_expert_1b
```
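
Only the modules named in `--modules` (here each decoder layer's MLP block and query projection) are duplicated per ingredient expert and gated by a router; all remaining weights stay shared in the stem. This is visible in the parameter breakdown below, where the stem is much smaller than the combined experts.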
|
|
|
## Model Parameters |
|
|
|
```shell
INFO:root:Stem parameters: 626067456
INFO:root:Experts parameters: 1744830464
INFO:root:Routers parameters: 131072
INFO:root:MOE total parameters (numel): 2371028992
INFO:root:MOE total parameters : 2371028992
INFO:root:MOE active parameters: 2371028992
```
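
These counts can be reproduced from the mix configuration. Assuming the standard Llama-3.2-1B geometry (16 decoder layers, hidden size 2048, MLP intermediate size 8192) and one linear gate per replaced module, a quick sanity check (illustrative arithmetic, not moetify code):

```python
# Re-derive the logged expert and router counts from the mix configuration,
# assuming standard Llama-3.2-1B shapes: 16 layers, hidden 2048, MLP 8192.
hidden, inter, layers, num_experts = 2048, 8192, 16, 2

q_proj = hidden * hidden                 # per-layer q_proj weights
mlp = 3 * hidden * inter                 # gate/up/down projections
per_expert = layers * (q_proj + mlp)
print(num_experts * per_expert)          # 1744830464 -> "Experts parameters"

modules_per_layer = 2                    # --modules mlp q_proj
print(layers * modules_per_layer * hidden * num_experts)  # 131072 -> "Routers parameters"
```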
|
|
|
## Inference |
|
|
|
To run inference with this model, use the following code snippet:
|
|
|
```python
# Install the moetify fork first so the custom MoE model code can be imported:
#   git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
#   pip install -e ./moetify

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "davzoku/moecule-2x1b-m9-ks"  # or the local --output_dir from the creation step

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def format_instruction(row):
    return f"""### Question: {row}"""

# num_beams=1 without do_sample=True decodes greedily; temperature/top_p/top_k
# would only take effect with sampling enabled.
greedy_generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=1,
    max_new_tokens=128,
    repetition_penalty=1.2,
)

input_text = "In what ways did Siemens's debt restructuring on March 06, 2024 reflect its strategic priorities?"
formatted_input = format_instruction(input_text)
inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        generation_config=greedy_generation_config,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
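
Because the ingredient experts were tuned on `davzoku/moecule-kyc` and `davzoku/moecule-stock-market-outlook`, KYC and stock-market questions such as the example above are the intended use cases; outputs for out-of-domain prompts may be weaker.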
|
|
|
## The Team |
|
|
|
- CHOCK Wan Kee |
|
- Farlin Deva Binusha DEVASUGIN MERLISUGITHA |
|
- GOH Bao Sheng |
|
- Jessica LEK Si Jia |
|
- Sinha KHUSHI |
|
- TENG Kok Wai (Walter) |
|
|
|
## References |
|
|
|
- [Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts](https://arxiv.org/abs/2408.17280v2) |
|
- [RhuiDih/moetify](https://github.com/RhuiDih/moetify) |
|
|