---
datasets:
- davzoku/moecule-stock-market-outlook
- davzoku/moecule-kyc
base_model:
- unsloth/Llama-3.2-1B-Instruct
pipeline_tag: question-answering
---

# Moecule 2x1B M9 KS

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63c51d0e72db0f638ff1eb82/8BNZvdKBuSComBepbH-QW.png" width="150" height="150" alt="logo"> <br>
</p>

## Model Details

This model is a mixture of experts (MoE) built from task-specific experts using the [RhuiDih/moetify](https://github.com/RhuiDih/moetify) library. All relevant expert models, LoRA adapters, and datasets are available at [Moecule Ingredients](https://huggingface.co/collections/davzoku/moecule-ingredients-67dac0e6210eb1d95abc6411).

## Key Features

- **Zero Additional Training:** Combine existing domain-specific or task-specific experts into a powerful MoE model without any additional training! The sketch after this list illustrates the idea.
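
In rough terms, each ingredient model contributes its own copy of the routed modules (`mlp` and `q_proj` in the creation command below), the rest of the base model is shared, and a small newly initialized router blends the experts' outputs per token (the "Routers parameters" in the log further down). The snippet below is a minimal, illustrative sketch of token-level soft routing over two expert MLPs; it is not moetify's actual implementation, and every name in it is made up for illustration.

```python
import torch
import torch.nn as nn


class TinyMoELayer(nn.Module):
    """Illustrative sketch: blend the outputs of two frozen expert MLPs per token."""

    def __init__(self, hidden_dim: int, expert_mlps):
        super().__init__()
        self.experts = nn.ModuleList(expert_mlps)  # weights taken from the expert models
        self.router = nn.Linear(hidden_dim, len(expert_mlps), bias=False)  # the only new weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, hidden_dim)
        weights = self.router(x).softmax(dim=-1)          # (tokens, num_experts)
        expert_outs = torch.stack([expert(x) for expert in self.experts], dim=-1)
        return (expert_outs * weights.unsqueeze(1)).sum(dim=-1)  # weighted mix per token


# Two stand-in "experts" shaped like a small transformer MLP.
hidden = 16
experts = [
    nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.SiLU(), nn.Linear(4 * hidden, hidden))
    for _ in range(2)
]
layer = TinyMoELayer(hidden, experts)
print(layer(torch.randn(5, hidden)).shape)  # torch.Size([5, 16])
```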
23 |
+
|
24 |
+
## System Requirements
|
25 |
+
|
26 |
+
| Steps | System Requirements |
|
27 |
+
| ---------------- | ---------------------- |
|
28 |
+
| MoE Creation | > 22.5 GB System RAM |
|
29 |
+
| Inference (fp16) | GPU with > 5.4 GB VRAM |

## MoE Creation

To reproduce this model, run the following commands:

```shell
# clone the moetify fork that fixes a dependency issue
git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
cd moetify && pip install -e .

python -m moetify.mix \
    --output_dir ./moecule-2x1b-m9-ks \
    --model_path unsloth/Llama-3.2-1B-Instruct \
    --modules mlp q_proj \
    --ingredients \
        davzoku/kyc_expert_1b \
        davzoku/stock_market_expert_1b
```
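
As a quick sanity check (a minimal sketch, assuming the command above completed, wrote its output to `./moecule-2x1b-m9-ks`, and that the moetify fork is still installed in the environment), you can load the merged checkpoint locally and compare its parameter count with the totals logged in the next section:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the freshly mixed model from the local output directory. trust_remote_code=True
# mirrors the inference snippet below so the custom MoE architecture resolves.
model = AutoModelForCausalLM.from_pretrained("./moecule-2x1b-m9-ks", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./moecule-2x1b-m9-ks")

print(sum(p.numel() for p in model.parameters()))  # expected: 2371028992 (see "Model Parameters")
```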

## Model Parameters

```shell
INFO:root:Stem parameters: 626067456
INFO:root:Experts parameters: 1744830464
INFO:root:Routers parameters: 131072
INFO:root:MOE total parameters (numel): 2371028992
INFO:root:MOE total parameters : 2371028992
INFO:root:MOE active parameters: 2371028992
```
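
For context, the fp16 VRAM figure in the system requirements lines up with these counts: all ~2.37B parameters are active, so the weights alone take roughly 4.7 GB in fp16, before activations, the KV cache, and framework overhead. A back-of-the-envelope check (illustrative arithmetic only, not part of the moetify output):

```python
total_params = 2_371_028_992           # "MOE total parameters" from the log above
fp16_weight_bytes = total_params * 2   # 2 bytes per parameter in fp16
print(f"{fp16_weight_bytes / 1e9:.2f} GB")  # ~4.74 GB for the weights alone
```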

## Inference

To run inference with this model, you can use the following code snippet:

```python
# Install the moetify fork that fixes a dependency issue (shell commands; run them
# in a terminal, or prefix them with "!" in a notebook):
#   git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
#   cd moetify && pip install -e .

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "<model-name>"  # e.g. the model id of this repository

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)


def format_instruction(row):
    return f"""### Question: {row}"""


greedy_generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=1,
    max_new_tokens=128,
    repetition_penalty=1.2,
)

input_text = "In what ways did Siemens's debt restructuring on March 06, 2024 reflect its strategic priorities?"
formatted_input = format_instruction(input_text)
inputs = tokenizer(formatted_input, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        generation_config=greedy_generation_config,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## The Team

- CHOCK Wan Kee
- Farlin Deva Binusha DEVASUGIN MERLISUGITHA
- GOH Bao Sheng
- Jessica LEK Si Jia
- Sinha KHUSHI
- TENG Kok Wai (Walter)

## References

- [Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts](https://arxiv.org/abs/2408.17280v2)
- [RhuiDih/moetify](https://github.com/RhuiDih/moetify)