davzoku committed · Commit 0f0c243 (verified) · Parent: e7e5f4f

Create README.md

Files changed (1): README.md added (+114 lines)

---
datasets:
- davzoku/moecule-stock-market-outlook
- davzoku/moecule-kyc
base_model:
- unsloth/Llama-3.2-1B-Instruct
pipeline_tag: question-answering
---

# Moecule 2x1B M9 KS

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63c51d0e72db0f638ff1eb82/8BNZvdKBuSComBepbH-QW.png" width="150" height="150" alt="logo"> <br>
</p>

## Model Details

This model is a mixture of experts (MoE) built from task-specific expert models using the [RhuiDih/moetify](https://github.com/RhuiDih/moetify) library. All relevant expert models, LoRA adapters, and datasets are available at [Moecule Ingredients](https://huggingface.co/collections/davzoku/moecule-ingredients-67dac0e6210eb1d95abc6411).

## Key Features

- **Zero Additional Training:** Combine existing domain-specific or task-specific experts into a powerful MoE model without any additional training!

## System Requirements

| Step             | System Requirements    |
| ---------------- | ---------------------- |
| MoE Creation     | > 22.5 GB system RAM   |
| Inference (fp16) | GPU with > 5.4 GB VRAM |

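As an optional pre-flight check, the sketch below compares the machine's resources against the table above. It is only an illustration, assuming `psutil` and PyTorch are installed; the thresholds are copied from the table rather than reported by moetify.

```python
# Hypothetical pre-flight check; thresholds mirror the table above.
import psutil
import torch

REQUIRED_RAM_GB = 22.5   # MoE creation
REQUIRED_VRAM_GB = 5.4   # fp16 inference

ram_gb = psutil.virtual_memory().total / 1e9
print(f"System RAM: {ram_gb:.1f} GB (need > {REQUIRED_RAM_GB} GB for MoE creation)")

if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU VRAM: {vram_gb:.1f} GB (need > {REQUIRED_VRAM_GB} GB for fp16 inference)")
else:
    print("No CUDA GPU detected; fp16 inference requires a GPU.")
```
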

## MoE Creation

To reproduce this model, run the following commands:

```shell
# clone the moetify fork that fixes a dependency issue with transformers 4.47.1 / FlashAttention-2
git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
cd moetify && pip install -e .

# mix the task-specific experts into a single MoE model
python -m moetify.mix \
    --output_dir ./moecule-2x1b-m9-ks \
    --model_path unsloth/Llama-3.2-1B-Instruct \
    --modules mlp q_proj \
    --ingredients \
        davzoku/kyc_expert_1b \
        davzoku/stock_market_expert_1b
```
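
After the mix completes, one way to sanity-check the result is to load the merged checkpoint and count its parameters. This is only a sketch, assuming the output was written to `./moecule-2x1b-m9-ks` as in the command above; the printed total should line up with the log in the next section.

```python
# Sanity-check sketch: load the freshly mixed checkpoint and count its parameters.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "./moecule-2x1b-m9-ks",   # output_dir from the mix command above
    trust_remote_code=True,   # custom MoE architecture, as in the inference example below
)

total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")
```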

## Model Parameters

```shell
INFO:root:Stem parameters: 626067456
INFO:root:Experts parameters: 1744830464
INFO:root:Routers parameters: 131072
INFO:root:MOE total parameters (numel): 2371028992
INFO:root:MOE total parameters : 2371028992
INFO:root:MOE active parameters: 2371028992
```
61
+ ## Inference
62
+
63
+ To run an inference with this model, you can use the following code snippet:
64
+
65
+ ```python
66
+ # git clone moetify fork that fixes dependency issue
67
+ !git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
68
+
69
+ !cd moetify && pip install -e .
70
+
71
+ model = AutoModelForCausalLM.from_pretrained(<model-name>, device_map='auto', trust_remote_code=True)
72
+ tokenizer = AutoTokenizer.from_pretrained(<model-name>)
73
+
74
+ def format_instruction(row):
75
+ return f"""### Question: {row}"""
76
+
77
+ greedy_generation_config = GenerationConfig(
78
+ temperature=0.1,
79
+ top_p=0.75,
80
+ top_k=40,
81
+ num_beams=1,
82
+ max_new_tokens=128,
83
+ repetition_penalty=1.2
84
+ )
85
+
86
+
87
+ input_text = "In what ways did Siemens's debt restructuring on March 06, 2024 reflect its strategic priorities?"
88
+ formatted_input = format_instruction(input_text)
89
+ inputs = tokenizer(formatted_input, return_tensors="pt").to('cuda')
90
+
91
+ with torch.no_grad():
92
+ outputs = model.generate(
93
+ input_ids=inputs.input_ids,
94
+ attention_mask=inputs.attention_mask,
95
+ generation_config=greedy_generation_config
96
+ )
97
+
98
+ generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
99
+ print(generated_text)
100
+ ```

## The Team

- CHOCK Wan Kee
- Farlin Deva Binusha DEVASUGIN MERLISUGITHA
- GOH Bao Sheng
- Jessica LEK Si Jia
- Sinha KHUSHI
- TENG Kok Wai (Walter)

## References

- [Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts](https://arxiv.org/abs/2408.17280v2)
- [RhuiDih/moetify](https://github.com/RhuiDih/moetify)