Update README.md
README.md CHANGED
@@ -15,15 +15,15 @@ language:
 - km
 - ta
 ---
-# SEA-LION-7B-
+# SEA-LION-7B-IT-Research
 
 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
 The size of the models range from 3 billion to 7 billion parameters.
 This is the card for the SEA-LION 7B Instruct (Non-Commercial) model.
 
-For more details on the base model, please refer to the [base model's model card](https://huggingface.co/aisingapore/
+For more details on the base model, please refer to the [base model's model card](https://huggingface.co/aisingapore/SEA-LION-v1-7B).
 
-For the commercially permissive model, please refer to the [SEA-LION-7B-
+For the commercially permissive model, please refer to the [SEA-LION-7B-IT](https://huggingface.co/aisingapore/SEA-LION-v1-7B-IT).
 
 SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 
@@ -49,9 +49,9 @@ The model was then further instruction-tuned on <b>Indonesian data only</b>.
 
 ### Benchmark Performance
 
-SEA-LION-7B-
+SEA-LION-7B-IT-Research performs better than other models of comparable size when tested on tasks in the Indonesian language.
 
-We evaluated SEA-LION-7B-
+We evaluated SEA-LION-7B-IT-Research on the [BHASA benchmark](https://arxiv.org/abs/2309.06085) and
 compared it against [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
 and [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b-instruct).
 
@@ -69,8 +69,8 @@ For Natural Language Reasoning (NLR) tasks, we tested the model on Natural Langu
 
 | Model | QA (F1) | Sentiment (F1) | Toxicity (F1) | Eng>Indo (ChrF++) | Indo>Eng (ChrF++) | Summary (ROUGE-L) | NLI (Acc) | Causal (Acc) |
 |--------------------------------|---------|----------------|---------------|-------------------|-------------------|-------------------|-----------|--------------|
-| SEA-LION-7B-
-| SEA-LION-7B-
+| SEA-LION-7B-IT-Research | 24.86 | 76.13 | 24.45 | 52.50 | 46.82 | 15.44 | 33.20 | 23.80 |
+| SEA-LION-7B-IT | **68.41** | **91.45** | 17.98 | 57.48 | 58.04 | **17.54** | 53.10 | 60.80 |
 | SeaLLM 7B v1 | 30.96 | 56.29 | 22.60 | 62.23 | 41.55 | 14.03 | 26.50 | 56.60 |
 | SeaLLM 7B v2 | 44.40 | 80.13 | **55.24** | 64.01 | **63.28** | 17.31 | 43.60 | 82.00 |
 | Sailor-7B (Base) | 65.43 | 59.48 | 20.48 | **64.27** | 60.68 | 8.69 | 15.10 | 38.40 |
@@ -83,9 +83,9 @@ For Natural Language Reasoning (NLR) tasks, we tested the model on Natural Langu
 
 ### Model Architecture and Objective
 
-SEA-LION is a decoder model using the MPT architecture.
+SEA-LION-7B-IT-Research is a decoder model using the MPT architecture.
 
-| Parameter | SEA-LION
+| Parameter | SEA-LION-7B-IT-Research |
 |-----------------|:-----------:|
 | Layers | 32 |
 | d_model | 4096 |
@@ -107,8 +107,8 @@ The tokenizer type is Byte-Pair Encoding (BPE).
 
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-tokenizer = AutoTokenizer.from_pretrained("aisingapore/
-model = AutoModelForCausalLM.from_pretrained("aisingapore/
+tokenizer = AutoTokenizer.from_pretrained("aisingapore/SEA-LION-v1-7B-IT-Research", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("aisingapore/SEA-LION-v1-7B-IT-Research", trust_remote_code=True)
 
 prompt_template = "### USER:\n{human_prompt}\n\n### RESPONSE:\n"
 prompt = """Apa sentimen dari kalimat berikut ini?
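The final hunk ends mid-snippet at the prompt string, so the diff does not show how the loaded model is actually called. As a minimal sketch of how that snippet would typically continue, assuming the standard `transformers` tokenize-and-`generate()` workflow; the example sentence and the generation settings below are illustrative assumptions, not content from the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id taken from the updated lines of the diff above.
tokenizer = AutoTokenizer.from_pretrained("aisingapore/SEA-LION-v1-7B-IT-Research", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("aisingapore/SEA-LION-v1-7B-IT-Research", trust_remote_code=True)

prompt_template = "### USER:\n{human_prompt}\n\n### RESPONSE:\n"
# Hypothetical example sentence ("What is the sentiment of the following sentence? ...").
prompt = "Apa sentimen dari kalimat berikut ini? Kalimat: Restoran ini enak sekali!"

# Fill the template, tokenize, and generate a short completion (greedy decoding).
full_prompt = prompt_template.format(human_prompt=prompt)
inputs = tokenizer(full_prompt, return_tensors="pt")
output = model.generate(inputs["input_ids"], max_new_tokens=32, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```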