chtxxxxx committed
Commit 4f39f8f · verified · Parent: a445b1b

Update README.md

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- new_version: aisingapore/gemma2-9b-cpt-sea-lionv3-base
+ new_version: aisingapore/Gemma-SEA-LION-v3-9B
  license: mit
  language:
  - en
@@ -14,7 +14,7 @@ language:
  - km
  - lo
  ---
- # SEA-LION
+ # SEA-LION-v1-7B

  SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
  The size of the models range from 3 billion to 7 billion parameters.
@@ -30,11 +30,11 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
  The SEA-LION model is a significant leap forward in the field of Natural Language Processing,
  specifically trained to understand the SEA regional context.

- SEA-LION is built on the robust MPT architecture and has a vocabulary size of 256K.
+ SEA-LION-v1-7B is built on the robust MPT architecture and has a vocabulary size of 256K.

  For tokenization, the model employs our custom SEABPETokenizer, which is specially tailored for SEA languages, ensuring optimal model performance.

- The training data for SEA-LION encompasses 980B tokens.
+ The training data for SEA-LION-v1-7B encompasses 980B tokens.

  - **Developed by:** Products Pillar, AI Singapore
  - **Funded by:** Singapore NRF
@@ -44,7 +44,7 @@ The training data for SEA-LION encompasses 980B tokens.

  ### Performance Benchmarks

- SEA-LION has an average performance on general tasks in English (as measured by Hugging Face's LLM Leaderboard):
+ SEA-LION-v1-7B has an average performance on general tasks in English (as measured by Hugging Face's LLM Leaderboard):

  | Model | ARC | HellaSwag | MMLU | TruthfulQA | Average |
  |-------------|:-----:|:---------:|:-----:|:----------:|:-------:|
@@ -54,7 +54,7 @@ SEA-LION has an average performance on general tasks in English (as measured by

  ### Data

- SEA-LION was trained on 980B tokens of the following data:
+ SEA-LION-v1-7B was trained on 980B tokens of the following data:

  | Data Source | Unique Tokens | Multiplier | Total Tokens | Percentage |
  |---------------------------|:-------------:|:----------:|:------------:|:----------:|
@@ -80,10 +80,10 @@ SEA-LION was trained on 980B tokens of the following data:

  ### Infrastructure

- SEA-LION was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
+ SEA-LION-v1-7B was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
  on the following hardware:

- | Training Details | SEA-LION 7B |
+ | Training Details | SEA-LION-v1-7B |
  |----------------------|:------------:|
  | AWS EC2 p4d.24xlarge | 32 instances |
  | Nvidia A100 40GB GPU | 256 |
@@ -92,7 +92,7 @@ on the following hardware:

  ### Configuration

- | HyperParameter | SEA-LION 7B |
+ | HyperParameter | SEA-LION-v1-7B |
  |-------------------|:------------------:|
  | Precision | bfloat16 |
  | Optimizer | decoupled_adamw |
@@ -106,9 +106,9 @@ on the following hardware:

  ### Model Architecture and Objective

- SEA-LION is a decoder model using the MPT architecture.
+ SEA-LION-v1-7B is a decoder model using the MPT architecture.

- | Parameter | SEA-LION 7B |
+ | Parameter | SEA-LION-v1-7B |
  |-----------------|:-----------:|
  | Layers | 32 |
  | d_model | 4096 |
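
For reference, a minimal sketch of how an MPT-based checkpoint with a custom tokenizer such as SEABPETokenizer might be loaded through Hugging Face transformers. The repo ID `aisingapore/sea-lion-7b` is an assumption for illustration only (the actual ID may differ), and `trust_remote_code=True` is shown because MPT models and custom tokenizers ship their own modeling/tokenization code with the repository.

```python
# Illustrative sketch, not part of the commit: load an MPT-based checkpoint
# and its custom tokenizer via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisingapore/sea-lion-7b"  # assumed repo ID; verify against the model card

# trust_remote_code=True lets transformers run the repo's own MPT modeling
# code and custom tokenizer implementation.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Southeast Asia is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```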