aisingapore/SEA-LION-v1-7B

@@ -1,5 +1,5 @@
 ---
-new_version: aisingapore/gemma2-9b-cpt-sea-lionv3-base
 license: mit
 language:
 - en
@@ -14,7 +14,7 @@ language:
 - km
 - lo
 ---
-# SEA-LION
 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
 The models range in size from 3 billion to 7 billion parameters.
@@ -30,11 +30,11 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 The SEA-LION model is a significant leap forward in the field of Natural Language Processing,
 specifically trained to understand the SEA regional context.
-SEA-LION is built on the robust MPT architecture and has a vocabulary size of 256K.
 For tokenization, the model employs our custom SEABPETokenizer, which is specially tailored for SEA languages, ensuring optimal model performance.
-The training data for SEA-LION encompasses 980B tokens.
 - **Developed by:** Products Pillar, AI Singapore
 - **Funded by:** Singapore NRF
@@ -44,7 +44,7 @@ The training data for SEA-LION encompasses 980B tokens.
 ### Performance Benchmarks
-SEA-LION has an average performance on general tasks in English (as measured by Hugging Face's LLM Leaderboard):
 | Model       | ARC   | HellaSwag | MMLU  | TruthfulQA | Average |
 |-------------|:-----:|:---------:|:-----:|:----------:|:-------:|
@@ -54,7 +54,7 @@ SEA-LION has an average performance on general tasks in English (as measured by
 ### Data
-SEA-LION was trained on 980B tokens of the following data:
 | Data Source               | Unique Tokens | Multiplier | Total Tokens | Percentage |
 |---------------------------|:-------------:|:----------:|:------------:|:----------:|
@@ -80,10 +80,10 @@ SEA-LION was trained on 980B tokens of the following data:
 ### Infrastructure
-SEA-LION was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
 on the following hardware:
-| Training Details     | SEA-LION 7B  |
 |----------------------|:------------:|
 | AWS EC2 p4d.24xlarge | 32 instances |
 | Nvidia A100 40GB GPU | 256          |
@@ -92,7 +92,7 @@ on the following hardware:
 ### Configuration
-| HyperParameter    | SEA-LION 7B        |
 |-------------------|:------------------:|
 | Precision         | bfloat16           |
 | Optimizer         | decoupled_adamw    |
@@ -106,9 +106,9 @@ on the following hardware:
 ### Model Architecture and Objective
-SEA-LION is a decoder model using the MPT architecture.
-| Parameter       | SEA-LION 7B |
 |-----------------|:-----------:|
 | Layers          | 32          |
 | d_model         | 4096        |

 ---
+new_version: aisingapore/Gemma-SEA-LION-v3-9B
 license: mit
 language:
 - en
 - km
 - lo
 ---
+# SEA-LION-v1-7B
 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
 The models range in size from 3 billion to 7 billion parameters.
 The SEA-LION model is a significant leap forward in the field of Natural Language Processing,
 specifically trained to understand the SEA regional context.
+SEA-LION-v1-7B is built on the robust MPT architecture and has a vocabulary size of 256K.
 For tokenization, the model employs our custom SEABPETokenizer, which is specially tailored for SEA languages, ensuring optimal model performance.
+The training data for SEA-LION-v1-7B encompasses 980B tokens.
 - **Developed by:** Products Pillar, AI Singapore
 - **Funded by:** Singapore NRF
 ### Performance Benchmarks
+SEA-LION-v1-7B has an average performance on general tasks in English (as measured by Hugging Face's LLM Leaderboard):
 | Model       | ARC   | HellaSwag | MMLU  | TruthfulQA | Average |
 |-------------|:-----:|:---------:|:-----:|:----------:|:-------:|
 ### Data
+SEA-LION-v1-7B was trained on 980B tokens of the following data:
 | Data Source               | Unique Tokens | Multiplier | Total Tokens | Percentage |
 |---------------------------|:-------------:|:----------:|:------------:|:----------:|
 ### Infrastructure
+SEA-LION-v1-7B was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
 on the following hardware:
+| Training Details     | SEA-LION-v1-7B  |
 |----------------------|:------------:|
 | AWS EC2 p4d.24xlarge | 32 instances |
 | Nvidia A100 40GB GPU | 256          |
 ### Configuration
+| HyperParameter    | SEA-LION-v1-7B        |
 |-------------------|:------------------:|
 | Precision         | bfloat16           |
 | Optimizer         | decoupled_adamw    |
 ### Model Architecture and Objective
+SEA-LION-v1-7B is a decoder model using the MPT architecture.
+| Parameter       | SEA-LION-v1-7B |
 |-----------------|:-----------:|
 | Layers          | 32          |
 | d_model         | 4096        |
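
As a rough sanity check on the "7B" label, the layer count, `d_model`, and vocabulary size quoted above can be combined into a back-of-the-envelope parameter estimate. This is only a sketch under stated assumptions, not the model's actual configuration: it assumes a standard decoder block (four `d_model x d_model` attention projections plus an MLP with expansion ratio 4), no bias terms, a token embedding tied with the output head, and a vocabulary of exactly 256,000 entries (the card only says "256K"). Normalization parameters are ignored, and the function name `estimate_decoder_params` is hypothetical.

```python
def estimate_decoder_params(n_layers: int, d_model: int, vocab_size: int,
                            mlp_ratio: int = 4) -> int:
    """Rough parameter count for a decoder-only transformer.

    Assumptions (not confirmed by the model card): no biases, tied
    input/output embeddings, layer norm parameters ignored.
    """
    # Q, K, V and output projections: four d_model x d_model matrices.
    attn = 4 * d_model * d_model
    # MLP up- and down-projection, each d_model x (mlp_ratio * d_model).
    mlp = 2 * mlp_ratio * d_model * d_model
    # Token embedding table, assumed to be shared with the output head.
    embed = vocab_size * d_model
    return n_layers * (attn + mlp) + embed


# Values from the table above; 256_000 is an assumed reading of "256K".
total = estimate_decoder_params(n_layers=32, d_model=4096, vocab_size=256_000)
embedding_share = (256_000 * 4096) / total

print(f"estimated parameters: {total / 1e9:.2f}B")
print(f"embedding share: {embedding_share:.1%}")
```

Under these assumptions the embedding table alone accounts for roughly a billion parameters, which illustrates how a large multilingual vocabulary contributes noticeably to the total at this model scale.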