GPT-Sw3

Overview

The GPT-Sw3 model was first proposed in Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, and Magnus Sahlgren.

Since that first paper, the authors have extended their work and trained new models on their new 1.2TB corpus named The Nordic Pile.

GPT-Sw3 is a collection of large decoder-only pretrained transformer language models that were developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. GPT-Sw3 has been trained on a dataset containing 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model was pretrained using a causal language modeling (CLM) objective utilizing the NeMo Megatron GPT implementation.
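As an illustration of that objective (a minimal sketch using the public Hugging Face checkpoint, not the original NeMo Megatron training code), passing labels to the model makes it compute the shifted next-token cross-entropy loss:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative sketch of the causal language modeling (CLM) objective, not the
# original NeMo Megatron training code: the model learns to predict each token
# from the tokens that precede it.
tokenizer = AutoTokenizer.from_pretrained("AI-Sweden-Models/gpt-sw3-356m")
model = AutoModelForCausalLM.from_pretrained("AI-Sweden-Models/gpt-sw3-356m")

batch = tokenizer("Träd är fina för att de är färgstarka.", return_tensors="pt")
# With labels=input_ids the model shifts the labels internally and returns the
# next-token cross-entropy loss that a CLM pretraining step would minimize.
outputs = model(**batch, labels=batch["input_ids"])
print(outputs.loss)
```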
This model was contributed by AI Sweden Models.

Usage example
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("AI-Sweden-Models/gpt-sw3-356m")
>>> model = AutoModelForCausalLM.from_pretrained("AI-Sweden-Models/gpt-sw3-356m")

>>> input_ids = tokenizer("Träd är fina för att", return_tensors="pt")["input_ids"]

>>> generated_token_ids = model.generate(inputs=input_ids, max_new_tokens=10, do_sample=True)[0]

>>> print(tokenizer.decode(generated_token_ids))
Träd är fina för att de är färgstarka. Men ibland är det fint
```
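The same checkpoint can also be wrapped in the text-generation pipeline for quick experiments; a minimal sketch (the sampling settings below are illustrative, not recommendations from the model authors):

```python
from transformers import pipeline

# Minimal sketch: run the checkpoint through the high-level text-generation pipeline.
generator = pipeline("text-generation", model="AI-Sweden-Models/gpt-sw3-356m")
result = generator("Träd är fina för att", max_new_tokens=10, do_sample=True)
print(result[0]["generated_text"])
```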
Resources

- Text classification task guide
- Token classification task guide
- Causal language modeling task guide
The implementation uses the GPT2Model coupled with our GPTSw3Tokenizer. Refer to the GPT2Model documentation for API reference and examples.

Note that sentencepiece is required to use our tokenizer and can be installed with `pip install transformers[sentencepiece]` or `pip install sentencepiece`.
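If you want to inspect the tokenization itself, the tokenizer can be loaded on its own; a minimal sketch (reusing the same public checkpoint for illustration):

```python
from transformers import GPTSw3Tokenizer

# Minimal sketch: load the SentencePiece-based GPTSw3Tokenizer directly.
tokenizer = GPTSw3Tokenizer.from_pretrained("AI-Sweden-Models/gpt-sw3-356m")

tokens = tokenizer.tokenize("Träd är fina för att")  # subword pieces
ids = tokenizer.convert_tokens_to_ids(tokens)        # vocabulary ids
print(tokens)
print(ids)
print(tokenizer.decode(ids))                         # back to text
```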
GPTSw3Tokenizer

[[autodoc]] GPTSw3Tokenizer
    - save_vocabulary