Note: this is not a chat model. A chat model is coming soon; this is the base model, intended for further fine-tuning.

print("Before we start")

We are not affiliated with Roblox in any way; any mention of Roblox is purely to help people understand what the model is about. According to the Roblox website, they use Meta's Llama 3 (we assume the 70B variant) for their AI assistant. This model, while capable, cannot come close to the performance of a 70B model.

print("Stages of pre-training")

This model was continually pre-trained in 3 stages. (Note: AllenAI states that OLMo 2 1B, the model this is based on, was pre-trained on roughly 4 trillion tokens.)

  • Stage 1: Pre-training on Pinkstack/roblox-luau-corpus-text and Roblox/luau_corpus at 4096 context (the maximum OLMo 2 can usually reach).

  • Stage 2: Pre-training on boatbomber/roblox-info-dump with RoPE scaling set to 4, expanding the model's context to 16384 tokens.

!Stage 3 and onwards used added layers: the model started with 16 layers, then we merged in another 20 to make it bigger and deeper!

  • Stage 3: Training on a mix of Pinkstack/roblox-luau-corpus-text and Roblox/luau_corpus plus wikimedia/wikipedia, with RoPE scaling set to 8, i.e. 32768 tokens of context. We mixed in wikimedia/wikipedia to improve the model's general text generation and knowledge.
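The RoPE scaling used in stages 2 and 3 can be illustrated with a short sketch. The exact scaling variant is an assumption here; the snippet shows linear position interpolation, where position indices are divided by the scaling factor so that a longer sequence maps back into the position range the base model was trained on:

```python
import math

def rope_angles(position, dim, base=10000.0, factor=1.0):
    """Rotary-embedding angles for a single position.

    Linear ("position interpolation") scaling divides the position index
    by `factor`, squeezing a longer context window into the range the
    model originally saw during pre-training.
    """
    scaled = position / factor
    # One angle per rotary frequency pair (dim // 2 of them).
    return [scaled / base ** (2 * i / dim) for i in range(dim // 2)]

# With factor=8 (stage 3), position 32768 is mapped to 32768 / 8 = 4096,
# i.e. back inside the original 4096-token training range.
angles = rope_angles(32768, dim=64, factor=8.0)
```

With factor 4 the same trick yields a 16384-token window (stage 2), and factor 8 yields 32768 (stage 3), matching the stages above.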

In total, the model was continually pre-trained on up to 1.3B tokens, reaching a final loss of 1.916400.
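The layer expansion mentioned above (16 original layers plus 20 merged-in layers) can be sketched as a passthrough-style duplication. Which layers were duplicated and where they were spliced in is an assumption; repeating layers from the middle of the stack is one common choice:

```python
# Sketch of depth up-scaling: start from OLMo 2 1B's 16 transformer layers
# and merge in 20 duplicated layers to build a deeper 36-layer stack.
base_layers = [f"layer_{i:02d}" for i in range(16)]   # original depth

# Hypothetical selection: draw the 20 extra copies from the middle layers.
extra = [base_layers[4 + i % 8] for i in range(20)]

# Splice the copies into the stack (insertion point is also an assumption).
merged = base_layers[:12] + extra + base_layers[12:]

print(len(merged))  # 36 layers total
```

The duplicated layers keep their original weights at merge time; the stage-3 continued pre-training then trains the deeper stack as a whole.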

print("Use cases")

As this is a base model, there isn't much you can do with it out of the box. However, you can fine-tune it on your own datasets to turn it into an instruct/chat-style model.

print("Notice")

This stage-3 base model did not undergo safety alignment by us, thus it can generate unethical content. Any outputs generated by the LLM are your responsibility.

print("Additional information")

This repo contains the stage 3 pre-trained/base model.

unsloth was used for training (https://unsloth.ai/)

Model size: 2.83B parameters · Tensor type: BF16 (Safetensors)

Model tree for Pinkstack/Luau-coder-v2-3B-base-32k: 3 fine-tunes, 2 quantizations.

Datasets used to train Pinkstack/Luau-coder-v2-3B-base-32k