tiny_shakespeare_transformer

A small Transformer Decoder model trained from scratch on the Tiny Shakespeare dataset.

Training details

  • Dataset: Tiny Shakespeare
  • Epochs: 5
  • Learning Rate: 0.0003
  • Batch Size: 32
  • Block Size: 128
  • Optimizer: AdamW
  • Loss Function: CrossEntropyLoss
  • Dropout Rate: 0.1
  • Embedding Dimension: 256
  • Number of Layers: 6
  • Number of Attention Heads: 8

Usage

To use this model, simply load it using the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("NataliaH/tiny_shakespeare_transformer")
tokenizer = AutoTokenizer.from_pretrained("NataliaH/tiny_shakespeare_transformer")

# Encode the input prompt
inputs = tokenizer("Once upon a time", return_tensors="pt")

# Generate a continuation and decode it back to text
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
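
By default, generate() returns a fairly short greedy continuation. Standard generation arguments can be passed for longer or more varied samples; the values below are illustrative, not tuned settings:

# Sample a longer, more varied continuation (illustrative settings)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0]))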

Model Architecture

This model uses a decoder-only Transformer architecture for text generation. It was trained from scratch on the Tiny Shakespeare dataset to generate Shakespeare-like text.
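
The released weights load through transformers (see Usage above), so the exact implementation is not reproduced here. As an illustration only, the following PyTorch sketch shows a decoder-only stack consistent with the hyperparameters listed above (256-dim embeddings, 8 heads, 6 layers, block size 128, dropout 0.1); class names and layer choices are assumptions, not the released code.

import torch
import torch.nn as nn

EMBED_DIM, N_HEADS, N_LAYERS, BLOCK_SIZE, DROPOUT = 256, 8, 6, 128, 0.1

class DecoderBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(EMBED_DIM, N_HEADS,
                                          dropout=DROPOUT, batch_first=True)
        self.ln1 = nn.LayerNorm(EMBED_DIM)
        self.ln2 = nn.LayerNorm(EMBED_DIM)
        self.mlp = nn.Sequential(
            nn.Linear(EMBED_DIM, 4 * EMBED_DIM),
            nn.GELU(),
            nn.Linear(4 * EMBED_DIM, EMBED_DIM),
            nn.Dropout(DROPOUT),
        )

    def forward(self, x):
        # Causal mask: each position attends only to itself and earlier positions
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool,
                                     device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class TinyShakespeareDecoder(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, EMBED_DIM)
        self.pos_emb = nn.Embedding(BLOCK_SIZE, EMBED_DIM)
        self.blocks = nn.Sequential(*[DecoderBlock() for _ in range(N_LAYERS)])
        self.ln_f = nn.LayerNorm(EMBED_DIM)
        self.head = nn.Linear(EMBED_DIM, vocab_size)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids, seq_len <= BLOCK_SIZE
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        x = self.blocks(x)
        return self.head(self.ln_f(x))     # (batch, seq_len, vocab_size) logits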

Training Process

  • Training was performed for 5 epochs.
  • The model uses the AdamW optimizer with a learning rate of 0.0003; a minimal training-loop sketch is given after this list.
  • The dropout rate during training was set to 0.1 to reduce overfitting.
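
The sketch below illustrates the training setup described above (AdamW at learning rate 0.0003, cross-entropy loss, 5 epochs). The names model, get_batch, and num_batches are hypothetical placeholders, not the actual training script.

import torch
import torch.nn.functional as F

# AdamW with the learning rate listed above (assumes `model` is defined,
# e.g. the sketch in "Model Architecture")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for epoch in range(5):                      # 5 epochs, as listed above
    for _ in range(num_batches):            # hypothetical batches per epoch
        xb, yb = get_batch()                # hypothetical loader: (32, 128) token ids and shifted targets
        logits = model(xb)                  # (32, 128, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()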

License

This model is released under the MIT License.

Model size

124M parameters (Safetensors, F32 tensors)