# TinyLlama 1.1B Chat - Roast Fine-tuned
This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0, specifically adapted to generate sarcastic and "roasting" responses based on a community-curated dataset.
It was trained using the TRL (Transformer Reinforcement Learning) library, which simplifies Supervised Fine-Tuning (SFT) workflows.
## Model Description
This model is an instruction-following language model based on the TinyLlama architecture, further fine-tuned on a dataset of user prompts and corresponding roasting responses. The goal of fine-tuning was to specialize the model's conversational style towards delivering humorous (often insulting) roasts.
## Intended Use
This model is intended for research and experimentation purposes only. Given the nature of the training data, which contains potentially offensive and negative language, this model should be used with extreme caution.
- Acceptable uses: Research into model behavior on specialized datasets, exploration of fine-tuning techniques, novelty/entertainment applications with explicit disclaimers and robust safety filters.
- Unacceptable uses: Any application that could cause harm, including generating offensive content without user awareness, deploying in sensitive environments, or using where outputs are not reviewed and filtered.
Users should be aware that this model will likely generate sarcastic, rude, or insulting content.
## Limitations and Bias
The model inherits potential biases from its base model and, significantly, from the fine-tuning dataset (`kaifkhaan/roast`).
- Content: It is specifically trained to generate content that is critical, negative, and potentially offensive. It will not provide helpful or polite responses in most cases.
- Bias: The biases present in the `kaifkhaan/roast` dataset will be reflected in the model's outputs. This could include biases related to various demographics, stereotypes, or sensitive topics present in the original data. Rigorous testing and filtering are required for any application.
- Factual Accuracy: Fine-tuning on this dataset does not improve the model's factual knowledge or general reasoning abilities. Its responses are based on the patterns learned for roasting.
## Quick start
To use this model, you can load it with the `transformers` library. Ensure you have `transformers`, `torch`, `peft`, `bitsandbytes` (if using 4-bit loading), and `trl` installed:
```bash
pip install transformers torch peft bitsandbytes trl
```
Here's how to use it with a `pipeline`:
```python
from transformers import pipeline
import torch

# Specify the repository ID
model_repo_id = "AnnasShaikh/TinyLlama-1.1B-Chat-Roast"

# Load the pipeline (adjust device if needed, e.g., device="cpu").
# If the adapter was saved with a quantization config, you may need to pass it here,
# or load the base model and PEFT adapter directly with AutoModelForCausalLM.
try:
    generator = pipeline(
        "text-generation",
        model=model_repo_id,
        # device="cuda" if torch.cuda.is_available() else "cpu",
        torch_dtype=torch.float16,  # Recommended for inference if supported by hardware
    )
    print(f"Pipeline loaded for model {model_repo_id}")

    # Example chat interaction
    prompt = "Hello, tell me something about yourself."  # A generic prompt
    prompt_roast = "You can't roast me!"  # A roasting-specific prompt

    # Format the prompt in the chat format used during training
    chat_prompt_formatted = f"<|user|>\n{prompt_roast}\n<|assistant|>\n"
    print(f"\nInput formatted prompt:\n{chat_prompt_formatted}")

    # Generate output; adjust generation parameters for desired creativity/determinism
    output = generator(
        chat_prompt_formatted,
        max_new_tokens=100,
        num_beams=1,      # Use 1 for greedy decoding or sampling
        do_sample=True,   # Enable sampling
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        # stop_sequence="<|user|>",  # Optional: stop generation before the next user turn
    )
    generated_text = output[0]["generated_text"]

    print("\n--- Generated Text ---")
    # The output includes the input prompt; you may want to trim it
    print(generated_text)
    print("----------------------")

except Exception as e:
    print(f"Error loading model or generating text: {e}")
    print("Please ensure the required libraries are installed (transformers, torch, peft, bitsandbytes, trl)")
    print("and that the model repository ID is correct and accessible.")
```
Note: the `pipeline` might need adjustments to correctly load the PEFT adapter and base model, especially with quantization. For more reliable loading of PEFT models with quantization, load the model with `AutoPeftModelForCausalLM.from_pretrained` and the tokenizer separately, then call the model's `generate` method directly instead of using the `pipeline`, as sketched below.
This model was trained with SFT.
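The rough shape of such a run with TRL's `SFTTrainer` is sketched below. The dataset column names (`prompt`, `response`) and all hyperparameters are assumptions for illustration, not the exact configuration used for this checkpoint:

```python
# Minimal SFT sketch with TRL; illustrative only.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("kaifkhaan/roast", split="train")

def format_chat_prompt(example):
    # Assumed dataset fields; build the same <|user|>/<|assistant|> format
    # expected at inference time.
    return {"text": f"<|user|>\n{example['prompt']}\n<|assistant|>\n{example['response']}"}

dataset = dataset.map(format_chat_prompt)

trainer = SFTTrainer(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    train_dataset=dataset,
    args=SFTConfig(output_dir="tinyllama-roast-sft", num_train_epochs=1),
    peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"),
)
trainer.train()
```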
## Framework versions
- TRL: 0.19.0
- Transformers: 4.52.4
- Pytorch: 2.6.0+cu124
- Datasets: 3.6.0
- Tokenizers: 0.21.2
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```