TinyLlama 1.1B Chat - Roast Fine-tuned

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0, specifically adapted to generate sarcastic and "roasting" responses based on a community-curated dataset.

It was trained using the TRL (Transformer Reinforcement Learning) library, which simplifies Supervised Fine-Tuning (SFT) workflows.

Model Description

This model is an instruction-following language model based on the TinyLlama architecture, further fine-tuned on a dataset of user prompts and corresponding roasting responses. The goal of fine-tuning was to specialize the model's conversational style towards delivering humorous (often insulting) roasts.

Intended Use

This model is intended for research and experimentation purposes only. Given the nature of the training data, which contains potentially offensive and negative language, this model should be used with extreme caution.

  • Acceptable uses: Research into model behavior on specialized datasets, exploration of fine-tuning techniques, novelty/entertainment applications with explicit disclaimers and robust safety filters.
  • Unacceptable uses: Any application that could cause harm, including generating offensive content without user awareness, deploying in sensitive environments, or using where outputs are not reviewed and filtered.

Users should be aware that this model will likely generate sarcastic, rude, or insulting content.
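For novelty or entertainment use, the safety-filter requirement above can be approximated by screening generations before they are shown to users. The following is a minimal, hypothetical sketch; the classifier unitary/toxic-bert and the 0.8 threshold are illustrative choices, not part of this model card:

from transformers import pipeline

# Illustrative output filter: score each generation with an off-the-shelf
# toxicity classifier and withhold anything scoring above a chosen threshold.
toxicity_checker = pipeline("text-classification", model="unitary/toxic-bert")

def is_safe(text, threshold=0.8):
    # top_k=None returns a score for every label of the classifier
    scores = toxicity_checker(text, truncation=True, top_k=None)
    return all(s["score"] < threshold for s in scores)

roast = "Your code has more bugs than features."
print(roast if is_safe(roast) else "[response withheld by safety filter]")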

Limitations and Bias

The model inherits potential biases from its base model and, significantly, from the fine-tuning dataset (kaifkhaan/roast).

  • Content: It is specifically trained to generate content that is critical, negative, and potentially offensive. In most cases it will not provide helpful or polite responses.
  • Bias: The biases present in the kaifkhaan/roast dataset will be reflected in the model's outputs. This could include biases related to various demographics, stereotypes, or sensitive topics present in the original data. Rigorous testing and filtering are required for any application.
  • Factual Accuracy: Fine-tuning on this dataset does not improve the model's factual knowledge or general reasoning abilities. Its responses are based on the patterns learned for roasting.

Quick start

To use this model, you can load it using the transformers library. Ensure you have transformers, torch, peft, bitsandbytes (if using 4-bit loading), and trl installed.

pip install transformers torch peft bitsandbytes trl

Here's how to use it with a pipeline:

from transformers import pipeline
import torch

# Specify the repository ID
model_repo_id = "AnnasShaikh/TinyLlama-1.1B-Chat-Roast"  # Repository ID on the Hugging Face Hub

# Load the pipeline (adjust device if needed, e.g., device="cpu")
# You might need to load the model with quantization config if it was saved that way
# or use AutoModelForCausalLM.from_pretrained directly with PEFT
try:
    generator = pipeline(
        "text-generation",
        model=model_repo_id,
        # You might need to specify quantization config here if the model was saved with it
        # device="cuda" if torch.cuda.is_available() else "cpu",
        torch_dtype=torch.float16 # Recommended for inference if supported by hardware
    )
    print(f"Pipeline loaded for model {model_repo_id}")

    # Example chat interaction
    prompt_roast = "You can't roast me!"  # A roasting-specific prompt; a generic prompt such as "Hello, tell me something about yourself." also works

    # Format the prompt in the chat format used during training
    # (the same <|user|> / <|assistant|> template applied during fine-tuning)
    chat_prompt_formatted = f"<|user|>\n{prompt_roast}\n<|assistant|>\n"

    print(f"\nInput formatted prompt:\n{chat_prompt_formatted}")

    # Generate output
    # Adjust generation parameters for desired creativity/determinism
    output = generator(
        chat_prompt_formatted,
        max_new_tokens=100,
        num_beams=1, # Use 1 for greedy or sampling
        do_sample=True, # Set to True for sampling
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        # stop_sequence=['<|user|>'] # Optional: Stop generation before the next user turn
    )

    generated_text = output[0]["generated_text"]

    print("\n--- Generated Text ---")
    # The output includes the input prompt; you may want to trim it
    print(generated_text)
    print("----------------------")

except Exception as e:
    print(f"Error loading model or generating text: {e}")
    print("Please ensure you have the necessary libraries installed (transformers, torch, peft, bitsandbytes, trl)")
    print("And that the model repository ID is correct and the model is accessible.")

Note: The pipeline might need adjustments to correctly load the PEFT adapter on top of the base model, especially with quantization. For more reliable loading, load the adapter with AutoPeftModelForCausalLM.from_pretrained and the tokenizer separately, then call the model's generate method directly instead of using the pipeline, as in the sketch below.
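A minimal sketch of that direct-loading approach, assuming the repository contains a PEFT adapter on top of the TinyLlama base model (generation parameters mirror the pipeline example and are illustrative):

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

model_repo_id = "AnnasShaikh/TinyLlama-1.1B-Chat-Roast"

# Load the adapter together with its base model; dtype/device_map are optional conveniences
model = AutoPeftModelForCausalLM.from_pretrained(
    model_repo_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_repo_id)

# Same chat template as in the pipeline example above
chat_prompt_formatted = "<|user|>\nYou can't roast me!\n<|assistant|>\n"
inputs = tokenizer(chat_prompt_formatted, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))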

This model was trained with SFT.
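The exact training script is not part of this card; the following is a rough, hypothetical sketch of an SFT run with TRL and a LoRA adapter, assuming the kaifkhaan/roast dataset exposes prompt and response columns (the column names, LoRA settings, and hyperparameters are illustrative, not the values actually used):

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Map each example into the <|user|> / <|assistant|> template from the Quick start.
# "prompt" and "response" are assumed column names and may differ in the real dataset.
def to_text(example):
    return {"text": f"<|user|>\n{example['prompt']}\n<|assistant|>\n{example['response']}"}

dataset = load_dataset("kaifkhaan/roast", split="train").map(to_text)

trainer = SFTTrainer(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    args=SFTConfig(output_dir="tinyllama-roast-sft"),
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()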

Framework versions

  • TRL: 0.19.0
  • Transformers: 4.52.4
  • Pytorch: 2.6.0+cu124
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2
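To reproduce this environment, the listed versions can be pinned at install time (the PyTorch build with matching CUDA support depends on your platform):

pip install trl==0.19.0 transformers==4.52.4 datasets==3.6.0 tokenizers==0.21.2 torch==2.6.0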

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}