---
datasets:
- BramVanroy/ultra_feedback_dutch_cleaned
language:
- nl
base_model:
- robinsmits/Schaapje-2B-Chat-SFT-V1.0
pipeline_tag: text-generation
library_name: transformers
tags:
- granite
- granite 3.0
- schaapje
- trl
- sft
- dpo
inference: false
license: apache-2.0
---
# Schaapje-2B-Chat-V1.0
## Model description
This is the DPO-aligned version of the SFT-trained model [Schaapje-2B-Chat-SFT-V1.0](https://huggingface.co/robinsmits/Schaapje-2B-Chat-SFT-V1.0).
The model handles general Dutch chat and instruction following quite well.
## Model usage
A basic example of how to use this DPO-aligned model for chat or instruction following:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = 'cuda'
model_name = 'robinsmits/Schaapje-2B-Chat-V1.0'

# Load the model in bfloat16 and let Accelerate place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map = "auto",
                                             torch_dtype = torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the prompt with the model's chat template
messages = [{"role": "user", "content": "Hoi hoe gaat het ermee?"}]
chat = tokenizer.apply_chat_template(messages,
                                     tokenize = False,
                                     add_generation_prompt = True)
input_tokens = tokenizer(chat, return_tensors = "pt").to(device)

# Generate and decode the full sequence (prompt plus response)
output = model.generate(**input_tokens,
                        max_new_tokens = 512,
                        do_sample = True)
output = tokenizer.decode(output[0], skip_special_tokens = False)
print(output)
```
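The decoded output above includes the prompt and any special tokens. If you only want the assistant's reply, you can decode just the newly generated tokens. A minimal sketch, reusing `model`, `tokenizer` and `input_tokens` from the example above:

```python
# Generate again and keep the raw token ids
generated = model.generate(**input_tokens,
                           max_new_tokens = 512,
                           do_sample = True)

# Decode only the tokens that come after the prompt
prompt_length = input_tokens['input_ids'].shape[1]
reply = tokenizer.decode(generated[0][prompt_length:], skip_special_tokens = True)
print(reply)
```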
## Intended uses & limitations
As with all LLMs, this model can exhibit bias and hallucinations. Regardless of how you use this model, always perform the necessary testing and validation.
## Datasets and Licenses
The following dataset was used for DPO alignment:
- [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned): apache-2.0
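If you want to inspect the preference data yourself, it can be loaded with the `datasets` library. A minimal sketch; the configuration name `dpo_hq` is an assumption here, check the dataset card for the exact configurations and splits:

```python
from datasets import load_dataset

# Load the DPO preference data; the configuration name "dpo_hq" is an
# assumption, see the dataset card for the available configurations
dataset = load_dataset("BramVanroy/ultra_feedback_dutch_cleaned", "dpo_hq")
print(dataset)
print(dataset["train"][0])
```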
## Model Training
The notebook used to train this DPO-aligned model is available at the following link: [Schaapje-2B-Chat-DPO-V1.0](https://github.com/RobinSmits/Schaapje/blob/main/Schaapje-2B-Chat-DPO-V1.0.ipynb)
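For a rough idea of what a DPO alignment step with `trl` looks like, a minimal sketch is shown below. This is not the author's exact setup: the hyperparameters, the dataset configuration name and the column layout are assumptions, and the real values live in the linked notebook.

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import DPOConfig, DPOTrainer

# Start from the SFT model that this DPO model is based on
base = 'robinsmits/Schaapje-2B-Chat-SFT-V1.0'
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype = torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPOTrainer expects "prompt", "chosen" and "rejected" columns;
# the configuration name "dpo_hq" is an assumption, see the dataset card
dataset = load_dataset("BramVanroy/ultra_feedback_dutch_cleaned", "dpo_hq", split = "train")

# Illustrative hyperparameters only; the real values are in the notebook
training_args = DPOConfig(output_dir = "schaapje-dpo",
                          beta = 0.1,
                          per_device_train_batch_size = 2,
                          gradient_accumulation_steps = 8,
                          learning_rate = 5e-7,
                          num_train_epochs = 1,
                          bf16 = True)

# With no explicit ref_model, trl creates the reference model from `model`;
# older trl versions take tokenizer= instead of processing_class=
trainer = DPOTrainer(model = model,
                     args = training_args,
                     train_dataset = dataset,
                     processing_class = tokenizer)
trainer.train()
```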