---
datasets:
- BramVanroy/ultra_feedback_dutch_cleaned
language:
- nl
base_model:
- robinsmits/Schaapje-2B-Chat-SFT-V1.0
pipeline_tag: text-generation
library_name: transformers
tags:
- granite
- granite 3.0
- schaapje
- trl
- sft
- dpo
inference: false
license: apache-2.0
---
# Schaapje-2B-Chat-V1.0
## Model description
This is the DPO-aligned version of the SFT-trained model [Schaapje-2B-Chat-SFT-V1.0](https://huggingface.co/robinsmits/Schaapje-2B-Chat-SFT-V1.0).
The model handles general Dutch chat and instruction following quite well.
## Model usage
A basic example of how to use this DPO-aligned model for chat or instruction following:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = 'cuda'
model_name = 'robinsmits/Schaapje-2B-Chat-V1.0'

# Load the model in bfloat16 and let Accelerate place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map = "auto",
                                             torch_dtype = torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the prompt with the model's chat template
messages = [{"role": "user", "content": "Hoi hoe gaat het ermee?"}]
chat = tokenizer.apply_chat_template(messages,
                                     tokenize = False,
                                     add_generation_prompt = True)
input_tokens = tokenizer(chat, return_tensors = "pt").to(device)

# Generate and decode the full sequence (prompt plus response)
output = model.generate(**input_tokens,
                        max_new_tokens = 512,
                        do_sample = True)
output = tokenizer.decode(output[0], skip_special_tokens = False)
print(output)
```
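The decoded output above includes the prompt and any special tokens. If you only want the assistant's reply, you can decode just the newly generated tokens. A minimal sketch, reusing `model`, `tokenizer` and `input_tokens` from the example above:

```python
# Generate again and keep the raw token ids
generated = model.generate(**input_tokens,
                           max_new_tokens = 512,
                           do_sample = True)

# Decode only the tokens that come after the prompt
prompt_length = input_tokens['input_ids'].shape[1]
reply = tokenizer.decode(generated[0][prompt_length:], skip_special_tokens = True)
print(reply)
```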
## Intended uses & limitations
As with all LLMs, this model can exhibit bias and hallucinations. Regardless of how you use this model, always perform the necessary testing and validation.
## Datasets and Licenses
The following dataset was used for DPO alignment:
- [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned): apache-2.0
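If you want to inspect the preference data yourself, it can be loaded with the `datasets` library. A minimal sketch; the configuration name `dpo_hq` is an assumption here, check the dataset card for the exact configurations and splits:

```python
from datasets import load_dataset

# Load the DPO preference data; the configuration name "dpo_hq" is an
# assumption, see the dataset card for the available configurations
dataset = load_dataset("BramVanroy/ultra_feedback_dutch_cleaned", "dpo_hq")
print(dataset)
print(dataset["train"][0])
```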
## Model Training
The notebook used to train this DPO-aligned model is available at the following link: [Schaapje-2B-Chat-DPO-V1.0](https://github.com/RobinSmits/Schaapje/blob/main/Schaapje-2B-Chat-DPO-V1.0.ipynb)
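For a rough idea of what a DPO alignment step with `trl` looks like, a minimal sketch is shown below. This is not the author's exact setup: the hyperparameters, the dataset configuration name and the column layout are assumptions, and the real values live in the linked notebook.

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import DPOConfig, DPOTrainer

# Start from the SFT model that this DPO model is based on
base = 'robinsmits/Schaapje-2B-Chat-SFT-V1.0'
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype = torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPOTrainer expects "prompt", "chosen" and "rejected" columns;
# the configuration name "dpo_hq" is an assumption, see the dataset card
dataset = load_dataset("BramVanroy/ultra_feedback_dutch_cleaned", "dpo_hq", split = "train")

# Illustrative hyperparameters only; the real values are in the notebook
training_args = DPOConfig(output_dir = "schaapje-dpo",
                          beta = 0.1,
                          per_device_train_batch_size = 2,
                          gradient_accumulation_steps = 8,
                          learning_rate = 5e-7,
                          num_train_epochs = 1,
                          bf16 = True)

# With no explicit ref_model, trl creates the reference model from `model`;
# older trl versions take tokenizer= instead of processing_class=
trainer = DPOTrainer(model = model,
                     args = training_args,
                     train_dataset = dataset,
                     processing_class = tokenizer)
trainer.train()
```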