# Mistral-7B-Instruct-v0.2 Fine-tuned with DPO on LIMA-derived Preferences
This repository contains a version of the `unsloth/mistral-7b-instruct-v0.2` model fine-tuned using Direct Preference Optimization (DPO). The fine-tuning used a preference dataset built from LIMA instructions paired with responses generated by the base Mistral-7B-Instruct-v0.2 model, which were then ranked with PairRM from the LLM-Blender toolkit.
The goal of this fine-tuning is to align the model further with human preferences for helpful and harmless responses.
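For context, preference pairs of this kind can be constructed by scoring several candidate responses per instruction with PairRM and keeping the best- and worst-ranked candidates as the chosen and rejected completions. The sketch below uses the public `llm-blender` ranking API; the candidate texts, candidate count, and best/worst selection rule are illustrative assumptions rather than the exact procedure used to build this dataset.

```python
# Illustrative sketch: rank candidate responses with PairRM via LLM-Blender.
# The candidates and the best/worst selection rule are assumptions for
# demonstration, not the exact pipeline behind this dataset.
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # PairRM pairwise ranking model

instructions = ["What are the best ways to learn a new programming language?"]
# Several responses per instruction, e.g. sampled from Mistral-7B-Instruct-v0.2
candidates = [[
    "Build small projects and read the official documentation as you go.",
    "Memorize the full syntax reference before writing any code.",
]]

ranks = blender.rank(instructions, candidates)  # lower rank = better candidate
best = min(range(len(candidates[0])), key=lambda i: ranks[0][i])
worst = max(range(len(candidates[0])), key=lambda i: ranks[0][i])

preference_pair = {
    "prompt": instructions[0],
    "chosen": candidates[0][best],
    "rejected": candidates[0][worst],
}
print(preference_pair)
```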
## Model Details
- Base Model: `unsloth/mistral-7b-instruct-v0.2` (a version of `mistralai/Mistral-7B-Instruct-v0.2` optimized with Unsloth)
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Preference Data Source:
  - Instructions: sampled from the LIMA dataset.
  - Responses: generated by `mistralai/Mistral-7B-Instruct-v0.2`.
  - Ranking: performed by `llm-blender/PairRM`.
  - Preference dataset used for DPO: `ssui-liu/lima-mistral7b-pairrm-preference` (see the loading snippet after this list).
- Training Framework: Unsloth for efficient LoRA-based DPO training.
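To inspect the preference data directly, it can be pulled from the Hub with `datasets`. The `prompt`/`chosen`/`rejected` column names mentioned below follow the common DPO convention and are an assumption about this dataset's schema; check the dataset card if they differ.

```python
# Load the DPO preference dataset from the Hugging Face Hub.
# Column names (prompt/chosen/rejected) and the split name are assumed here.
from datasets import load_dataset

prefs = load_dataset("ssui-liu/lima-mistral7b-pairrm-preference", split="train")
print(prefs)     # row count and column names
print(prefs[0])  # a single preference example
```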
## Training Configuration
The DPO training was performed with the following key hyperparameters (refer to `scripts/dpo_training.py` for full details):
- LoRA `r`: 64
- LoRA `alpha`: 64
- LoRA `dropout`: 0.0
- Target Modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Batch Size (per device): 2
- Gradient Accumulation Steps: 4
- Learning Rate: 5e-6
- Number of Epochs: 3
- DPO Beta (β): 0.1
- DPO Loss Type: sigmoid
- Max Sequence Length: 2048
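Putting these values together, the training setup corresponds roughly to the Unsloth + TRL sketch below. This is a minimal reconstruction rather than a copy of `scripts/dpo_training.py`: it assumes a recent `trl` release with `DPOConfig`, a `prompt`/`chosen`/`rejected` dataset schema, and an illustrative `output_dir`.

```python
# Minimal sketch of the DPO setup implied by the hyperparameters above.
# Assumes a recent trl release (DPOConfig) and prompt/chosen/rejected columns.
from unsloth import FastLanguageModel, PatchDPOTrainer
from datasets import load_dataset
from trl import DPOTrainer, DPOConfig

PatchDPOTrainer()  # let Unsloth patch TRL's DPOTrainer for its fast kernels

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-instruct-v0.2",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters with the configuration listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

train_dataset = load_dataset("ssui-liu/lima-mistral7b-pairrm-preference", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT model, TRL derives the reference from the disabled adapter
    args=DPOConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=5e-6,
        num_train_epochs=3,
        beta=0.1,
        loss_type="sigmoid",
        max_length=2048,
        output_dir="outputs",  # illustrative
    ),
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # processing_class=tokenizer on newer trl releases
)
trainer.train()
```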
## How to Use
This model is a PEFT (LoRA) adapter. To use it, load the base model (`unsloth/mistral-7b-instruct-v0.2`) and then apply the LoRA weights on top.
```python
from unsloth import FastLanguageModel
from peft import PeftModel

model_name = "unsloth/mistral-7b-instruct-v0.2"
peft_model_id = "ssui-liu/mistral-7b-lima-dpo"  # Or your specific model ID
max_seq_length = 2048  # Or your preferred max sequence length

# Load the base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=None,         # Autodetect
    load_in_4bit=True,  # Or False if not using 4-bit
)

# Load the trained LoRA adapter on top of the base model.
# The LoRA configuration (r, alpha, target modules) is read from the adapter config.
model = PeftModel.from_pretrained(model, peft_model_id)

FastLanguageModel.for_inference(model)  # Prepare model for inference

# Example usage
prompt = "What are the best ways to learn a new programming language?"
messages = [
    {"role": "user", "content": prompt}
]

input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

outputs = model.generate(input_ids=input_ids, max_new_tokens=256, use_cache=True)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

# Extract just the assistant's response
# (this may vary slightly depending on the exact chat template and base model version)
response_parts = response.split("[/INST]")
if len(response_parts) > 1:
    assistant_response = response_parts[1].strip()
else:
    assistant_response = response  # Fallback if pattern not found

print(assistant_response)
```
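If you would rather deploy a standalone checkpoint than a base-plus-adapter pair, the LoRA weights can be merged into the base model with PEFT's `merge_and_unload()`. This is an optional convenience, not something shipped in this repository; merging is best done from a non-quantized base (`load_in_4bit=False`), and the save path below is illustrative.

```python
# Optional: merge the LoRA adapter into the base weights for standalone use.
# Best done with load_in_4bit=False; the output path is illustrative.
merged = model.merge_and_unload()
merged.save_pretrained("mistral-7b-lima-dpo-merged")
tokenizer.save_pretrained("mistral-7b-lima-dpo-merged")
```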