---
license: apache-2.0
datasets:
- beomi/KoAlpaca-RealQA
language:
- ko
base_model:
- Qwen/Qwen2.5-Coder-1.5B-Instruct
pipeline_tag: text-generation
---
# Model Description
Qwen/Qwen2.5-Coder-1.5B-Instruct์„ ๊ธฐ๋ฐ˜์œผ๋กœ PEFT๋ฅผ ์ด์šฉํ•˜์—ฌ QLoRA (4-bit quantization + PEFT)ํ•ด๋ณธ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.
ํ•™์Šต ๋ฐ์ดํ„ฐ๋Š” beomi/KoAlpaca-RealQA๋ฅผ ์‚ฌ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค.
์ž‘์€ ๋ชจ๋ธ์„ ์ด์šฉํ•˜์—ฌ QLoRA๋ฅผ ํ•œ ๊ฒƒ์ด๋‹ค ๋ณด๋‹ˆ ์–‘์งˆ์˜ output์ด ๋‚˜์˜ค์ง€๋Š” ์•Š์ง€๋งŒ QLoRA๋ชจ๋ธ๊ณผ ์›๋ณธ๋ชจ๋ธ์˜ ๋‹ต๋ณ€์ด ์ฐจ์ด๋Š” ํ™•์‹คํžˆ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.
# Quantization Configuration
```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization; computation in fp16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```
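The card does not show how the base model was loaded for training. Below is a minimal sketch of how a config like the one above is commonly passed to the base model and prepared for k-bit training with PEFT; `prepare_model_for_kbit_training` and the variable names here are conventions assumed for illustration, not taken from the author's script.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

base_model_id = "Qwen/Qwen2.5-Coder-1.5B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the base model in 4-bit and prepare it for QLoRA (k-bit) training
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # casts norms to fp32, enables input grads, etc.
```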
# LoRA Configuration
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Note: "c_attn" is a GPT-2-style module name not present in Qwen2,
    # so in practice only q_proj and v_proj receive LoRA adapters.
    target_modules=["c_attn", "q_proj", "v_proj"]
)
```
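The adapter is attached to the quantized base model with PEFT's `get_peft_model`; the brief sketch below assumes `model` is the 4-bit base model prepared in the previous step (this wiring is a common convention, not shown in the card).
```python
from peft import get_peft_model

# Attach the LoRA adapter to the prepared 4-bit base model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA weight matrices are trainable
```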
# Training Arguments
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./qwen2.5-coder-koalpaca-qlora",  # placeholder; the author's actual output path is not given
    num_train_epochs=8,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    # Evaluate, save, and log every 300 steps; keep the checkpoint with the lowest eval_loss
    evaluation_strategy="steps",
    eval_steps=300,
    save_strategy="steps",
    save_steps=300,
    logging_steps=300,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False
)
```
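The trainer setup itself is not included in the card. The sketch below shows one common way to combine the dataset, the LoRA-wrapped model, and these arguments with the Hugging Face `Trainer`; the KoAlpaca-RealQA column names (`question`/`answer`), the ChatML formatting, and the train/validation split are assumptions for illustration only.
```python
from datasets import load_dataset
from transformers import Trainer, DataCollatorForLanguageModeling

# Load the training data (beomi/KoAlpaca-RealQA); adjust field names to the actual schema
dataset = load_dataset("beomi/KoAlpaca-RealQA")

def to_chatml(example):
    # Hypothetical field names "question"/"answer", formatted into the Qwen ChatML template
    text = (
        "<|im_start|>user\n" + example["question"] + "<|im_end|>\n"
        "<|im_start|>assistant\n" + example["answer"] + "<|im_end|>\n"
    )
    return tokenizer(text, truncation=True, max_length=1024)

split = dataset["train"].train_test_split(test_size=0.1)
tokenized = split.map(to_chatml, remove_columns=split["train"].column_names)

trainer = Trainer(
    model=model,                      # LoRA-wrapped 4-bit model from the previous steps
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```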
# Training Progress
| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 300 | 1.595000 | 1.611501 |
| 600 | 1.593300 | 1.596210 |
| 900 | 1.577600 | 1.586121 |
| 1200 | 1.564600 | 1.577804 |
| ... | ... | ... |
| 7200 | 1.499700 | 1.525933 |
| 7500 | 1.493400 | 1.525612 |
| 7800 | 1.491000 | 1.525330 |
| 8100 | 1.499900 | 1.525138 |
# Inference Code
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantization config (must match the QLoRA settings used during fine-tuning)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Load tokenizer and model (local or Hub path)
model_path = "onebeans/Qwen2.5-Coder-KoInstruct-QLoRA"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
)
model.eval()

# Build a prompt in ChatML format (Qwen-style)
def build_chatml_prompt(question: str) -> str:
    system_msg = "<|im_start|>system\n๋‹น์‹ ์€ ์œ ์šฉํ•œ ํ•œ๊ตญ์–ด ๋„์šฐ๋ฏธ์ž…๋‹ˆ๋‹ค.<|im_end|>\n"
    user_msg = f"<|im_start|>user\n{question}<|im_end|>\n"
    return system_msg + user_msg + "<|im_start|>assistant\n"

# Run inference
def generate_response(question: str, max_new_tokens: int = 128) -> str:
    prompt = build_chatml_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding; top_p/temperature have no effect when do_sample=False
            top_p=0.9,
            temperature=0.7,
            eos_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens (exclude the prompt)
    generated = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

# Example
# The base model (Qwen/Qwen2.5-Coder-1.5B-Instruct) answers: "ํ•œ๊ตญ์˜ ์ˆ˜๋„๋Š” ์„œ์šธ์ž…๋‹ˆ๋‹ค." ("The capital of Korea is Seoul.")
question = "ํ•œ๊ตญ์˜ ์ˆ˜๋„๋Š” ์–ด๋””์ธ๊ฐ€์š”?"  # "What is the capital of Korea?"
response = generate_response(question)
print("Model response:\n", response)
```
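If the repository hosts the LoRA adapter weights rather than a fully merged model, the adapter can also be attached to the 4-bit base model at load time with PEFT. The sketch below is written under that assumption; whether this repo contains a merged model or only an adapter is not stated in the card.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "Qwen/Qwen2.5-Coder-1.5B-Instruct"
adapter_id = "onebeans/Qwen2.5-Coder-KoInstruct-QLoRA"  # assumed to contain adapter_config.json + adapter weights

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the LoRA adapter on top of the 4-bit base model
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```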
# Execution Environment
- Windows 10
- NVIDIA GeForce RTX 4070 Ti
# Framework Versions
- Python: 3.10.14
- PyTorch: 1.12.1
- Transformers: 4.46.2
- Datasets: 3.2.0
- Tokenizers: 0.20.3
- PEFT: 0.8.2