|
--- |
|
datasets: |
|
- ofir408/MedConceptsQA |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
base_model: |
|
- TinyLlama/TinyLlama-1.1B-Chat-v1.0 |
|
pipeline_tag: text-generation
|
library_name: transformers |
|
tags: |
|
- tinyllama |
|
- lora |
|
- instruction-tuned |
|
- peft |
|
|
- merged |
|
- medical |
|
- healthcare |
|
--- |
|
|
|
# 🩺 TinyLlama Medical Assistant (Merged LoRA) |
|
|
|
**Author:** Nabil Faieaz |
|
**Base model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) |
|
**Fine-tuning method:** LoRA (Low-Rank Adaptation) using PEFT → merged into base weights |
|
**Intended use:** Concise, factual, general medical information |
|
|
|
--- |
|
|
|
## 📌 Overview |
|
|
|
This model is a **fine-tuned version of TinyLlama 1.1B-Chat** adapted for **medical question answering**. |
|
It was trained to give **brief and accurate** answers to medical questions, following a consistent Q/A style.
|
|
|
Key features: |
|
- ✅ LoRA fine-tuning for efficient adaptation on limited compute (T4 GPU) |
|
- ✅ Merged LoRA + base into a **single standalone model** (no separate adapter needed) |
|
- ✅ Optimized for short, factual answers — avoids overly verbose outputs |
|
- ✅ Context-aware: warns users to seek professional medical help for urgent/personal issues |
|
|
|
--- |
|
|
|
## ⚠️ Disclaimer |
|
|
|
> **This model is for educational and informational purposes only.** |
|
> It is **not** a substitute for professional medical advice, diagnosis, or treatment. |
|
> Always consult a qualified healthcare provider for medical concerns. |
|
|
|
--- |
|
|
|
## 🚀 Quick Start |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
model_id = "nabilfaieaz/tinyllama-med-full" |
|
|
|
# Load tokenizer and model |
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
if tokenizer.pad_token is None: |
|
tokenizer.pad_token = tokenizer.eos_token |
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
model_id, |
|
torch_dtype="auto", |
|
device_map="auto" |
|
) |
|
|
|
# Example prompt |
|
system_prompt = ( |
|
"You are a helpful, concise medical assistant. Provide general information only, " |
|
"not a diagnosis. If urgent or personal issues are mentioned, advise seeing a clinician." |
|
) |
|
|
|
question = "What is hypertension?" |
|
prompt = f"{system_prompt}\n\nQuestion: {question}\nAnswer:" |
|
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
outputs = model.generate( |
|
**inputs, |
|
max_new_tokens=128, |
|
    do_sample=False,  # greedy decoding; temperature/top_p are ignored when sampling is off
|
eos_token_id=tokenizer.eos_token_id, |
|
pad_token_id=tokenizer.pad_token_id |
|
) |
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
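
If you prefer the high-level API, the same model can be run through the `text-generation` pipeline. This is a minimal sketch; the prompt and greedy settings mirror the example above:

```python
from transformers import pipeline

# The merged model loads like any standalone causal LM
generator = pipeline(
    "text-generation",
    model="nabilfaieaz/tinyllama-med-full",
    device_map="auto",
)

result = generator(
    "Question: What is hypertension?\nAnswer:",
    max_new_tokens=128,
    do_sample=False,  # greedy decoding, matching the example above
)
print(result[0]["generated_text"])
```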
|
|
|
---

## 🧠 Training Details

- **Base model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)

- **Fine-tuning method:** LoRA (via `peft`)

- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`

- **LoRA config:** r = 16, alpha = 16, dropout = 0.0

- **Max sequence length:** 512 tokens

- **Batch size:** 2 per device, with gradient accumulation for a larger effective batch

- **Learning rate:** 2e-4

- **Precision:** fp16

- **Evaluation:** every 200 steps

- **Checkpoints:** saved every 500 steps; final model merged from checkpoint-17000
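
For reference, here is a minimal sketch of the adapter setup and merge described above, using the `peft` API. The training loop itself is omitted, and the output path is illustrative:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# LoRA configuration matching the values listed above
config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the low-rank adapter weights train

# ... fine-tune with your preferred Trainer setup ...

# Fold the adapters back into the base weights so the result loads as a
# plain transformers model (this is what the repo ships).
merged = model.merge_and_unload()
merged.save_pretrained("tinyllama-med-full")
```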
|
|
|
---

## 📊 Intended Use

**Intended:**

* Educational explanations of medical terms and concepts

* Study aid for medical students and healthcare professionals

* Healthcare-related chatbot demos

**Not intended:**

* Real-time clinical decision making

* Emergency medical guidance

* Handling sensitive personal medical data (PHI)
|
---

## ⚙️ Technical Notes

* The LoRA weights are already merged into the base model, so no separate adapter loading is required.

* Works with Hugging Face `transformers` ≥ 4.38.

* Can be quantized to 4-bit with `bitsandbytes` for local inference; see the sketch below.
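
A minimal sketch of 4-bit loading via `BitsAndBytesConfig` (requires the `bitsandbytes` package and a CUDA GPU; the NF4 settings below are common defaults, not values prescribed by this repo):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "nabilfaieaz/tinyllama-med-full"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```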
|
|
|
|