Qwen-GRPO-geological-training

GRPO-trained Qwen model specialized for geological questions and analysis

Model Details

  • Base Model: Qwen/Qwen2.5-0.5B-Instruct
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Domain: Geology and Earth Sciences
  • Model Type: Causal Language Model
  • Architecture: Transformer-based
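
To make the training method more concrete: GRPO samples several completions for the same prompt, scores each one, and turns each reward into an advantage by comparing it with the rest of its group. The sketch below shows only that normalization step in plain Python. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are illustrative assumptions, not details taken from this model's training code.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four sampled answers to one geology question
advantages = group_relative_advantages([2.0, 0.5, 0.5, 1.0])
```

Completions that score above the group mean get a positive advantage and are reinforced; those below the mean are discouraged. When all rewards in a group are equal, every advantage is zero and the group contributes no learning signal.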

Training Details

This model was trained with GRPO (Group Relative Policy Optimization) on geological datasets. The training setup included:

  • Reward Functions:
    • Geological accuracy reward
    • Format compliance reward
    • Reasoning steps reward
  • System Prompt: Specialized geological expert system prompt
  • Response Format: A step-by-step thinking block followed by a solution block, each delimited by dedicated tags
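
The card does not publish the reward functions themselves, so the following is only a rough sketch of what a format compliance reward and a reasoning steps reward could look like. The tag names come from the response format documented in this card; the function names, the scoring scale, and the "count non-empty lines" heuristic with its `max_steps` cap are hypothetical. The geological accuracy reward is left out because it would need reference answers that are not described here.

```python
import re

# Tag layout taken from the response format described in this card.
FORMAT_RE = re.compile(
    r"^\s*<\|begin_of_thought\|>(?P<thought>.+?)<\|end_of_thought\|>\s*"
    r"<\|begin_of_solution\|>(?P<solution>.+?)<\|end_of_solution\|>\s*$",
    re.DOTALL,
)

def format_reward(completion: str) -> float:
    """1.0 if the completion is a thought block followed by a solution block, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0

def reasoning_steps_reward(completion: str, max_steps: int = 4) -> float:
    """Hypothetical heuristic: share of max_steps covered by non-empty thought lines."""
    m = FORMAT_RE.match(completion)
    if not m:
        return 0.0
    steps = [ln for ln in m.group("thought").splitlines() if ln.strip()]
    return min(len(steps), max_steps) / max_steps

example = (
    "<|begin_of_thought|>\nMagma cools.\nCrystals form.\n<|end_of_thought|>\n\n"
    "<|begin_of_solution|>\nIgneous rock.\n<|end_of_solution|>"
)
print(format_reward(example), reasoning_steps_reward(example))  # 1.0 0.5
```

Keeping each reward as a small, independent function makes it easy to inspect how much each signal contributes before the scores are combined.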

Intended Use

This model is designed for:

  • Answering geological questions
  • Providing educational content about earth sciences
  • Assisting with mineral identification
  • Explaining geological processes
  • Supporting rock and mineral analysis

Usage Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model
model_name = "joe-xhedi/Qwen-GRPO-geological-training"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Geological system prompt
system_prompt = '''You are a geological expert assistant. When answering geological questions, follow this format: 
First, analyze the problem step by step in your thinking process within <|begin_of_thought|> and <|end_of_thought|> tags. 
Then provide your solution within <|begin_of_solution|> and <|end_of_solution|> tags. 
Your thinking process should include geological principles, data analysis, and reasoning. 
Your solution should be clear, accurate, and based on geological expertise.'''

# Example usage
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What type of rock is formed by cooling magma?"}
]

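# Generate response (inputs are moved to the same device as the model)
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)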

Response Format

The model is trained to respond in a structured format:

<|begin_of_thought|>
[Step-by-step geological reasoning and analysis]
<|end_of_thought|>

<|begin_of_solution|>
[Clear, accurate geological solution or explanation]
<|end_of_solution|>
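
Downstream code usually wants the final answer without the reasoning. A minimal sketch of splitting a response into its two parts is shown below; the helper name `split_response` is hypothetical, and it assumes the tags appear in the decoded output as literal text, which this card does not confirm for every decoding setting.

```python
import re

def split_response(text: str) -> dict:
    """Return the thought and solution bodies, or None for any block that is missing."""
    def grab(name: str):
        pattern = rf"<\|begin_of_{name}\|>(.*?)<\|end_of_{name}\|>"
        m = re.search(pattern, text, re.DOTALL)
        return m.group(1).strip() if m else None
    return {"thought": grab("thought"), "solution": grab("solution")}

sample = (
    "<|begin_of_thought|>\nMagma cools and crystallizes.\n<|end_of_thought|>\n\n"
    "<|begin_of_solution|>\nIgneous rock.\n<|end_of_solution|>"
)
parts = split_response(sample)
print(parts["solution"])  # Igneous rock.
```

Returning `None` for a missing block lets callers detect truncated or malformed generations, for example when `max_new_tokens` cuts the output off before the solution tag closes.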

Limitations

  • Specialized for geological topics
  • May not perform well on general conversational tasks
  • Responses are structured and may seem formal
  • Knowledge is limited to what the training data covered; findings published after the training cutoff are not reflected

Training Data

The model was trained on geological datasets including:

  • Mineral identification questions
  • Rock formation processes
  • Geological principles and concepts
  • Earth science educational content

Ethical Considerations

  • This model is designed for educational and research purposes
  • Users should verify geological information for professional applications
  • The model may reproduce biases present in its training data

Citation

If you use this model in your research, please cite:

@misc{qwen-geological-expert,
  author = {joe-xhedi},
  title = {GRPO-trained Qwen Model for Geological Analysis},
  year = {2025},
  url = {https://huggingface.co/joe-xhedi/Qwen-GRPO-geological-training}
}

Model Card Contact

For questions about this model, please contact the model author through Hugging Face.
