---
language:
- lg
- en
library_name: unsloth
pipeline_tag: text-generation
license: gemma
base_model: unsloth/gemma-2-2b-it
tags:
- luganda
- gemma
- pretrained
- wikipedia
- unsloth
datasets:
- wikimedia/wikipedia
---
# Gemma-2-2b-it Pretrained for Luganda

## Model Description
This model is a continued pretraining of Gemma-2-2b-it on Luganda text. It was trained on Luganda Wikipedia articles to adapt the base model for Luganda language understanding and generation.

## Model Details
- **Base Model**: unsloth/gemma-2-2b-it
- **Pretraining Data**: 
  - Luganda Wikipedia articles (wikimedia/wikipedia 20231101.lg)
- **Training Method**: LoRA with unsloth optimization
- **Context Length**: 2048 tokens
- **Training Hardware**: Tesla T4 GPU

## Training Process
The model was trained using the following configuration:

### LoRA Configuration
- LoRA rank (r): 128
- Target modules: 
  - q_proj, k_proj, v_proj, o_proj
  - gate_proj, up_proj, down_proj
  - embed_tokens, lm_head
- LoRA alpha: 32
- LoRA dropout: 0
- Used RS-LoRA (Rank Stabilized LoRA)
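
As a reference, here is a minimal sketch of how this adapter configuration might be set up with unsloth's `get_peft_model` API; the values mirror the list above, while the base-model loading call is an assumption:

```python
from unsloth import FastLanguageModel

# Load the base model (4-bit loading assumed, as on a Tesla T4)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-2b-it",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters with the configuration listed above
model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    lora_alpha = 32,
    lora_dropout = 0,
    use_rslora = True,  # Rank Stabilized LoRA
)
```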

### Training Parameters
- Batch size: 2 with gradient accumulation steps of 8
- Learning rates:
  - General: 5e-5
  - Embeddings: 1e-6 (reduced for stability)
- Training epochs: 10
- Warmup steps: 10
- Warmup ratio: 0.1
- Weight decay: 0.01
- Optimizer: AdamW 8-bit
- LR scheduler: Linear
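
A sketch of a matching trainer setup, assuming unsloth's `UnslothTrainer` / `UnslothTrainingArguments` (which expose a separate `embedding_learning_rate` for the embedding and output layers). `train_dataset`, `output_dir`, and the fp16 flag are assumptions, and exact argument names can vary with the installed unsloth/trl versions:

```python
from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,   # formatted Wikipedia dataset (see Data Processing)
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        learning_rate = 5e-5,
        embedding_learning_rate = 1e-6,  # reduced LR for embed_tokens / lm_head
        num_train_epochs = 10,
        warmup_steps = 10,
        warmup_ratio = 0.1,
        weight_decay = 0.01,
        optim = "adamw_8bit",
        lr_scheduler_type = "linear",
        output_dir = "outputs",          # placeholder
        fp16 = True,                     # Tesla T4 does not support bf16
    ),
)
trainer.train()
```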

### Data Processing
The training data was formatted with the following Luganda prompt template, where `{title}` and `{text}` are filled in from each Wikipedia article:

```text
Ekyawandiikibwa kya Wikipedia
### Omutwe: {title}

### Akawayiro:
{text}
```
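
For illustration, each article could be mapped into this template roughly as follows; the `title` and `text` fields match the `wikimedia/wikipedia` schema, while the function name, the appended EOS token, and the `tokenizer` variable (assumed loaded as in the Usage section) are assumptions:

```python
from datasets import load_dataset

# Luganda Wikipedia dump used for continued pretraining
dataset = load_dataset("wikimedia/wikipedia", "20231101.lg", split = "train")

template = "Ekyawandiikibwa kya Wikipedia\n### Omutwe: {title}\n\n### Akawayiro:\n{text}"

def format_article(example):
    # Fill the template and append the tokenizer's EOS token (assumed)
    return {"text": template.format(title = example["title"], text = example["text"]) + tokenizer.eos_token}

train_dataset = dataset.map(format_article)
```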

## Checkpoints
This repository contains multiple checkpoints from the pretraining process:
- checkpoint-500
- checkpoint-1000
- checkpoint-1500
- checkpoint-2000
- checkpoint-2500
- checkpoint-2530 (final)
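
If you want to evaluate an intermediate checkpoint rather than the final weights, one possible (unverified for this repo) approach is to point `from_pretrained` at the checkpoint subfolder with plain transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load an intermediate checkpoint from its subfolder in this repository
# (assumes the checkpoint folders hold full model weights; if they only
# contain LoRA adapters, load them with peft on top of the base model instead)
model = AutoModelForCausalLM.from_pretrained(
    "Bronsn/gemma-2-2b-it-pretrained",
    subfolder = "checkpoint-1000",
)
tokenizer = AutoTokenizer.from_pretrained("Bronsn/gemma-2-2b-it-pretrained")
```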

## Usage

```python
from unsloth import FastLanguageModel
import torch

# Load the model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Bronsn/gemma-2-2b-it-pretrained",
    max_seq_length = 2048,
    dtype = None,  # Auto-detect
    load_in_4bit = True,
)

# Switch to unsloth's optimized inference mode
FastLanguageModel.for_inference(model)

# Example usage: continue a Luganda Wikipedia-style prompt
text = "Ekyawandiikibwa kya Wikipedia\n### Omutwe: Uganda\n\n### Akawayiro:\n"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## Limitations
- The model is adapted primarily for Luganda; performance on other languages may degrade relative to the base model
- Performance may vary on dialectal variation and code-mixed (e.g. Luganda/English) text
- The model inherits the limitations of the base Gemma-2-2b-it model

## Citation
If you use this model, please cite:
```
@misc{luganda-gemma-pretrained,
  author = {Bronsn},
  title = {Gemma-2-2b-it Pretrained for Luganda},
  year = {2025},
  publisher = {HuggingFace}
}
```

## License
This model inherits the licensing terms from the base Gemma-2-2b-it model. For more details, please refer to [Gemma's license](https://ai.google.dev/gemma/terms).