# Pashto BERT (BERT-Base)

## Model Overview
This is a monolingual **Pashto BERT (BERT-Base)** model trained on a large **Pashto corpus**. As an encoder-only model, it learns contextual representations of **Pashto** text and is intended to be fine-tuned for a range of downstream **Natural Language Processing (NLP) tasks**.

## Model Details
- **Architecture:** BERT-Base (12 layers, 768 hidden size, 12 attention heads, 110M parameters)
- **Language:** Pashto (ps)
- **Training Corpus:** A diverse set of Pashto text data, including news articles, books, and web content.
- **Special Tokens:** `[CLS]`, `[SEP]`, `[PAD]`, `[MASK]`, `[UNK]`
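
For reference, the listed architecture matches the standard BERT-Base configuration in `transformers`; a minimal sketch is shown below (the vocabulary size is an assumption here and should be taken from the trained tokenizer):

```python
from transformers import BertConfig

# BERT-Base hyperparameters from the list above.
config = BertConfig(
    vocab_size=30_000,            # assumption: use len(tokenizer) for the real value
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,       # BERT-Base default feed-forward size
    max_position_embeddings=512,
)
```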

## Intended Use
This model can be **fine-tuned** for various Pashto-specific NLP tasks (a fine-tuning sketch follows the list), such as:
- **Sequence Classification:** Sentiment analysis, topic classification, and document categorization.
- **Sequence Tagging:** Named entity recognition (NER) and part-of-speech (POS) tagging.
- **Text Understanding:** Extractive question answering and semantic textual similarity. (Generative tasks such as summarization or machine translation typically require pairing the encoder with a decoder.)
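
The sketch below illustrates one of these: fine-tuning for binary sequence classification with the `Trainer` API. The model ID, dataset, and label count are placeholders rather than part of this release:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "your-huggingface-username/pashto-bert-base"  # placeholder repository ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    # Pad/truncate to the 128-token length used during pre-training.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# `train_ds` and `eval_ds` are assumed to be datasets.Dataset objects
# with "text" and "label" columns:
# train_ds = train_ds.map(tokenize, batched=True)
# eval_ds = eval_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="pashto-bert-classifier",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=3,
)

# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```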

## How to Use
This model can be loaded using the `transformers` library from Hugging Face:

```python
from transformers import AutoModel, AutoTokenizer

model_name = "your-huggingface-username/pashto-bert-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "ستاسو نننۍ ورځ څنګه وه؟"  # "How was your day today?"
tokens = tokenizer(text, return_tensors="pt")
outputs = model(**tokens)  # outputs.last_hidden_state holds the contextual token embeddings
```
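
Since the model is pre-trained with masked language modeling, a quick qualitative check (assuming the published checkpoint includes the MLM head weights) is the `fill-mask` pipeline; the `[MASK]` placement below is only an example:

```python
from transformers import pipeline

model_name = "your-huggingface-username/pashto-bert-base"  # same placeholder as above
fill_mask = pipeline("fill-mask", model=model_name)

# Mask one word and print the model's top completions with their scores.
for prediction in fill_mask("ستاسو نننۍ [MASK] څنګه وه؟"):
    print(prediction["token_str"], round(prediction["score"], 4))
```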

## Training Details
- **Optimization:** AdamW
- **Sequence Length:** 128
- **Warmup Steps:** 10,000
- **Warmup Ratio:** 0.06
- **Learning Rate:** 1e-4
- **Weight Decay:** 0.01
- **Adam Optimizer Parameters:**
  - **Epsilon:** 1e-8
  - **Betas:** (0.9, 0.999)
- **Gradient Accumulation Steps:** 1
- **Max Gradient Norm:** 1.0
- **Scheduler:** `get_linear_schedule_with_warmup` (linear decay after warmup)
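
As a sketch of how these hyperparameters map onto a standard PyTorch/`transformers` training loop (the total step count is an assumption; the commented lines show where gradient clipping and the scheduler step occur):

```python
import torch
from transformers import AutoModelForMaskedLM, get_linear_schedule_with_warmup

model = AutoModelForMaskedLM.from_pretrained("your-huggingface-username/pashto-bert-base")

# AdamW with the hyperparameters listed above.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)

num_training_steps = 1_000_000  # assumption: depends on corpus size and batch size
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10_000,
    num_training_steps=num_training_steps,
)

# Inside the training loop, after loss.backward():
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# optimizer.step(); scheduler.step(); optimizer.zero_grad()
```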


## Limitations & Biases
- The model may reflect biases present in the training data.
- Performance on **low-resource or domain-specific tasks** may require additional fine-tuning.
- It is not trained for **code-switching scenarios** (e.g., mixing Pashto with English or other languages).