---
library_name: transformers
datasets:
- SajjadAyoubi/persian_qa
language:
- fa
base_model: pedramyazdipoor/parsbert_question_answering_PQuAD
metrics:
- f1
- exact_match
pipeline_tag: question-answering
---

# Model Card for AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA

This model is a version of ParsBERT fine-tuned for extractive question answering in Persian on the PersianQA dataset.

## Model Details

### Model Description

This is a ParsBERT model fine-tuned on the `SajjadAyoubi/persian_qa` dataset. It is designed for extractive question answering, meaning it extracts the answer to a question directly from a given context. Fine-tuning has significantly improved its ability to understand and answer questions in Persian compared to the base model.

- **Developed by:** Amir Mohammad Ebrahiminasab
- **Shared by:** Amir Mohammad Ebrahiminasab
- **Model type:** bert
- **Language(s) (NLP):** fa (Persian)
- **License:** MIT
- **Finetuned from model:** `pedramyazdipoor/parsbert_question_answering_PQuAD`

### Model Sources

- **Repository:** [https://huggingface.co/AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA](https://huggingface.co/AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA)
- **Demo:** [https://huggingface.co/spaces/AmoooEBI/ParsBert-QA-Chatbot](https://huggingface.co/spaces/AmoooEBI/ParsBert-QA-Chatbot)

## Uses

### Direct Use

The model can be used for extractive question answering in Persian: provide a context and a question, and the model extracts the answer span from the context.

```python
from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA",
    tokenizer="AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA",
)

# Context: "Farhad Majidi Ghadikolaei, known as Farhad Majidi, is a footballer
# from Iran. He also has a record of playing for the Esteghlal club."
context = "فرهاد مجیدی قادیکلایی مشهور به فرهاد مجیدی بازیکن فوتبال اهل ایران است. او همچنین سابقه بازی در باشگاه استقلال را در کارنامه دارد."
# Question: "Which team does Farhad Majidi have a history of playing for?"
question = "فرهاد مجیدی در چه تیمی سابقه بازی دارد؟"

result = qa_pipeline(question=question, context=context)
# {'score': 0.99..., 'start': 101, 'end': 108, 'answer': 'استقلال'}
print(f"Answer: '{result['answer']}'")
```

## Bias, Risks, and Limitations

The model's performance is directly influenced by the content of the PersianQA dataset, so it may not perform as well on contexts from other domains or with different linguistic styles. It also shows a marked drop in Exact Match for answers longer than the dataset average (38.56% vs. 53.01% EM, see Results below), indicating a potential bias toward extracting shorter text spans.

### Recommendations

Users should be aware of the model's limitations, especially its reduced accuracy on longer answer spans. For critical applications, the model's outputs should be verified.

## How to Get Started with the Model

Use the code below to get started with the model using PyTorch.

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA")
model = AutoModelForQuestionAnswering.from_pretrained("AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA")

# Context: "The capital of Spain is the city of Madrid."
context = "پایتخت اسپانیا شهر مادرید است."
# Question: "Where is the capital of Spain?"
question = "پایتخت اسپانیا کجاست؟"

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start and end token positions, then decode that span.
answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
answer = tokenizer.decode(predict_answer_tokens)

print(f"Question: {question}")
print(f"Answer: {answer}")
# Answer: مادرید
```

## Training Details

### Training Data

The model was fine-tuned on the `SajjadAyoubi/persian_qa` dataset, which contains question-context-answer triplets in Persian.

### Training Procedure

#### Preprocessing

The training data was preprocessed by tokenizing question and context pairs. Long contexts were handled by creating multiple features for a single example using a sliding window approach (`doc_stride`). The start and end token positions of the answer were then identified in the tokenized input; a sketch of this approach follows.
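The exact preprocessing script is not part of this card, so the snippet below is only a minimal sketch of the sliding-window tokenization described above, using the standard Hugging Face tokenizer options; the `preprocess` name and the `MAX_LENGTH`/`DOC_STRIDE` values are illustrative assumptions, not values taken from the training code.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA")

# Assumed values for illustration; the actual training script may differ.
MAX_LENGTH = 384  # maximum tokens per feature (question + context window)
DOC_STRIDE = 128  # token overlap between consecutive context windows

def preprocess(examples):
    # Truncate only the context ("only_second") and emit one feature per
    # window, so a long context yields several overlapping features.
    tokenized = tokenizer(
        examples["question"],
        examples["context"],
        truncation="only_second",
        max_length=MAX_LENGTH,
        stride=DOC_STRIDE,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )
    # The offset mapping ties each token back to a character span in the
    # context; the answer's start/end token positions are derived from it.
    return tokenized
```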
#### Training Hyperparameters

The model was trained with the following hyperparameters:

| Argument | Value |
|---|---|
| Learning Rate | $2 \times 10^{-5}$ |
| Training Epochs | 10 |
| Train Batch Size | 8 |
| Evaluation Batch Size | 8 |
| Weight Decay | 0.01 |
| Scheduler Type | Cosine |
| Warmup Ratio | 0.1 |
| Best Model Metric | F1-Score |

#### Speeds, Sizes, Times

- The full fine-tuning process took approximately 1 hour and 22 minutes on a single GPU.

## Evaluation

The model was evaluated on the validation split of the `SajjadAyoubi/persian_qa` dataset.

### Testing Data, Factors & Metrics

#### Testing Data

The evaluation was performed on the validation set of the `SajjadAyoubi/persian_qa` dataset.

#### Factors

The model's performance was analyzed along two factors:

- **Answer Presence**: Performance was measured separately for questions that have an answer in the context and those that do not.
- **Answer Length**: Performance was compared between answers shorter and longer than the validation set average (22.78 characters).

#### Metrics

- **F1-Score**: The primary metric, the harmonic mean of precision and recall computed over token overlap between prediction and ground truth.
- **Exact Match (EM)**: The percentage of predictions that match the ground truth answer exactly.

### Results

#### Summary

**Overall Performance on the Validation Set**

| Model Status | Exact Match | F1-Score |
|---|---|---|
| Fine-Tuned Model (10 Epochs) | 55.59% | 71.97% |

**Performance on Data Subsets**

| Case Type | Exact Match | F1-Score |
|---|---|---|
| Has Answer | 44.70% | 68.22% |
| No Answer | 78.14% | 78.14% |

| Answer Length | Exact Match | F1-Score |
|---|---|---|
| Longer than Avg. | 38.56% | 69.80% |
| Shorter than Avg. | 53.01% | 68.88% |

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).

- **Hardware Type:** T4 GPU
- **Hours used:** 1.37
- **Cloud Provider:** Google Colab
- **Carbon Emitted:** [Not Calculated]

## Technical Specifications

### Model Architecture and Objective

The model is a BERT-base architecture with a linear layer on top of the hidden-state outputs for extractive question answering. The training objective was to minimize the cross-entropy loss over the start and end token positions of the answer.

### Compute Infrastructure

#### Hardware

The model was trained on a single NVIDIA T4 GPU.

#### Software

- `transformers`
- `torch`
- `datasets`
- `evaluate`

## Model Card Authors

Amir Mohammad Ebrahiminasab

## Model Card Contact

ebrahiminasab82@gmail.com