---
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- perplexity
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: fill-mask
---

# DistilBERT Fine-Tuned on IMDB for Masked Language Modeling (Accelerate)

## Model Description

This model is a fine-tuned version of [**`distilbert-base-uncased`**](https://huggingface.co/distilbert/distilbert-base-uncased) for the masked language modeling (MLM) task. It has been trained on the IMDb dataset using the Hugging Face 🤗 Accelerate library.

---

## Model Training Details

23
+ ### Training Dataset
24
+
25
+ - **Dataset:** [IMDB dataset](https://huggingface.co/datasets/imdb) from Hugging Face.
26
+ - **Dataset Splits:**
27
+ - Train: 25,000 samples
28
+ - Test: 25,000 samples
29
+ - Unsupervised: 50,000 samples
30
+ - **Training Strategy:**
31
+ - Combined the train and unsupervised splits for training, resulting in 75,000 training examples.
32
+ - Applied fixed random masking to the evaluation set to ensure consistent perplexity scores.
33
+
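
The split merging and the one-time masking of the evaluation set could be done roughly as follows. This is a sketch, not the original preprocessing script: the tokenization settings (simple truncation/padding to 512 tokens rather than chunking long reviews), the 15% masking probability, and the helper names are assumptions.

```python
from datasets import concatenate_datasets, load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Merge the labeled train split with the unsupervised split:
# 25,000 + 50,000 = 75,000 training examples.
imdb = load_dataset("imdb")
train_data = concatenate_datasets([imdb["train"], imdb["unsupervised"]])
eval_data = imdb["test"]

def tokenize(batch):
    # Padding/truncating every review to 512 tokens keeps the sketch simple;
    # the original pipeline may have chunked long reviews instead.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

train_data = train_data.map(tokenize, batched=True, remove_columns=["text", "label"])
eval_data = eval_data.map(tokenize, batched=True, remove_columns=["text", "label"])

# Mask the evaluation set once, up front, so every evaluation pass scores the
# same masked tokens and perplexity stays comparable across epochs.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

def apply_fixed_mask(batch):
    features = [dict(zip(batch, values)) for values in zip(*batch.values())]
    masked = collator(features)
    return {k: v.tolist() for k, v in masked.items()}

eval_data = eval_data.map(
    apply_fixed_mask, batched=True, remove_columns=eval_data.column_names
)
```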

---

### Training Configuration

The model was trained with the following hyperparameters (a sketch of the corresponding Accelerate training loop follows the list):

- **Number of Training Epochs:** `10`
- **Batch Size:** `64` (per device)
- **Learning Rate:** `5e-5`
- **Weight Decay:** `0.01`
- **Evaluation Strategy:** After each epoch
- **Early Stopping:** Enabled (patience = `3`)
- **Metric for Best Model:** `eval_loss`
- **Direction:** Lower `eval_loss` is better (`greater_is_better = False`)
- **Learning Rate Scheduler:** Linear decay with no warmup steps
- **Mixed Precision Training:** Enabled (FP16)
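
Putting those hyperparameters together, the Accelerate training loop would look roughly like the sketch below. The loop structure, the early-stopping bookkeeping, and the checkpoint directory are reconstructed from the settings listed above (and reuse `tokenizer`, `train_data`, and `eval_data` from the preprocessing sketch); they are not copied from the original training script.

```python
import math

import torch
from accelerate import Accelerator
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import (
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    default_data_collator,
    get_scheduler,
)

num_epochs, batch_size, patience = 10, 64, 3

model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

train_loader = DataLoader(
    train_data,
    batch_size=batch_size,
    shuffle=True,
    # Fresh random masks every epoch for the training data.
    collate_fn=DataCollatorForLanguageModeling(tokenizer=tokenizer),
)
eval_loader = DataLoader(
    eval_data,
    batch_size=batch_size,
    # Evaluation masks were fixed during preprocessing, so no masking here.
    collate_fn=default_data_collator,
)

accelerator = Accelerator(mixed_precision="fp16")  # FP16 mixed precision
model, optimizer, train_loader, eval_loader = accelerator.prepare(
    model, optimizer, train_loader, eval_loader
)

# Linear decay with no warmup steps.
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_epochs * len(train_loader),
)

best_loss, bad_epochs = float("inf"), 0
for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    # Evaluate after every epoch on the fixed-mask evaluation set.
    model.eval()
    losses = []
    for batch in eval_loader:
        with torch.no_grad():
            loss = model(**batch).loss
        losses.append(accelerator.gather(loss.repeat(batch["input_ids"].shape[0])))
    eval_loss = torch.cat(losses).mean().item()
    accelerator.print(f"epoch {epoch}: eval_loss={eval_loss:.4f} perplexity={math.exp(eval_loss):.4f}")

    # Early stopping on eval_loss (lower is better) with a patience of 3 epochs.
    if eval_loss < best_loss:
        best_loss, bad_epochs = eval_loss, 0
        accelerator.wait_for_everyone()
        accelerator.unwrap_model(model).save_pretrained(
            "distilbert-finetuned-imdb-mlm-accelerate", save_function=accelerator.save
        )
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```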

---

## Model Results

### Best Epoch Performance

- **Best Epoch:** `9`
- **Loss:** `2.0173`
- **Perplexity:** `7.5178`
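
The two numbers above are consistent: perplexity is simply the exponential of the evaluation cross-entropy loss.

```python
import math

eval_loss = 2.0173
print(math.exp(eval_loss))  # ≈ 7.518, matching the reported perplexity up to rounding of the loss
```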

### Early Stopping

- Training ran for the full `10` epochs because the evaluation loss kept improving, so early stopping (patience = `3`) was never triggered.

---

## Model Usage

The fine-tuned model can be used for masked language modeling with the Hugging Face `fill-mask` pipeline:

```python
from transformers import pipeline

mask_filler = pipeline("fill-mask", model="Prikshit7766/distilbert-finetuned-imdb-mlm-accelerate")

text = "This is a great [MASK]."
predictions = mask_filler(text)

for pred in predictions:
    print(f">>> {pred['sequence']}")
```
**Example Output:**

```text
>>> This is a great movie.
>>> This is a great film.
>>> This is a great show.
>>> This is a great story.
>>> This is a great documentary.
```
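
For more control than the pipeline offers (for example, inspecting the raw logits), the checkpoint can also be loaded directly with `AutoModelForMaskedLM`. This snippet is an illustrative addition, not part of the original card:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "Prikshit7766/distilbert-finetuned-imdb-mlm-accelerate"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("This is a great [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Rank the vocabulary at the [MASK] position and print the top 5 candidates.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print([tokenizer.decode(i) for i in top_ids])
```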