Update README.md
README.md CHANGED
@@ -104,7 +104,7 @@ language:
 
 # Ara-EuroBERT: Arabic Semantic Text Embeddings
 
-<img src="https://i.ibb.co/d4svDscP/Clear-Familiar-situations-that-you-already-have-best-practices-for-4.png" width="
+<img src="https://i.ibb.co/d4svDscP/Clear-Familiar-situations-that-you-already-have-best-practices-for-4.png" width="150" align="left"/>
 
 Ara-EuroBERT is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [EuroBERT/EuroBERT-610m](https://huggingface.co/EuroBERT/EuroBERT-610m) specifically optimized for **Semantic Arabic text embeddings**.
 
@@ -115,11 +115,28 @@ It can be used for semantic textual similarity, semantic search, paraphrase mini
 
 <br clear="left"/>
 
-## Model Details
+## Model Details & Benchmark Performance
 
 <img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/Kv78q7NmI3hhOXkRv30s9.png" width="1000" align="center"/>
 
+The benchmark results above demonstrate the significant performance improvements of AraEuroBERT models compared to standard EuroBERT models:
 
+- **STS17 Benchmark**: AraEuroBERT-610M achieves a score of 83, significantly outperforming the standard EuroBERT-610M (14) and even the much larger EuroBERT-2.1B (12).
+- **STS22.v2 Benchmark**: AraEuroBERT-210M scores 61, outperforming both the larger AraEuroBERT-610M (53) and all standard EuroBERT variants.
+
+These results highlight the effectiveness of our specialized fine-tuning for Arabic text embeddings, with even our smaller 210M parameter model demonstrating superior performance on Arabic semantic tasks.
+
+### Metrics
+
+#### Semantic Similarity
+
+* Datasets: `sts-dev-1152`, `sts-dev-960`, `sts-dev-768` and `sts-dev-512`
+* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+| Metric              | sts-dev-1152 | sts-dev-960 | sts-dev-768 | sts-dev-512 |
+|:--------------------|:-------------|:------------|:------------|:------------|
+| pearson_cosine      | 0.8264       | 0.8259      | 0.8244      | 0.8238      |
+| **spearman_cosine** | **0.8307**   | **0.8302**  | **0.8293**  | **0.8293**  |
 
 
 ### Model Description
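
The added README lines describe a sentence-transformers model for Arabic semantic embeddings. A minimal usage sketch follows, under stated assumptions: the repository id and the Arabic example sentences are placeholders (not taken from the diff), and a recent sentence-transformers release is assumed for `model.similarity`.

```python
# Minimal usage sketch for the model described in the diff above.
# Assumptions: the repo id below is a placeholder -- substitute this model's
# actual Hugging Face id; the example sentences are illustrative only.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "your-org/Ara-EuroBERT-610m",  # placeholder repository id
    trust_remote_code=True,        # EuroBERT-based checkpoints may ship custom modeling code
)

sentences = [
    "القطة تجلس على السجادة",  # "The cat is sitting on the rug"
    "هناك قطة فوق السجادة",    # "There is a cat on the rug"
    "السماء صافية اليوم",       # "The sky is clear today"
]

embeddings = model.encode(sentences)                     # shape: (3, embedding_dim)
similarities = model.similarity(embeddings, embeddings)  # cosine similarity matrix
print(similarities)  # the two paraphrases should score much higher than the third sentence
```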
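
The metrics table reports Pearson/Spearman cosine scores at four output dimensions (1152, 960, 768, 512), which is consistent with Matryoshka-style truncation evaluated via `EmbeddingSimilarityEvaluator`. Below is a sketch of how such numbers could be produced; the repository id, dataset id, and column names are hypothetical, and the `truncate_dim` argument assumes a sentence-transformers version with Matryoshka support.

```python
# Sketch of reproducing the sts-dev-{1152,960,768,512} rows with
# EmbeddingSimilarityEvaluator. Dataset id and column names are hypothetical;
# the actual dev split behind the table is not named in the diff.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer(
    "your-org/Ara-EuroBERT-610m",  # placeholder repository id
    trust_remote_code=True,
)

dev = load_dataset("your-org/arabic-sts", split="dev")  # hypothetical STS-style dev split

for dim in (1152, 960, 768, 512):  # output dimensions from the metrics table
    evaluator = EmbeddingSimilarityEvaluator(
        sentences1=dev["sentence1"],
        sentences2=dev["sentence2"],
        scores=dev["score"],   # gold similarity scores; scale does not affect rank correlation
        name=f"sts-dev-{dim}",
        truncate_dim=dim,      # evaluate on the first `dim` embedding dimensions
    )
    print(evaluator(model))    # includes pearson_cosine and spearman_cosine, as in the table
```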