---
license: cc-by-nc-4.0
language:
- es
tags:
- simplification
- NER
---

This is a model for **complex word identification (CWI)** in Spanish medical texts, based on [multilingual DeBERTa v3 (mDeBERTa)](https://huggingface.co/microsoft/mdeberta-v3-base).

The model was fine-tuned on a corpus of 225 texts for patients (162,575 tokens) to identify **complex words** (**CW**).

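A minimal usage sketch with the `transformers` token-classification pipeline is shown here. The repository id `your-namespace/your-cwi-model` is a hypothetical placeholder, and the label name `CW` is assumed from the results table rather than taken from the model configuration, so check `config.id2label` before relying on it. The example predictions are hand-written to mimic the pipeline's output format; they are not real model output.

```python
from typing import Dict, List

# Hypothetical placeholder: replace with this model's actual Hub repository id.
MODEL_ID = "your-namespace/your-cwi-model"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (needs `transformers` and the model weights)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word pieces into labelled spans.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def extract_complex_words(text: str, predictions: List[Dict], label: str = "CW") -> List[str]:
    """Return the surface strings of all spans tagged with `label` (assumed to be "CW")."""
    return [text[p["start"]:p["end"]] for p in predictions if p.get("entity_group") == label]


example_text = "El paciente presenta disnea y edema periférico."

# Hand-written stand-in for pipeline output (illustrative only, not a model prediction).
start = example_text.index("disnea")
mock_predictions = [
    {"entity_group": "CW", "score": 0.99, "word": "disnea",
     "start": start, "end": start + len("disnea")},
]

complex_words = extract_complex_words(example_text, mock_predictions)
print(complex_words)
```

With the real model, `mock_predictions` would be replaced by `load_tagger()(example_text)`.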
**Results (test set)**

| Class | Precision | Recall | F1 | Accuracy |
|:-----:|:-------------:|:-------------:|:-------------:|:-------------:|
| CW | 79.05 (±1.39) | 79.01 (±0.70) | 79.02 (±0.65) | 94.86 (±0.22) |

Results are averaged over 3 experimental rounds.

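To make the four reported metrics concrete, the sketch below computes precision, recall, F1 and accuracy for a binary labelling. It assumes token-level labels (1 = complex word, 0 = other); the granularity actually used in the evaluation is not stated on this card, and the toy labels are invented for illustration, not taken from the corpus.

```python
from typing import Dict, Sequence


def binary_scores(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall and F1 for the positive class (1), plus overall accuracy."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(gold) if len(gold) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels (assumption: 1 = complex word, 0 = other); not data from the paper.
gold = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
pred = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

scores = binary_scores(gold, pred)
print(scores)
```

Accuracy also counts correctly labelled non-complex tokens, which is one way it can come out higher than F1 for the CW class.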
If you use this model, or want more details about the experiments and the training setup, please see our article:

```
@article{2025CWI,
  title={Complex Word Identification for Lexical Simplification in Spanish Texts for Patients},
  author={Ortega-Riba, Federico and Campillos-Llanos, Leonardo and Samy, Doaa},
  journal={Procesamiento del lenguaje natural},
  volume={74},
  year={2025}
}
```