---
license: apache-2.0
language:
- es
base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
- medical
- spanish
- bi-encoder
- entity-linking
- sapbert
- umls
- snomed-ct
---

# **ClinLinker-KB-GP**

## Model Description

ClinLinker-KB-GP is a state-of-the-art bi-encoder model for medical entity linking (MEL) in Spanish, optimized for clinical-domain tasks and built on `PlanTL-GOB-ES/roberta-base-biomedical-clinical-es`. It enriches concept representations with knowledge from the UMLS and SNOMED-CT ontologies, incorporating not only each concept's synonyms but also its hierarchical relationships (parent and grandparent concepts). The model was trained with a contrastive-learning strategy that combines hard negative mining with a multi-similarity loss.
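
The multi-similarity loss mentioned above can be sketched in plain Python. This follows the standard formulation, in which each anchor is pulled toward items of the same concept (positives) and pushed away from items of other concepts (negatives). The helper name `multi_similarity_loss` is hypothetical, the values of `alpha`, `beta`, and `lam` (the similarity offset λ) are illustrative placeholders rather than the settings used to train ClinLinker-KB-GP, and the pair-mining step that usually precedes the weighting is omitted.

```python
import math
from typing import Sequence


def multi_similarity_loss(
    sim: Sequence[Sequence[float]],
    labels: Sequence[str],
    alpha: float = 2.0,  # placeholder: scale for positive pairs
    beta: float = 50.0,  # placeholder: scale for negative pairs
    lam: float = 0.5,  # placeholder: similarity offset (lambda)
) -> float:
    """Batch multi-similarity loss without a pair-mining step (hypothetical helper).

    sim[i][k] is the similarity (e.g. cosine) between embeddings i and k;
    labels[i] is the concept ID (e.g. a SNOMED-CT code) of item i.
    """
    m = len(labels)
    total = 0.0
    for i in range(m):
        # Positives: other items linked to the same concept as anchor i.
        pos = [sim[i][k] for k in range(m) if k != i and labels[k] == labels[i]]
        # Negatives: items linked to a different concept.
        neg = [sim[i][k] for k in range(m) if labels[k] != labels[i]]
        # Soft penalty on positives whose similarity falls below lam ...
        pos_term = math.log1p(sum(math.exp(-alpha * (s - lam)) for s in pos)) / alpha
        # ... and on negatives whose similarity rises above lam.
        neg_term = math.log1p(sum(math.exp(beta * (s - lam)) for s in neg)) / beta
        total += pos_term + neg_term
    return total / m if m else 0.0
```

Because `beta` is large, a single negative whose similarity exceeds λ dominates its sum, which is why hard negatives (different concepts with similar surface forms) drive most of the gradient.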

## Intended Use

- **Domain**: Spanish clinical NLP
- **Tasks**: Entity linking of diseases, symptoms, and procedures to SNOMED-CT
- **Evaluated on**: DisTEMIST, MedProcNER, SympTEMIST
- **Users**: Researchers and practitioners working in clinical NLP

## Performance Summary (Top-25 Accuracy)

| Model                | DisTEMIST | MedProcNER | SympTEMIST |
|----------------------|-----------|------------|------------|
| ClinLinker           | 0.845     | 0.898      | 0.909      |
| ClinLinker-KB-P      | 0.853     | 0.891      | 0.918      |
| **ClinLinker-KB-GP** | **0.864** | **0.901**  | **0.922**  |
| SapBERT-XLM-R-large  | 0.800     | 0.850      | 0.827      |
| RoBERTa biomedical   | 0.600     | 0.668      | 0.609      |

*Results correspond to the cleaned version of each gold standard, which excludes mentions annotated as "NO CODE" or "COMPOSITE".*
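
Under the usual definition of top-k accuracy, which we assume applies to the table above, a mention counts as correct when its gold code appears among the model's k highest-ranked candidates (k = 25 here). A minimal sketch; the helper name `top_k_accuracy` is hypothetical:

```python
from typing import Sequence


def top_k_accuracy(
    ranked_codes: Sequence[Sequence[str]],
    gold_codes: Sequence[str],
    k: int = 25,
) -> float:
    """Fraction of mentions whose gold code is among the first k ranked candidates."""
    if len(ranked_codes) != len(gold_codes):
        raise ValueError("need exactly one ranked candidate list per gold code")
    if not gold_codes:
        return 0.0
    # A hit means the gold code appears in the truncated candidate list.
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_codes, gold_codes))
    return hits / len(gold_codes)
```

For example, with the toy candidate lists `[["A", "B", "C"], ["D", "E", "F"]]` and gold codes `["B", "Z"]`, the accuracy is 0.0 at `k=1` and 0.5 at `k=2`.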

## Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker-KB-GP")

mention = "insuficiencia renal aguda"  # "acute renal failure"
inputs = tokenizer(mention, return_tensors="pt")

# Inference only: no gradients needed
with torch.no_grad():
    outputs = model(**inputs)

# Use the first-token ([CLS]) hidden state as the mention embedding
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)  # (batch_size, hidden_size)
```

For scalable retrieval over large terminologies, index the concept embeddings with [Faiss](https://github.com/facebookresearch/faiss) or use the [`FaissEncoder`](https://github.com/ICB-UMA/KnowledgeGraph) class from the ICB-UMA KnowledgeGraph repository.
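
The linking step itself can be sketched under a few assumptions: each candidate concept is represented by the embeddings of its terms (in practice, `[CLS]` vectors computed as in the usage snippet), and a mention is linked to the concepts whose closest term has the highest cosine similarity. The max-over-terms scoring, the helper names `cosine` and `link`, and the toy 3-dimensional vectors and codes are illustrative assumptions, not the ClinLinker pipeline. Over L2-normalized vectors, inner product equals cosine similarity, so an exact inner-product Faiss index ranks individual terms the same way this code does, only much faster.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link(
    mention_vec: Sequence[float],
    dictionary: Dict[str, List[Sequence[float]]],
    k: int = 25,
) -> List[Tuple[str, float]]:
    """Return the k concept codes whose best-matching term is closest to the mention."""
    # Score each concept by its most similar term embedding (max over synonyms).
    scores = {
        code: max(cosine(mention_vec, vec) for vec in vecs)
        for code, vecs in dictionary.items()
        if vecs  # skip concepts without term embeddings
    }
    # Highest cosine similarity first.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Toy dictionary: hypothetical concept codes mapped to toy term embeddings.
toy_dictionary = {
    "A": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "B": [[0.0, 1.0, 0.0]],
    "C": [[0.0, 0.0, 1.0]],
}
print(link([1.0, 0.05, 0.0], toy_dictionary, k=2))
```

Scoring a concept by its best-matching synonym keeps rare but exact surface forms from being diluted by the concept's other terms; averaging term embeddings is a common alternative.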

## Limitations

- The model is optimized for Spanish clinical text and may underperform on other languages or domains.
- Expert validation of its predictions is advised in critical applications.

## Citation

> Gallego, Fernando and López-García, Guillermo and Gasco, Luis and Krallinger, Martin and Veredas, Francisco J., ClinLinker-KB: Clinical Entity Linking in Spanish with Knowledge-Graph Enhanced Biencoders. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4939986.

## Authors

Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas