Commit 0edaf39 by fernandogd97 (verified)
1 parent: 5732aaa

Add model card

Files changed (1)
  1. README.md +69 -0
README.md ADDED
@@ -0,0 +1,69 @@
+ ---
+ license: apache-2.0
+ language:
+ - es
+ base_model:
+ - PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
+ tags:
+ - medical
+ - spanish
+ - bi-encoder
+ - entity-linking
+ - sapbert
+ - umls
+ - snomed-ct
+ ---
+
+ # **ClinLinker-KB-GP**
+
+ ## Model Description
+ ClinLinker-KB-GP is a state-of-the-art bi-encoder model for medical entity linking (MEL) in Spanish, optimized for clinical-domain tasks. It enriches concept representations by incorporating not only synonyms but also hierarchical relationships (parents and grandparents) from the UMLS and SNOMED-CT ontologies. The model was trained with a contrastive-learning strategy that combines hard negative mining with a multi-similarity loss.
+
+ ## Intended Use
+ - **Domain**: Spanish clinical NLP
+ - **Tasks**: Entity linking (diseases, symptoms, procedures) to SNOMED-CT
+ - **Evaluated On**: DisTEMIST, MedProcNER, SympTEMIST
+ - **Users**: Researchers and practitioners working in clinical NLP
+
+ ## Performance Summary (Top-25 Accuracy)
+
+ | Model                | DisTEMIST | MedProcNER | SympTEMIST |
+ |----------------------|-----------|------------|------------|
+ | ClinLinker           | 0.845     | 0.898      | 0.909      |
+ | ClinLinker-KB-P      | 0.853     | 0.891      | 0.918      |
+ | **ClinLinker-KB-GP** | **0.864** | **0.901**  | **0.922**  |
+ | SapBERT-XLM-R-large  | 0.800     | 0.850      | 0.827      |
+ | RoBERTa biomedical   | 0.600     | 0.668      | 0.609      |
+
+ *Results are computed on the cleaned gold standard, which excludes entries labeled "NO CODE" or "COMPOSITE".*
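
To make the metric concrete: top-k accuracy counts a mention as correct when its gold code appears among the k highest-ranked candidates. The snippet below is a minimal sketch of that computation, not the authors' evaluation script; the function name `top_k_accuracy` and the toy codes are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    gold_codes: Sequence[str],
    k: int = 25,
) -> float:
    """Fraction of mentions whose gold code is among the top-k ranked candidates."""
    if len(ranked_candidates) != len(gold_codes):
        raise ValueError("one ranked candidate list is required per gold code")
    if not gold_codes:
        return 0.0
    hits = sum(
        gold in candidates[:k]
        for candidates, gold in zip(ranked_candidates, gold_codes)
    )
    return hits / len(gold_codes)


# Toy example with made-up codes (not real SNOMED-CT identifiers).
ranked = [["C1", "C2", "C3"], ["C9", "C4", "C5"], ["C7", "C8", "C6"]]
gold = ["C2", "C5", "C0"]
print(top_k_accuracy(ranked, gold, k=2))
```

With k=25, as in the table, each candidate list would hold the 25 nearest concepts returned for a mention.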
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+ import torch
+
+ # Load the bi-encoder and its tokenizer from the Hugging Face Hub
+ model = AutoModel.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
+ tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
+
+ mention = "insuficiencia renal aguda"
+ inputs = tokenizer(mention, return_tensors="pt")
+ with torch.no_grad():  # inference only, no gradients needed
+     outputs = model(**inputs)
+ # Use the hidden state of the first token as the mention embedding
+ embedding = outputs.last_hidden_state[:, 0, :]
+ print(embedding.shape)  # (batch_size, hidden_size)
+ ```
+
+ For scalable retrieval over a large candidate set, index the candidate embeddings with [Faiss](https://github.com/facebookresearch/faiss) or use the [`FaissEncoder`](https://github.com/ICB-UMA/KnowledgeGraph) class.
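
The retrieval step itself amounts to ranking candidate concept embeddings by similarity to the mention embedding. The sketch below shows this with a dependency-free, brute-force cosine ranking; it is an illustration only and does not reproduce `FaissEncoder`. The helper names (`cosine`, `rank_candidates`), the toy vectors, and the codes are all hypothetical, and cosine similarity is an assumption made for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(
    mention_vec: Sequence[float],
    candidate_vecs: Dict[str, Sequence[float]],
    k: int = 25,
) -> List[Tuple[str, float]]:
    """Return the k candidate codes most similar to the mention embedding."""
    scored = [(code, cosine(mention_vec, vec)) for code, vec in candidate_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for model embeddings; codes are made up.
candidates = {
    "CODE_A": [0.9, 0.1, 0.0],
    "CODE_B": [0.0, 1.0, 0.0],
    "CODE_C": [0.7, 0.7, 0.1],
}
mention = [1.0, 0.2, 0.0]
top = rank_candidates(mention, candidates, k=2)
print([code for code, _ in top])
```

A brute-force scan like this is linear in the number of candidates. For a full terminology, an index such as Faiss's `IndexFlatIP` over L2-normalized vectors would yield the same cosine ranking far more efficiently.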
+
+ ## Limitations
+ - The model is optimized for Spanish clinical data and may underperform outside this domain.
+ - Expert validation is advised in critical applications.
+
+ ## Citation
+
+ > Gallego, Fernando and López-García, Guillermo and Gasco, Luis and Krallinger, Martin and Veredas, Francisco J., ClinLinker-KB: Clinical Entity Linking in Spanish with Knowledge-Graph Enhanced Biencoders. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4939986.
+
+ ## Authors
+
+ Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas