DisTEMIST-bi-encoder
Model Description
DisTEMIST-bi-encoder is a domain-specific bi-encoder model for medical entity linking in Spanish. It was trained on synonym pairs drawn from the DisTEMIST corpus and from SNOMED-CT (Fully Specified Names and preferred synonyms). The training data was curated from the gold standard corpus and enriched with knowledge-based synonyms to improve entity normalization.
Intended Use
- Domain: Spanish Clinical NLP
- Tasks: Entity linking of DisTEMIST mentions to SNOMED-CT concepts
- Evaluated On: DisTEMIST (Gold Standard, Unseen Mentions, Unseen Codes)
- Users: Researchers and developers focusing on specialized medical NEL
Definitions
- Unseen Mentions: Mentions that do not appear in training but reference known codes.
- Unseen Codes: Mentions associated with SNOMED-CT codes never seen during training.
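The two definitions above can be made concrete with a short sketch. This is an illustration, not the authors' evaluation script: the function name `split_unseen`, the case-insensitive mention matching, and the placeholder codes (`C001`, ...) are all assumptions introduced here.

```python
def split_unseen(train_pairs, test_pairs):
    """Split test (mention, code) pairs into the two unseen subsets.

    Assumption: mentions are compared case-insensitively; the source
    does not say how mention identity was decided.
    """
    train_mentions = {mention.casefold() for mention, _ in train_pairs}
    train_codes = {code for _, code in train_pairs}

    # Unseen mentions: new surface form, but the code was seen in training
    unseen_mentions = [
        (m, c) for m, c in test_pairs
        if m.casefold() not in train_mentions and c in train_codes
    ]
    # Unseen codes: the code never appeared in training
    unseen_codes = [(m, c) for m, c in test_pairs if c not in train_codes]
    return unseen_mentions, unseen_codes


# Hypothetical data; the codes are placeholders, not real SNOMED-CT IDs
train_pairs = [("insuficiencia renal aguda", "C001"), ("diabetes mellitus", "C002")]
test_pairs = [
    ("fallo renal agudo", "C001"),   # new mention, known code
    ("diabetes mellitus", "C002"),   # seen mention, known code
    ("neumonía", "C003"),            # code absent from training
]

unseen_mentions, unseen_codes = split_unseen(train_pairs, test_pairs)
print(unseen_mentions)
print(unseen_codes)
```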
Performance Summary (Top-25 Accuracy)
Evaluation Split | Top-25 Accuracy |
---|---|
Gold Standard | 0.903 |
Unseen Mentions | 0.819 |
Unseen Codes | 0.793 |
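The card does not spell out how Top-25 accuracy is computed. Assuming the usual definition (the fraction of mentions whose gold code appears among the first k retrieved candidates), a minimal sketch looks like this; `top_k_accuracy` and the toy data are hypothetical.

```python
def top_k_accuracy(ranked_candidates, gold_codes, k=25):
    """Fraction of mentions whose gold code is in the top-k candidates.

    ranked_candidates: one list of candidate codes per mention, best first.
    gold_codes: the correct code for each mention, in the same order.
    """
    if len(ranked_candidates) != len(gold_codes):
        raise ValueError("ranked_candidates and gold_codes must align")
    if not gold_codes:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(
        1 for ranked, gold in zip(ranked_candidates, gold_codes)
        if gold in ranked[:k]
    )
    return hits / len(gold_codes)


# Toy example with placeholder codes (not real SNOMED-CT IDs)
ranked = [["C001", "C002"], ["C003", "C004"], ["C005", "C006"]]
gold = ["C002", "C004", "C009"]

acc_at_2 = top_k_accuracy(ranked, gold, k=2)
acc_at_1 = top_k_accuracy(ranked, gold, k=1)
print(acc_at_2, acc_at_1)
```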
Usage
from transformers import AutoModel, AutoTokenizer
import torch

# Load the bi-encoder and its tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("ICB-UMA/DisTEMIST-bi-encoder")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/DisTEMIST-bi-encoder")

mention = "insuficiencia renal aguda"
inputs = tokenizer(mention, return_tensors="pt")

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Use the first-token hidden state as the mention embedding
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)
Use with Faiss or FaissEncoder for efficient retrieval.
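To illustrate the retrieval step, here is a minimal sketch of exact nearest-neighbor search in NumPy. It is not the FaissEncoder API. Random vectors stand in for real model embeddings, the codes are placeholders, and ranking by cosine similarity is an assumption, since the card does not state which similarity the model was trained with. A flat inner-product Faiss index over L2-normalized vectors performs the same exact search.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def search(query_emb, candidate_emb, k):
    """Return indices and scores of the k most similar candidates per query."""
    scores = l2_normalize(query_emb) @ l2_normalize(candidate_emb).T
    top_idx = np.argsort(-scores, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    return top_idx, top_scores


# Stand-in data: in practice these would be embeddings of SNOMED-CT terms
# and of the mention, produced by the model as in the usage snippet.
rng = np.random.default_rng(0)
candidate_emb = rng.normal(size=(100, 16)).astype("float32")
candidate_codes = [f"C{i:03d}" for i in range(100)]  # placeholder codes

# A query that is a slightly perturbed copy of candidate 7
query_emb = candidate_emb[[7]] + 0.01 * rng.normal(size=(1, 16)).astype("float32")

top_idx, top_scores = search(query_emb, candidate_emb, k=5)
print([candidate_codes[i] for i in top_idx[0]])
```

The ranked candidate codes for each mention are what a top-k accuracy metric would be computed over.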
Limitations
- The model is specialized for DisTEMIST mentions and may underperform in other domains or corpora.
- Expert supervision is advised for clinical deployment.
Citation
Gallego, Fernando and López-García, Guillermo and Gasco, Luis and Krallinger, Martin and Veredas, Francisco J., Clinlinker-Kb: Clinical Entity Linking in Spanish with Knowledge-Graph Enhanced Biencoders. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4939986
Authors
Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J Veredas