DisTEMIST-bi-encoder

Model Description

DisTEMIST-bi-encoder is a domain-specific bi-encoder for medical entity linking in Spanish. It was trained on synonym pairs drawn from the DisTEMIST corpus and from SNOMED-CT (Fully Specified Names and preferred synonyms). The training data was curated from the gold standard corpus and enriched with knowledge-based synonyms to improve entity normalization.

💡 Intended Use

  • Domain: Spanish Clinical NLP
  • Tasks: Entity linking of DisTEMIST mentions to SNOMED-CT concepts
  • Evaluated On: DisTEMIST (Gold Standard, Unseen Mentions, Unseen Codes)
  • Users: Researchers and developers focusing on specialized medical NEL

💬 Definitions

  • Unseen Mentions: Mentions that do not appear in training but reference known codes.
  • Unseen Codes: Mentions associated with SNOMED-CT codes never seen during training.
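The two definitions above can be illustrated with a minimal sketch that partitions a test set into the two subsets. The function name `split_unseen`, the placeholder codes (`C001`, ...), and the choice to compare mentions after lowercasing are all assumptions made for illustration; the actual evaluation splits may have been built differently.

```python
def split_unseen(train_pairs, test_pairs):
    """Partition test (mention, code) pairs into unseen-mention and unseen-code subsets."""
    # Assumption: two mentions are "the same" if they match after
    # stripping whitespace and lowercasing.
    def norm(mention):
        return mention.strip().lower()

    train_mentions = {norm(m) for m, _ in train_pairs}
    train_codes = {c for _, c in train_pairs}

    unseen_mentions, unseen_codes = [], []
    for mention, code in test_pairs:
        if code not in train_codes:
            # The code never occurs in training.
            unseen_codes.append((mention, code))
        elif norm(mention) not in train_mentions:
            # The code is known, but this surface form is new.
            unseen_mentions.append((mention, code))
    return unseen_mentions, unseen_codes


# Toy data with placeholder codes (not real SNOMED-CT identifiers).
train = [("insuficiencia renal aguda", "C001"), ("diabetes mellitus", "C002")]
test = [
    ("fallo renal agudo", "C001"),   # new mention, known code
    ("Diabetes Mellitus", "C002"),   # seen mention (after normalization)
    ("neumonía", "C003"),            # code absent from training
]
unseen_mentions, unseen_codes = split_unseen(train, test)
print(unseen_mentions)
print(unseen_codes)
```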

📈 Performance Summary (Top-25 Accuracy)

Evaluation Split    Top-25 Accuracy
Gold Standard       0.903
Unseen Mentions     0.819
Unseen Codes        0.793
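For readers unfamiliar with the metric, top-k accuracy is the fraction of mentions whose gold code appears among the k highest-ranked candidates. The sketch below shows one way to compute it. The function name `top_k_accuracy` and the toy codes are hypothetical, and this is not necessarily the evaluation script that produced the figures above.

```python
def top_k_accuracy(ranked_candidates, gold_codes, k=25):
    """Fraction of mentions whose gold code is within the first k ranked candidates."""
    if not gold_codes:
        raise ValueError("gold_codes must not be empty")
    if len(ranked_candidates) != len(gold_codes):
        raise ValueError("one candidate list is required per gold code")
    hits = sum(
        gold in candidates[:k]
        for candidates, gold in zip(ranked_candidates, gold_codes)
    )
    return hits / len(gold_codes)


# Toy rankings with placeholder codes (best candidate first).
ranked = [["C001", "C002"], ["C003", "C002"], ["C004", "C005"]]
gold = ["C001", "C002", "C009"]
print(top_k_accuracy(ranked, gold, k=1))
print(top_k_accuracy(ranked, gold, k=2))
```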

🧪 Usage

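from transformers import AutoModel, AutoTokenizer
import torch

# Load the bi-encoder and its tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("ICB-UMA/DisTEMIST-bi-encoder")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/DisTEMIST-bi-encoder")

# Encode a single mention
mention = "insuficiencia renal aguda"
inputs = tokenizer(mention, return_tensors="pt")
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

# Use the first token's hidden state (position 0) as the mention embedding
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)  # (1, hidden_size)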

For efficient retrieval over many candidate terms, index their embeddings with Faiss or FaissEncoder and query the index with the mention embedding.
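To make the retrieval step concrete, here is a minimal sketch of nearest-neighbour search over candidate embeddings using plain NumPy. It is a stand-in, not the Faiss or FaissEncoder API: it performs exact inner-product search over L2-normalized vectors, which is the same search a flat inner-product Faiss index (`IndexFlatIP`) carries out. The helper names `build_index` and `search`, the 4-dimensional toy vectors, and the placeholder codes are all assumptions; in practice the vectors would come from the model as in the snippet above.

```python
import numpy as np


def build_index(candidate_embeddings):
    """L2-normalize candidate vectors so inner product equals cosine similarity."""
    emb = np.asarray(candidate_embeddings, dtype=np.float32)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.clip(norms, 1e-12, None)


def search(index, query_embedding, k=25):
    """Return indices and scores of the k most similar candidates, best first."""
    q = np.asarray(query_embedding, dtype=np.float32).reshape(-1)
    q = q / max(float(np.linalg.norm(q)), 1e-12)
    scores = index @ q                        # cosine similarity per candidate
    k = min(k, scores.shape[0])
    top = np.argsort(-scores, kind="stable")[:k]
    return top, scores[top]


# Toy candidate embeddings and placeholder codes (hypothetical values).
codes = ["CODE_A", "CODE_B", "CODE_C"]
candidates = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.7, 0.7, 0.0, 0.0]]
index = build_index(candidates)

query = [0.9, 0.1, 0.0, 0.0]                  # stands in for a mention embedding
top, scores = search(index, query, k=25)
ranked_codes = [codes[i] for i in top]
print(ranked_codes)
```

Brute-force search like this is fine for a few thousand candidates. A dedicated index becomes worthwhile when the candidate set covers a large terminology.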

⚠️ Limitations

  • The model is specialized for DisTEMIST mentions and may underperform in other domains or corpora.
  • Expert supervision is advised for clinical deployment.

📚 Citation

Gallego, Fernando and López-García, Guillermo and Gasco, Luis and Krallinger, Martin and Veredas, Francisco J., Clinlinker-Kb: Clinical Entity Linking in Spanish with Knowledge-Graph Enhanced Biencoders. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4939986

Authors

Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas
