janrodriguez committed
Commit 9df842d · verified · 1 Parent(s): 679e39f

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +61 -18
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
8
  - spanish
9
  - drugs
10
  - medications
11
- license: apache-2.0
12
  metrics:
13
  - precision
14
  - recall
@@ -26,15 +26,30 @@ model-index:
26
  name: DrugTEMIST-es
27
  type: DrugTEMIST-es
28
  metrics:
29
- - name: precision
30
  type: precision
31
  value: 0.917
32
- - name: recall
33
  type: recall
34
  value: 0.909
35
- - name: f1
36
  type: f1
37
  value: 0.913
38
  widget:
39
  - text: El diagnóstico definitivo de nuestro paciente fue de un Adenocarcinoma de pulmón cT2a cN3 cM1a Estadio IV (por una única lesión pulmonar contralateral) PD-L1 90%, EGFR negativo, ALK negativo y ROS-1 negativo.
40
  - text: Durante el ingreso se realiza una TC, observándose un nódulo pulmonar en el LII y una masa renal derecha indeterminada. Se realiza punción biopsia del nódulo pulmonar, con hallazgos altamente sospechosos de carcinoma.
@@ -52,7 +67,6 @@ widget:
52
  - [Model description](#model-description)
53
  - [How to use](#how-to-use)
54
  - [Limitations and bias](#limitations-and-bias)
55
- - [Evaluation](#evaluation)
56
  - [Additional information](#additional-information)
57
  - [Authors](#authors)
58
  - [Contact information](#contact-information)
@@ -64,7 +78,7 @@ widget:
64
  </details>
65
 
66
  ## Model description
67
- A fine-tuned version of the [bsc-bio-ehr-es](https://huggingface.co/PlanTL-GOB-ES/bsc-bio-ehr-es) model on the [DrugTEMIST](https://zenodo.org/records/11368861) corpus (original Spanish Gold Standard).
68
 
69
  ## How to use
70
 
@@ -76,9 +90,6 @@ A usage example can be found [here](https://github.com/nlp4bia-bsc/hugging-face-
76
  At the time of submission, no measures have been taken to estimate the bias embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated.
77
 
78
 
79
- ## Evaluation
80
- F1 Score on DrugTEMIST-es: 0.913.
81
-
82
  ## Additional information
83
 
84
  ### Authors
@@ -87,11 +98,9 @@ NLP4BIA team at the Barcelona Supercomputing Center (nlp4bia@bsc.es).
87
  ### Contact information
88
  jan.rodriguez [at] bsc.es
89
 
90
- ### Licensing information
91
- [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
92
-
93
  ### Funding
94
- This research was funded by the Ministerio de Ciencia e Innovación (MICINN) under project AI4ProfHealth (PID2020-119266RA-I00 MICIU/AEI/10.13039/501100011033) and BARITONE (TED2021-129974B-C22). This work is also supported by the European Union’s Horizon Europe Co-ordination & Support Action under Grant Agreement No 101080430 (AI4HF) as well as Grant Agreement No 101057849 (DataTool4Heartproject).
 
95
 
96
  ### Citing information
97
 
@@ -99,11 +108,45 @@ Please cite the following works:
99
 
100
  ```bibtex
101
 
102
- @inproceedings{multicardioner2024overview, title = {{Overview of MultiCardioNER task at BioASQ 2024 on Medical Speciality and Language Adaptation of Clinical NER Systems for Spanish, English and Italian}}, author = {Salvador Lima-López and Eulàlia Farré-Maduell and Jan Rodríguez-Miret and Miguel Rodríguez-Ortega and Livia Lilli and Jacopo Lenkowicz and Giovanna Ceroni and Jonathan Kossoff and Anoop Shah and Anastasios Nentidis and Anastasia Krithara and Georgios Katsimpras and Georgios Paliouras and Martin Krallinger}, booktitle = {CLEF Working Notes}, year = {2024}, editor = {Faggioli, Guglielmo and Ferro, Nicola and Galuščáková, Petra and García Seco de Herrera, Alba} }
103
-
104
- @misc{carmen_physionet, author = {Farre Maduell, Eulalia and Lima-Lopez, Salvador and Frid, Santiago Andres and Conesa, Artur and Asensio, Elisa and Lopez-Rueda, Antonio and Arino, Helena and Calvo, Elena and Bertran, Maria Jesús and Marcos, Maria Angeles and Nofre Maiz, Montserrat and Tañá Velasco, Laura and Marti, Antonia and Farreres, Ricardo and Pastor, Xavier and Borrat Frigola, Xavier and Krallinger, Martin}, title = {{CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools (version 1.0.1)}}, year = {2024}, publisher = {PhysioNet}, url = {https://doi.org/10.13026/x7ed-9r91} }",
105
-
106
- @article{physionet, author = {Ary L. Goldberger and Luis A. N. Amaral and Leon Glass and Jeffrey M. Hausdorff and Plamen Ch. Ivanov and Roger G. Mark and Joseph E. Mietus and George B. Moody and Chung-Kang Peng and H. Eugene Stanley }, title = {PhysioBank, PhysioToolkit, and PhysioNet }, journal = {Circulation}, volume = {101}, number = {23}, pages = {e215-e220}, year = {2000}, doi = {10.1161/01.CIR.101.23.e215}, URL = {https://www.ahajournals.org/doi/abs/10.1161/01.CIR.101.23.e215} }"
107
  ```
108
 
109
  ### Disclaimer
 
8
  - spanish
9
  - drugs
10
  - medications
11
+ license: cc-by-4.0
12
  metrics:
13
  - precision
14
  - recall
 
26
  name: DrugTEMIST-es
27
  type: DrugTEMIST-es
28
  metrics:
29
+ - name: precision (micro)
30
  type: precision
31
  value: 0.917
32
+ - name: recall (micro)
33
  type: recall
34
  value: 0.909
35
+ - name: f1 (micro)
36
  type: f1
37
  value: 0.913
38
+ - task:
39
+ type: token-classification
40
+ dataset:
41
+ name: CARMEN-I-medications
42
+ type: CARMEN-I-medications
43
+ metrics:
44
+ - name: precision (micro)
45
+ type: precision
46
+ value: 0.906
47
+ - name: recall (micro)
48
+ type: recall
49
+ value: 0.885
50
+ - name: f1 (micro)
51
+ type: f1
52
+ value: 0.895
53
  widget:
54
  - text: El diagnóstico definitivo de nuestro paciente fue de un Adenocarcinoma de pulmón cT2a cN3 cM1a Estadio IV (por una única lesión pulmonar contralateral) PD-L1 90%, EGFR negativo, ALK negativo y ROS-1 negativo.
55
  - text: Durante el ingreso se realiza una TC, observándose un nódulo pulmonar en el LII y una masa renal derecha indeterminada. Se realiza punción biopsia del nódulo pulmonar, con hallazgos altamente sospechosos de carcinoma.
 
67
  - [Model description](#model-description)
68
  - [How to use](#how-to-use)
69
  - [Limitations and bias](#limitations-and-bias)
 
70
  - [Additional information](#additional-information)
71
  - [Authors](#authors)
72
  - [Contact information](#contact-information)
 
78
  </details>
79
 
80
  ## Model description
81
+ A fine-tuned version of the [bsc-bio-ehr-es](https://huggingface.co/PlanTL-GOB-ES/bsc-bio-ehr-es) model on the [DrugTEMIST](https://zenodo.org/records/11368861) corpus (original Spanish Gold Standard). For further information, check the [official website](https://temu.bsc.es/multicardioner/)
82
 
83
  ## How to use
84
 
 
90
  At the time of submission, no measures have been taken to estimate the bias embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated.
91
 
92
 
 
 
 
93
  ## Additional information
94
 
95
  ### Authors
 
98
  ### Contact information
99
  jan.rodriguez [at] bsc.es
100
 
 
 
 
101
  ### Funding
102
+
103
+ This project was partially funded by the Spanish Plan for the Advancement of Language Technology (Plan TL) in collaboration with the Barcelona Supercomputing Center (BSC) and the Hospital Clinic de Barcelona (HCB). On the BSC's side, we acknowledge additional funding by the Spanish National AI4ProfHealth project (PID2020-119266RA-I00 MICIU/AEI/10.13039/501100011033) and EU Horizon projects (AI4HF 101080430 and DataTools4Heart 101057849). On the HCB's side, the project was also supported by the Instituto de Salud Carlos III (ISCIII).
104
 
105
  ### Citing information
106
 
 
108
 
109
  ```bibtex
110
 
111
+ @article{LimaLopez2025,
112
+ author = {Salvador Lima-López and Eulàlia Farré-Maduell and Luis Gasco and Jan Rodríguez-Miret and Santiago Frid and Xavier Pastor and Xavier Borrat and Martin Krallinger},
113
+ title = {A textual dataset of de-identified health records in Spanish and Catalan for medical entity recognition and anonymization},
114
+ journal = {Scientific Data},
115
+ volume = {12},
116
+ pages = {Article 1088},
117
+ year = {2025},
118
+ publisher = {Nature Publishing Group},
119
+ doi = {10.1038/s41597-025-05320-1},
120
+ url = {https://www.nature.com/articles/s41597-025-05320-1}
121
+ }
122
+
123
+ @misc{carmen_physionet,
124
+ author = {Farre Maduell, Eulalia and Lima-Lopez, Salvador and Frid, Santiago Andres and Conesa, Artur and Asensio, Elisa and Lopez-Rueda, Antonio and Arino, Helena and Calvo, Elena and Bertran, Maria Jesús and Marcos, Maria Angeles and Nofre Maiz, Montserrat and Tañá Velasco, Laura and Marti, Antonia and Farreres, Ricardo and Pastor, Xavier and Borrat Frigola, Xavier and Krallinger, Martin},
125
+ title = {{CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools (version 1.0.1)}},
126
+ year = {2024},
127
+ publisher = {PhysioNet},
128
+ url = {https://doi.org/10.13026/x7ed-9r91}
129
+ },
130
+
131
+ @inproceedings{multicardioner2024overview,
132
+ title = {{Overview of MultiCardioNER task at BioASQ 2024 on Medical Speciality and Language Adaptation of Clinical NER Systems for Spanish, English and Italian}},
133
+ author = {Salvador Lima-L\'opez and Eul\`alia Farr\'e-Maduell and Jan Rodr\'iguez-Miret and Miguel Rodr\'iguez-Ortega and Livia Lilli and Jacopo Lenkowicz and Giovanna Ceroni and Jonathan Kossoff and Anoop Shah and Anastasios Nentidis and Anastasia Krithara and Georgios Katsimpras and Georgios Paliouras and Martin Krallinger},
134
+ booktitle = {CLEF Working Notes},
135
+ year = {2024},
136
+ editor = {Faggioli, Guglielmo and Ferro, Nicola and Galušč\'akov\'a, Petra and Garc\'ia Seco de Herrera, Alba}
137
+ }
138
+
139
+ @article{physionet,
140
+ author = {Ary L. Goldberger and Luis A. N. Amaral and Leon Glass and Jeffrey M. Hausdorff and Plamen Ch. Ivanov and Roger G. Mark and Joseph E. Mietus and George B. Moody and Chung-Kang Peng and H. Eugene Stanley },
141
+ title = {PhysioBank, PhysioToolkit, and PhysioNet },
142
+ journal = {Circulation},
143
+ volume = {101},
144
+ number = {23},
145
+ pages = {e215-e220},
146
+ year = {2000},
147
+ doi = {10.1161/01.CIR.101.23.e215},
148
+ URL = {https://www.ahajournals.org/doi/abs/10.1161/01.CIR.101.23.e215}
149
+ }
150
  ```
151
 
152
  ### Disclaimer