Upload folder using huggingface_hub
- README.md +140 -3
- data-config.json +17 -0
- data-load-args.json +10 -0
- eval-metrics_test.json +5 -0
- eval-metrics_train.json +5 -0
- eval-metrics_validation.json +5 -0
- input-data.hf/data-00000-of-00001.arrow +3 -0
- input-data.hf/dataset_info.json +52 -0
- input-data.hf/state.json +13 -0
- logs-csv/lightning_logs/version_0/hparams.yaml +1 -0
- logs-csv/lightning_logs/version_0/metrics.csv +91 -0
- logs/lightning_logs/version_0/events.out.tfevents.1743097353.cn026.2190138.0 +3 -0
- logs/lightning_logs/version_0/hparams.yaml +1 -0
- metrics.csv +4 -0
- modelbox-config.json +11 -0
- params.pt +3 -0
- predictions_test.csv.gz +3 -0
- predictions_train.csv.gz +3 -0
- predictions_validation.csv.gz +3 -0
- training-args.json +5 -0
- training-data.hf/cache-d4aeece68b087032.arrow +3 -0
- training-data.hf/data-00000-of-00001.arrow +3 -0
- training-data.hf/dataset_info.json +126 -0
- training-data.hf/state.json +15 -0
- training-log.csv +46 -0
- training-log.png +0 -0
README.md
CHANGED
@@ -1,3 +1,140 @@
|
|
1 |
-
---
|
2 |
-
license: mit
|
3 |
-
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
pipeline_tag: tabular-regression
|
4 |
+
tags:
|
5 |
+
- chemistry
|
6 |
+
- microbiology
|
7 |
+
- antibiotics
|
8 |
+
library_name: duvida
|
9 |
+
datasets:
|
10 |
+
- scbirlab/thomas-2018-spark-wt
|
11 |
+
---
|
12 |
+
|
13 |
+
# Predictor of _Klebsiella pneumoniae_ MICs
|
14 |
+
|
15 |
+
_Updated:_ Fri 28 Mar 14:27:11 GMT 2025
|
16 |
+
|
17 |
+
Trained on the _Klebsiella pneumoniae_, WT accumulator phenotype subset of the [human-curated SPARK dataset](https://doi.org/10.1021/acsinfecdis.8b00193) (3920 rows in total for _Klebsiella pneumoniae_).
|
18 |
+
|
19 |
+
## Model details
|
20 |
+
|
21 |
+
This model was trained using [our Duvida framework](https://github.com/scbirlab/duvida),
|
22 |
+
following a hyperparameter search in which we selected the model that performed best on unseen test data
|
23 |
+
(from a scaffold split).
|
24 |
+
|
25 |
+
Duvida also saves the training data in this checkpoint to allow the calculation of uncertainty metrics
|
26 |
+
based on that training data.
|
27 |
+
|
28 |
+
This model is the best regression model from a hyperparameter search, determined
|
29 |
+
by Spearman's $\rho$ on a held-out test set not used in training or early stopping.
|
30 |
+
|
31 |
+
### Model architecture
|
32 |
+
|
33 |
+
- **Regression**
|
34 |
+
|
35 |
+
```json
|
36 |
+
|
37 |
+
{
|
38 |
+
"dropout": 0.2,
|
39 |
+
"ensemble_size": 10,
|
40 |
+
"extra_featurizers": null,
|
41 |
+
"learning_rate": 0.0001,
|
42 |
+
"model_class": "ChempropModelBox",
|
43 |
+
"n_hidden": 3,
|
44 |
+
"n_units": 16,
|
45 |
+
"use_2d": true,
|
46 |
+
"use_fp": true
|
47 |
+
}
|
48 |
+
```
|
49 |
+
|
50 |
+
### Model usage
|
51 |
+
|
52 |
+
You can use this model with:
|
53 |
+
|
54 |
+
```python
|
55 |
+
from duvida.autoclasses import AutoModelBox
|
56 |
+
modelbox = AutoModelBox.from_pretrained("hf://scbirlab/spark-dv-2503-kpne")
|
57 |
+
modelbox.predict(filename=..., inputs=[...], columns=[...]) # make predictions on your own data
|
58 |
+
```
|
59 |
+
|
60 |
+
## Training details
|
61 |
+
|
62 |
+
- **Dataset:** [SPARK, WT accumulator, _Klebsiella pneumoniae_ subset](https://huggingface.co/datasets/scbirlab/thomas-2018-spark-wt)
|
63 |
+
- **Input column:** smiles
|
64 |
+
- **Output column:** pmic
|
65 |
+
- **Split type:** Murcko scaffold
|
66 |
+
- **Split proportions:**
|
67 |
+
- 70% training (2045 rows)
|
68 |
+
- 15% validation (for early stopping) (723 rows)
|
69 |
+
- 15% test (for selecting hyperparameters) (646 rows)
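
As an illustration of what a scaffold split involves, here is a minimal sketch of one common greedy scheme: molecules sharing a scaffold are kept together, and whole groups are assigned to training, validation, or test in turn. It assumes the scaffold key of each row has already been computed (for example with a cheminformatics toolkit); the function name `scaffold_split` and the example keys are hypothetical, and this is not necessarily the procedure schemist or Duvida uses.

```python
from collections import defaultdict


def scaffold_split(scaffolds, fractions=(0.70, 0.15, 0.15)):
    """Greedy group-wise split: each scaffold group goes to exactly one split.

    `scaffolds` holds one precomputed scaffold key per row.
    Returns three lists of row indices: train, validation, test.
    """
    # Group row indices by scaffold key.
    groups = defaultdict(list)
    for index, key in enumerate(scaffolds):
        groups[key].append(index)

    # Largest groups first; ties broken by key so the result is deterministic.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

    n_rows = len(scaffolds)
    train_cap = fractions[0] * n_rows
    val_cap = fractions[1] * n_rows
    train, validation, test = [], [], []
    for _, members in ordered:
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        elif len(validation) + len(members) <= val_cap:
            validation.extend(members)
        else:
            # Anything that fits neither cap falls through to the test set.
            test.extend(members)
    return train, validation, test


# Hypothetical scaffold keys for ten molecules (not taken from SPARK).
keys = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"]
train, validation, test = scaffold_split(keys)
```

Because whole groups are assigned at once, the realised split sizes in a scheme like this generally only approximate the requested fractions.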
|
70 |
+
|
71 |
+
Here is the training log:
|
72 |
+
|
73 |
+
<img src="training-log.png" width=450>
|
74 |
+
|
75 |
+
And these are the evaluation scores.
|
76 |
+
|
77 |
+
Train (2045 rows):
|
78 |
+
|
79 |
+
```json
|
80 |
+
|
81 |
+
{
|
82 |
+
"Pearson r": 0.879788014533255,
|
83 |
+
"RMSE": 0.4014032185077667,
|
84 |
+
"Spearman rho": 0.8235991116907959
|
85 |
+
}
|
86 |
+
```
|
87 |
+
|
88 |
+
Validation (723 rows):
|
89 |
+
|
90 |
+
```json
|
91 |
+
|
92 |
+
{
|
93 |
+
"Pearson r": 0.7805225413538466,
|
94 |
+
"RMSE": 0.7095186710357666,
|
95 |
+
"Spearman rho": 0.6299348550927065
|
96 |
+
}
|
97 |
+
```
|
98 |
+
|
99 |
+
|
100 |
+
Test (646 rows):
|
101 |
+
|
102 |
+
```json
|
103 |
+
|
104 |
+
{
|
105 |
+
"Pearson r": 0.4050551318825592,
|
106 |
+
"RMSE": 0.6779211163520813,
|
107 |
+
"Spearman rho": 0.4843227707887753
|
108 |
+
}
|
109 |
+
```
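
For readers who want to recompute scores like these from the prediction files, the sketch below gives the textbook definitions of the three metrics in plain Python. How Duvida computes them internally is not shown in this repository, so treat this as an assumption about the definitions rather than the framework's code; the `observed` and `predicted` values are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up pMIC values for illustration only.
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.8, 6.5, 9.0]
scores = {
    "Pearson r": pearson_r(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "Spearman rho": spearman_rho(observed, predicted),
}
```

Spearman's rho depends only on the ordering of the predictions, which is why it can differ noticeably from Pearson's r on the same split.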
|
110 |
+
|
111 |
+
## Training data details
|
112 |
+
|
113 |
+
The training data were collated by the authors of:
|
114 |
+
|
115 |
+
> Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
|
116 |
+
> Shared Platform for Antibiotic Research and Knowledge: A Collaborative Tool to SPARK Antibiotic Discovery
|
117 |
+
> ACS Infectious Diseases 2018 4 (11), 1536-1539
|
118 |
+
> DOI: 10.1021/acsinfecdis.8b00193
|
119 |
+
|
120 |
+
We cleaned the original SPARK dataset to subset the most relevant columns, remove empty values,
|
121 |
+
give succinct column titles, and split by species.
|
122 |
+
|
123 |
+
This particular dataset retains only measurements on bacteria with wild-type accumulation phenotypes.
|
124 |
+
|
125 |
+
### Dataset Sources
|
126 |
+
|
127 |
+
- **Repository:** https://www.collaborativedrug.com/spark-data-downloads
|
128 |
+
- **Paper:** https://doi.org/10.1021/acsinfecdis.8b00193
|
129 |
+
|
130 |
+
### Data Collection and Processing
|
131 |
+
|
132 |
+
Data were processed using [schemist](https://github.com/scbirlab/schemist), a tool for processing chemical datasets.
|
133 |
+
|
134 |
+
The SMILES strings have been canonicalized, and split into training (70%), validation (15%), and test (15%) sets
|
135 |
+
by Murcko scaffold for each species with more than 1000 entries. Additional features like molecular weight and
|
136 |
+
topological polar surface area have also been calculated.
|
137 |
+
|
138 |
+
### Who are the source data producers?
|
139 |
+
|
140 |
+
Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
|
data-config.json
ADDED
@@ -0,0 +1,17 @@
1 |
+
{
|
2 |
+
"_default_cache": "cache/duvida/data",
|
3 |
+
"_in_key": "inputs",
|
4 |
+
"_input_cols": [
|
5 |
+
"smiles"
|
6 |
+
],
|
7 |
+
"_label_cols": [
|
8 |
+
"pmic"
|
9 |
+
],
|
10 |
+
"_out_key": "labels",
|
11 |
+
"input_shape": [
|
12 |
+
2248
|
13 |
+
],
|
14 |
+
"output_shape": [
|
15 |
+
1
|
16 |
+
]
|
17 |
+
}
|
data-load-args.json
ADDED
@@ -0,0 +1,10 @@
1 |
+
{
|
2 |
+
"cache": "/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Klebsiella-pneumoniae/61/cache",
|
3 |
+
"features": [
|
4 |
+
"smiles"
|
5 |
+
],
|
6 |
+
"filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-train.csv.gz",
|
7 |
+
"labels": [
|
8 |
+
"pmic"
|
9 |
+
]
|
10 |
+
}
|
eval-metrics_test.json
ADDED
@@ -0,0 +1,5 @@
1 |
+
{
|
2 |
+
"Pearson r": 0.4050551318825592,
|
3 |
+
"RMSE": 0.6779211163520813,
|
4 |
+
"Spearman rho": 0.4843227707887753
|
5 |
+
}
|
eval-metrics_train.json
ADDED
@@ -0,0 +1,5 @@
1 |
+
{
|
2 |
+
"Pearson r": 0.879788014533255,
|
3 |
+
"RMSE": 0.4014032185077667,
|
4 |
+
"Spearman rho": 0.8235991116907959
|
5 |
+
}
|
eval-metrics_validation.json
ADDED
@@ -0,0 +1,5 @@
1 |
+
{
|
2 |
+
"Pearson r": 0.7805225413538466,
|
3 |
+
"RMSE": 0.7095186710357666,
|
4 |
+
"Spearman rho": 0.6299348550927065
|
5 |
+
}
|
input-data.hf/data-00000-of-00001.arrow
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:eb7f5cd91df22834ff43e89843609a6075e5f26ec1e7aebc9a9a05518a56ce3a
|
3 |
+
size 264496
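
The three lines above are a Git LFS pointer: the repository stores this small text stub while the binary file itself lives in LFS storage. As a rough sketch, the pointer can be read as space-separated key/value lines; the function name `parse_lfs_pointer` and the returned field names are hypothetical choices for illustration.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:eb7f5cd91df22834ff43e89843609a6075e5f26ec1e7aebc9a9a05518a56ce3a
size 264496
"""
info = parse_lfs_pointer(pointer)
```

The `size_bytes` field is the size of the real file, so it can be used to sanity-check a download before hashing it.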
|
input-data.hf/dataset_info.json
ADDED
@@ -0,0 +1,52 @@
1 |
+
{
|
2 |
+
"builder_name": "csv",
|
3 |
+
"citation": "",
|
4 |
+
"config_name": "default",
|
5 |
+
"dataset_name": "csv",
|
6 |
+
"dataset_size": 766413,
|
7 |
+
"description": "",
|
8 |
+
"download_checksums": {
|
9 |
+
"/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-train.csv.gz": {
|
10 |
+
"num_bytes": 130202,
|
11 |
+
"checksum": null
|
12 |
+
}
|
13 |
+
},
|
14 |
+
"download_size": 130202,
|
15 |
+
"features": {
|
16 |
+
"smiles": {
|
17 |
+
"dtype": "string",
|
18 |
+
"_type": "Value"
|
19 |
+
},
|
20 |
+
"inputs": {
|
21 |
+
"feature": {
|
22 |
+
"dtype": "string",
|
23 |
+
"_type": "Value"
|
24 |
+
},
|
25 |
+
"_type": "Sequence"
|
26 |
+
},
|
27 |
+
"labels": {
|
28 |
+
"feature": {
|
29 |
+
"dtype": "float64",
|
30 |
+
"_type": "Value"
|
31 |
+
},
|
32 |
+
"_type": "Sequence"
|
33 |
+
}
|
34 |
+
},
|
35 |
+
"homepage": "",
|
36 |
+
"license": "",
|
37 |
+
"size_in_bytes": 896615,
|
38 |
+
"splits": {
|
39 |
+
"train": {
|
40 |
+
"name": "train",
|
41 |
+
"num_bytes": 766413,
|
42 |
+
"num_examples": 2045,
|
43 |
+
"dataset_name": "csv"
|
44 |
+
}
|
45 |
+
},
|
46 |
+
"version": {
|
47 |
+
"version_str": "0.0.0",
|
48 |
+
"major": 0,
|
49 |
+
"minor": 0,
|
50 |
+
"patch": 0
|
51 |
+
}
|
52 |
+
}
|
input-data.hf/state.json
ADDED
@@ -0,0 +1,13 @@
1 |
+
{
|
2 |
+
"_data_files": [
|
3 |
+
{
|
4 |
+
"filename": "data-00000-of-00001.arrow"
|
5 |
+
}
|
6 |
+
],
|
7 |
+
"_fingerprint": "a55666bc481927f9",
|
8 |
+
"_format_columns": null,
|
9 |
+
"_format_kwargs": {},
|
10 |
+
"_format_type": null,
|
11 |
+
"_output_all_columns": false,
|
12 |
+
"_split": "train"
|
13 |
+
}
|
logs-csv/lightning_logs/version_0/hparams.yaml
ADDED
@@ -0,0 +1 @@
1 |
+
{}
|
logs-csv/lightning_logs/version_0/metrics.csv
ADDED
@@ -0,0 +1,91 @@
1 |
+
epoch,loss,step,val_loss
|
2 |
+
0,,127,3.1612131595611572
|
3 |
+
0,11.470685958862305,127,
|
4 |
+
1,,255,3.4212403297424316
|
5 |
+
1,4.516500473022461,255,
|
6 |
+
2,,383,2.7988340854644775
|
7 |
+
2,4.001670837402344,383,
|
8 |
+
3,,511,2.4266252517700195
|
9 |
+
3,3.4878146648406982,511,
|
10 |
+
4,,639,2.3812172412872314
|
11 |
+
4,3.1538383960723877,639,
|
12 |
+
5,,767,2.249669075012207
|
13 |
+
5,2.8869502544403076,767,
|
14 |
+
6,,895,2.1487362384796143
|
15 |
+
6,2.6796183586120605,895,
|
16 |
+
7,,1023,1.9302494525909424
|
17 |
+
7,2.499631881713867,1023,
|
18 |
+
8,,1151,1.895418405532837
|
19 |
+
8,2.384538173675537,1151,
|
20 |
+
9,,1279,1.8401507139205933
|
21 |
+
9,2.2424700260162354,1279,
|
22 |
+
10,,1407,1.7240188121795654
|
23 |
+
10,2.1825485229492188,1407,
|
24 |
+
11,,1535,1.6076897382736206
|
25 |
+
11,2.0599634647369385,1535,
|
26 |
+
12,,1663,1.5228654146194458
|
27 |
+
12,1.9549221992492676,1663,
|
28 |
+
13,,1791,1.3301759958267212
|
29 |
+
13,1.8475960493087769,1791,
|
30 |
+
14,,1919,1.197400689125061
|
31 |
+
14,1.7222977876663208,1919,
|
32 |
+
15,,2047,1.1432061195373535
|
33 |
+
15,1.662263035774231,2047,
|
34 |
+
16,,2175,1.1259466409683228
|
35 |
+
16,1.5466835498809814,2175,
|
36 |
+
17,,2303,1.133188247680664
|
37 |
+
17,1.476884126663208,2303,
|
38 |
+
18,,2431,1.1027441024780273
|
39 |
+
18,1.4452625513076782,2431,
|
40 |
+
19,,2559,1.158228874206543
|
41 |
+
19,1.378151774406433,2559,
|
42 |
+
20,,2687,1.0784740447998047
|
43 |
+
20,1.318475365638733,2687,
|
44 |
+
21,,2815,1.0715962648391724
|
45 |
+
21,1.2622106075286865,2815,
|
46 |
+
22,,2943,1.075689673423767
|
47 |
+
22,1.2291003465652466,2943,
|
48 |
+
23,,3071,1.0587623119354248
|
49 |
+
23,1.1971018314361572,3071,
|
50 |
+
24,,3199,1.0540975332260132
|
51 |
+
24,1.1468651294708252,3199,
|
52 |
+
25,,3327,1.07569420337677
|
53 |
+
25,1.1313549280166626,3327,
|
54 |
+
26,,3455,1.0563280582427979
|
55 |
+
26,1.1012349128723145,3455,
|
56 |
+
27,,3583,1.1024538278579712
|
57 |
+
27,1.0781406164169312,3583,
|
58 |
+
28,,3711,1.0793583393096924
|
59 |
+
28,1.058035135269165,3711,
|
60 |
+
29,,3839,1.0855896472930908
|
61 |
+
29,1.0287179946899414,3839,
|
62 |
+
30,,3967,1.1220470666885376
|
63 |
+
30,1.0105702877044678,3967,
|
64 |
+
31,,4095,1.0875941514968872
|
65 |
+
31,0.9984549880027771,4095,
|
66 |
+
32,,4223,1.0672334432601929
|
67 |
+
32,0.987288236618042,4223,
|
68 |
+
33,,4351,1.0616916418075562
|
69 |
+
33,0.9705791473388672,4351,
|
70 |
+
34,,4479,1.0746409893035889
|
71 |
+
34,0.9425535798072815,4479,
|
72 |
+
35,,4607,1.0884811878204346
|
73 |
+
35,0.9292816519737244,4607,
|
74 |
+
36,,4735,1.0922189950942993
|
75 |
+
36,0.9182446599006653,4735,
|
76 |
+
37,,4863,1.0835254192352295
|
77 |
+
37,0.9015668034553528,4863,
|
78 |
+
38,,4991,1.0907585620880127
|
79 |
+
38,0.9176275134086609,4991,
|
80 |
+
39,,5119,1.1213494539260864
|
81 |
+
39,0.9064778089523315,5119,
|
82 |
+
40,,5247,1.0660821199417114
|
83 |
+
40,0.8878889083862305,5247,
|
84 |
+
41,,5375,1.0989940166473389
|
85 |
+
41,0.8802230358123779,5375,
|
86 |
+
42,,5503,1.1001183986663818
|
87 |
+
42,0.8679183721542358,5503,
|
88 |
+
43,,5631,1.120335578918457
|
89 |
+
43,0.8558708429336548,5631,
|
90 |
+
44,,5759,1.1134792566299438
|
91 |
+
44,0.8376833200454712,5759,
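
In this log, each epoch appears twice: one row carries `val_loss` and another carries `loss`, with the other column left empty. A minimal sketch of collapsing such rows into one record per epoch, using only the first four data rows of the file, might look like the following. This is an illustration with the standard library, not the code Duvida uses to produce `training-log.csv`.

```python
import csv
import io

# First rows of the log above, copied verbatim.
raw = """\
epoch,loss,step,val_loss
0,,127,3.1612131595611572
0,11.470685958862305,127,
1,,255,3.4212403297424316
1,4.516500473022461,255,
"""

merged = {}
for row in csv.DictReader(io.StringIO(raw)):
    key = (int(row["epoch"]), int(row["step"]))
    record = merged.setdefault(key, {})
    for column in ("loss", "val_loss"):
        # Empty cells mean "not logged on this row"; skip them.
        if row[column] != "":
            record[column] = float(row[column])

# One dictionary per (epoch, step), ordered by epoch.
rows = [
    {"epoch": epoch, "step": step, **values}
    for (epoch, step), values in sorted(merged.items())
]
```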
|
logs/lightning_logs/version_0/events.out.tfevents.1743097353.cn026.2190138.0
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:06819b7813d597c5db3ccb8be5dc3e376772c9803a5afb9eaeb499e1524f015e
|
3 |
+
size 8094
|
logs/lightning_logs/version_0/hparams.yaml
ADDED
@@ -0,0 +1 @@
1 |
+
{}
|
metrics.csv
ADDED
@@ -0,0 +1,4 @@
1 |
+
split,split_filename,config_i,model_class,n_parameters,filename,features,labels,cache,extra_featurizers,use_2d,use_fp,dropout,ensemble_size,learning_rate,n_hidden,n_units,val_filename,epochs,batch_size,RMSE,Pearson r,Spearman rho
|
2 |
+
train,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-train.csv.gz,61,ChempropModelBox,2690450,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Klebsiella-pneumoniae/61/cache,,True,True,0.2,10,0.0001,3,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-validation.csv.gz,2000,16,0.4014032185077667,0.879788014533255,0.8235991116907959
|
3 |
+
validation,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-validation.csv.gz,61,ChempropModelBox,2690450,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Klebsiella-pneumoniae/61/cache,,True,True,0.2,10,0.0001,3,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-validation.csv.gz,2000,16,0.7095186710357666,0.7805225413538466,0.6299348550927065
|
4 |
+
test,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-test.csv.gz,61,ChempropModelBox,2690450,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Klebsiella-pneumoniae/61/cache,,True,True,0.2,10,0.0001,3,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-validation.csv.gz,2000,16,0.6779211163520813,0.4050551318825592,0.4843227707887753
|
modelbox-config.json
ADDED
@@ -0,0 +1,11 @@
1 |
+
{
|
2 |
+
"dropout": 0.2,
|
3 |
+
"ensemble_size": 10,
|
4 |
+
"extra_featurizers": null,
|
5 |
+
"learning_rate": 0.0001,
|
6 |
+
"model_class": "ChempropModelBox",
|
7 |
+
"n_hidden": 3,
|
8 |
+
"n_units": 16,
|
9 |
+
"use_2d": true,
|
10 |
+
"use_fp": true
|
11 |
+
}
|
params.pt
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:da1801d43ebc3f74db8d31190366ad537b2b39238c6574eb1b110077ce9f385c
|
3 |
+
size 10875372
|
predictions_test.csv.gz
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:5584a18b5d753d00747d2809e5bb0424cec579aaad0ce24f00e4658bbf7a170c
|
3 |
+
size 705987
|
predictions_train.csv.gz
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:20695113483d5c2d973c00badeaf66c46e3ef2c9cf334b0af09ace0cacd31fc0
|
3 |
+
size 1940344
|
predictions_validation.csv.gz
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:c0e71ba4083d1dc0c1f0a99d43add1eaf03d0d5c1cab18aa58643d962df4d192
|
3 |
+
size 997657
|
training-args.json
ADDED
@@ -0,0 +1,5 @@
1 |
+
{
|
2 |
+
"batch_size": 16,
|
3 |
+
"epochs": 2000,
|
4 |
+
"val_filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-validation.csv.gz"
|
5 |
+
}
|
training-data.hf/cache-d4aeece68b087032.arrow
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:3873a4aaf25229484a767fcf91a5c465713b58a1255b3b692925ef7d106faf0d
|
3 |
+
size 61560848
|
training-data.hf/data-00000-of-00001.arrow
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:74b27e7fb54789605b5ce5f5646274068f805585004de84caf37383446fe697a
|
3 |
+
size 61282632
|
training-data.hf/dataset_info.json
ADDED
@@ -0,0 +1,126 @@
1 |
+
{
|
2 |
+
"builder_name": "csv",
|
3 |
+
"citation": "",
|
4 |
+
"config_name": "default",
|
5 |
+
"dataset_name": "csv",
|
6 |
+
"dataset_size": 766413,
|
7 |
+
"description": "",
|
8 |
+
"download_checksums": {
|
9 |
+
"/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Klebsiella-pneumoniae/scaffold-split-train.csv.gz": {
|
10 |
+
"num_bytes": 130202,
|
11 |
+
"checksum": null
|
12 |
+
}
|
13 |
+
},
|
14 |
+
"download_size": 130202,
|
15 |
+
"features": {
|
16 |
+
"smiles": {
|
17 |
+
"feature": {
|
18 |
+
"dtype": "string",
|
19 |
+
"_type": "Value"
|
20 |
+
},
|
21 |
+
"_type": "Sequence"
|
22 |
+
},
|
23 |
+
"inputs": {
|
24 |
+
"V_d": {
|
25 |
+
"dtype": "null",
|
26 |
+
"_type": "Value"
|
27 |
+
},
|
28 |
+
"gt_mask": {
|
29 |
+
"dtype": "null",
|
30 |
+
"_type": "Value"
|
31 |
+
},
|
32 |
+
"lt_mask": {
|
33 |
+
"dtype": "null",
|
34 |
+
"_type": "Value"
|
35 |
+
},
|
36 |
+
"mg": {
|
37 |
+
"E": {
|
38 |
+
"feature": {
|
39 |
+
"feature": {
|
40 |
+
"dtype": "float32",
|
41 |
+
"_type": "Value"
|
42 |
+
},
|
43 |
+
"_type": "Sequence"
|
44 |
+
},
|
45 |
+
"_type": "Sequence"
|
46 |
+
},
|
47 |
+
"V": {
|
48 |
+
"feature": {
|
49 |
+
"feature": {
|
50 |
+
"dtype": "float32",
|
51 |
+
"_type": "Value"
|
52 |
+
},
|
53 |
+
"_type": "Sequence"
|
54 |
+
},
|
55 |
+
"_type": "Sequence"
|
56 |
+
},
|
57 |
+
"edge_index": {
|
58 |
+
"feature": {
|
59 |
+
"feature": {
|
60 |
+
"dtype": "float32",
|
61 |
+
"_type": "Value"
|
62 |
+
},
|
63 |
+
"_type": "Sequence"
|
64 |
+
},
|
65 |
+
"_type": "Sequence"
|
66 |
+
},
|
67 |
+
"rev_edge_index": {
|
68 |
+
"feature": {
|
69 |
+
"dtype": "float32",
|
70 |
+
"_type": "Value"
|
71 |
+
},
|
72 |
+
"_type": "Sequence"
|
73 |
+
}
|
74 |
+
},
|
75 |
+
"weight": {
|
76 |
+
"dtype": "float32",
|
77 |
+
"_type": "Value"
|
78 |
+
},
|
79 |
+
"x_d": {
|
80 |
+
"feature": {
|
81 |
+
"dtype": "float32",
|
82 |
+
"_type": "Value"
|
83 |
+
},
|
84 |
+
"_type": "Sequence"
|
85 |
+
},
|
86 |
+
"y": {
|
87 |
+
"feature": {
|
88 |
+
"dtype": "float32",
|
89 |
+
"_type": "Value"
|
90 |
+
},
|
91 |
+
"_type": "Sequence"
|
92 |
+
}
|
93 |
+
},
|
94 |
+
"labels": {
|
95 |
+
"feature": {
|
96 |
+
"dtype": "float64",
|
97 |
+
"_type": "Value"
|
98 |
+
},
|
99 |
+
"_type": "Sequence"
|
100 |
+
},
|
101 |
+
"extra_features": {
|
102 |
+
"feature": {
|
103 |
+
"dtype": "float32",
|
104 |
+
"_type": "Value"
|
105 |
+
},
|
106 |
+
"_type": "Sequence"
|
107 |
+
}
|
108 |
+
},
|
109 |
+
"homepage": "",
|
110 |
+
"license": "",
|
111 |
+
"size_in_bytes": 896615,
|
112 |
+
"splits": {
|
113 |
+
"train": {
|
114 |
+
"name": "train",
|
115 |
+
"num_bytes": 766413,
|
116 |
+
"num_examples": 2045,
|
117 |
+
"dataset_name": "csv"
|
118 |
+
}
|
119 |
+
},
|
120 |
+
"version": {
|
121 |
+
"version_str": "0.0.0",
|
122 |
+
"major": 0,
|
123 |
+
"minor": 0,
|
124 |
+
"patch": 0
|
125 |
+
}
|
126 |
+
}
|
training-data.hf/state.json
ADDED
@@ -0,0 +1,15 @@
1 |
+
{
|
2 |
+
"_data_files": [
|
3 |
+
{
|
4 |
+
"filename": "data-00000-of-00001.arrow"
|
5 |
+
}
|
6 |
+
],
|
7 |
+
"_fingerprint": "8e7bb11fd5ae4b41",
|
8 |
+
"_format_columns": null,
|
9 |
+
"_format_kwargs": {
|
10 |
+
"dtype": "float"
|
11 |
+
},
|
12 |
+
"_format_type": "numpy",
|
13 |
+
"_output_all_columns": false,
|
14 |
+
"_split": "train"
|
15 |
+
}
|
training-log.csv
ADDED
@@ -0,0 +1,46 @@
1 |
+
epoch,step,loss,val_loss
|
2 |
+
0,127,11.470685958862305,3.161213159561157
|
3 |
+
1,255,4.516500473022461,3.421240329742432
|
4 |
+
2,383,4.001670837402344,2.798834085464477
|
5 |
+
3,511,3.4878146648406982,2.4266252517700195
|
6 |
+
4,639,3.153838396072388,2.3812172412872314
|
7 |
+
5,767,2.886950254440308,2.249669075012207
|
8 |
+
6,895,2.6796183586120605,2.1487362384796143
|
9 |
+
7,1023,2.499631881713867,1.9302494525909424
|
10 |
+
8,1151,2.384538173675537,1.895418405532837
|
11 |
+
9,1279,2.242470026016236,1.8401507139205933
|
12 |
+
10,1407,2.1825485229492188,1.7240188121795654
|
13 |
+
11,1535,2.0599634647369385,1.6076897382736206
|
14 |
+
12,1663,1.954922199249268,1.5228654146194458
|
15 |
+
13,1791,1.8475960493087769,1.3301759958267212
|
16 |
+
14,1919,1.7222977876663208,1.197400689125061
|
17 |
+
15,2047,1.662263035774231,1.1432061195373535
|
18 |
+
16,2175,1.5466835498809814,1.1259466409683228
|
19 |
+
17,2303,1.476884126663208,1.133188247680664
|
20 |
+
18,2431,1.4452625513076782,1.1027441024780271
|
21 |
+
19,2559,1.378151774406433,1.158228874206543
|
22 |
+
20,2687,1.318475365638733,1.078474044799805
|
23 |
+
21,2815,1.2622106075286863,1.0715962648391724
|
24 |
+
22,2943,1.2291003465652466,1.075689673423767
|
25 |
+
23,3071,1.1971018314361572,1.0587623119354248
|
26 |
+
24,3199,1.1468651294708252,1.0540975332260132
|
27 |
+
25,3327,1.1313549280166626,1.07569420337677
|
28 |
+
26,3455,1.1012349128723145,1.056328058242798
|
29 |
+
27,3583,1.0781406164169312,1.1024538278579712
|
30 |
+
28,3711,1.058035135269165,1.0793583393096924
|
31 |
+
29,3839,1.0287179946899414,1.0855896472930908
|
32 |
+
30,3967,1.0105702877044678,1.1220470666885376
|
33 |
+
31,4095,0.9984549880027772,1.0875941514968872
|
34 |
+
32,4223,0.987288236618042,1.0672334432601929
|
35 |
+
33,4351,0.9705791473388672,1.0616916418075562
|
36 |
+
34,4479,0.9425535798072816,1.0746409893035889
|
37 |
+
35,4607,0.9292816519737244,1.0884811878204346
|
38 |
+
36,4735,0.9182446599006652,1.092218995094299
|
39 |
+
37,4863,0.9015668034553528,1.0835254192352295
|
40 |
+
38,4991,0.9176275134086608,1.0907585620880127
|
41 |
+
39,5119,0.9064778089523317,1.1213494539260864
|
42 |
+
40,5247,0.8878889083862305,1.0660821199417114
|
43 |
+
41,5375,0.8802230358123779,1.0989940166473389
|
44 |
+
42,5503,0.8679183721542358,1.1001183986663818
|
45 |
+
43,5631,0.8558708429336548,1.120335578918457
|
46 |
+
44,5759,0.8376833200454712,1.1134792566299438
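
The log stops at epoch 44 although `training-args.json` allows up to 2000 epochs, and the README states that the validation split was used for early stopping. The sketch below locates the epoch with the lowest validation loss from the values above (epochs 20 to 44 only; every earlier epoch in the log has a higher `val_loss`). The resulting gap is consistent with a patience of 20 epochs, but the patience setting is not recorded in the files shown, so that reading is an inference.

```python
# (epoch, val_loss) pairs copied from the log above, epochs 20 to 44.
val_loss = [
    (20, 1.078474044799805), (21, 1.0715962648391724), (22, 1.075689673423767),
    (23, 1.0587623119354248), (24, 1.0540975332260132), (25, 1.07569420337677),
    (26, 1.056328058242798), (27, 1.1024538278579712), (28, 1.0793583393096924),
    (29, 1.0855896472930908), (30, 1.1220470666885376), (31, 1.0875941514968872),
    (32, 1.0672334432601929), (33, 1.0616916418075562), (34, 1.0746409893035889),
    (35, 1.0884811878204346), (36, 1.092218995094299), (37, 1.0835254192352295),
    (38, 1.0907585620880127), (39, 1.1213494539260864), (40, 1.0660821199417114),
    (41, 1.0989940166473389), (42, 1.1001183986663818), (43, 1.120335578918457),
    (44, 1.1134792566299438),
]

# Epoch with the lowest validation loss.
best_epoch, best_loss = min(val_loss, key=lambda pair: pair[1])

# Number of epochs trained after the best one before the log ends.
epochs_since_best = val_loss[-1][0] - best_epoch
```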
|
training-log.png
ADDED