Upload folder using huggingface_hub
- README.md +140 -0
- data-config.json +17 -0
- data-load-args.json +10 -0
- eval-metrics_test.json +5 -0
- eval-metrics_train.json +5 -0
- eval-metrics_validation.json +5 -0
- input-data.hf/data-00000-of-00001.arrow +3 -0
- input-data.hf/dataset_info.json +52 -0
- input-data.hf/state.json +13 -0
- logs-csv/lightning_logs/version_0/hparams.yaml +1 -0
- logs-csv/lightning_logs/version_0/metrics.csv +83 -0
- logs/lightning_logs/version_0/events.out.tfevents.1743096520.cn042.1346636.0 +3 -0
- logs/lightning_logs/version_0/events.out.tfevents.1743096673.cn090.4018455.0 +3 -0
- logs/lightning_logs/version_0/hparams.yaml +1 -0
- metrics.csv +4 -0
- modelbox-config.json +11 -0
- params.pt +3 -0
- predictions_test.csv.gz +3 -0
- predictions_train.csv.gz +3 -0
- predictions_validation.csv.gz +3 -0
- repo-name.txt +1 -0
- training-args.json +5 -0
- training-data.hf/cache-319d4048b18fe78a.arrow +3 -0
- training-data.hf/cache-cb09ba82b7909a2f.arrow +3 -0
- training-data.hf/data-00000-of-00001.arrow +3 -0
- training-data.hf/dataset_info.json +126 -0
- training-data.hf/state.json +15 -0
- training-log.csv +42 -0
- training-log.png +0 -0
README.md
ADDED
@@ -0,0 +1,140 @@
---
license: mit
pipeline_tag: tabular-regression
tags:
- chemistry
- microbiology
- antibiotics
library_name: duvida
datasets:
- scbirlab/thomas-2018-spark-wt
---

# Predictor of _Brucella abortus_ MICs

_Updated:_ Fri 28 Mar 18:11:12 GMT 2025

Trained on the _Brucella abortus_, wild-type (WT) accumulator phenotype subset of the [human-curated SPARK dataset](https://doi.org/10.1021/acsinfecdis.8b00193) (9946 rows in total for _Brucella abortus_ across the training, validation, and test splits).

## Model details

This model was trained using [our Duvida framework](https://github.com/scbirlab/duvida),
as the result of a hyperparameter search, selecting the model that performs best on unseen test data
(from a scaffold split).

Duvida also saves the training data in this checkpoint to allow the calculation of uncertainty metrics
based on that training data.

This model is the best regression model from the hyperparameter search, as determined
by Spearman's $\rho$ on a held-out test set not used in training or early stopping.

### Model architecture

- **Regression**

```json
{
  "dropout": 0.2,
  "ensemble_size": 10,
  "extra_featurizers": null,
  "learning_rate": 0.0001,
  "model_class": "ChempropModelBox",
  "n_hidden": 5,
  "n_units": 16,
  "use_2d": true,
  "use_fp": true
}
```

### Model usage

You can use this model with:

```python
from duvida.autoclasses import AutoModelBox
modelbox = AutoModelBox.from_pretrained("hf://scbirlab/spark-dv-2503-babo")
modelbox.predict(filename=..., inputs=[...], columns=[...])  # make predictions on your own data
```

## Training details

- **Dataset:** [SPARK, WT accumulator, _Brucella abortus_ subset](https://huggingface.co/datasets/scbirlab/thomas-2018-spark-wt)
- **Input column:** smiles
- **Output column:** pmic
- **Split type:** Murcko scaffold
- **Split proportions:**
    - 70% training (6963 rows)
    - 15% validation (for early stopping) (1491 rows)
    - 15% test (for selecting hyperparameters) (1492 rows)

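As a quick sanity check on the split proportions, the row counts can be converted into fractions. This is a minimal sketch that uses only the counts stated in this card; the variable names are illustrative and are not part of Duvida.

```python
# Row counts per split, as stated in this model card.
split_rows = {"training": 6963, "validation": 1491, "test": 1492}

total_rows = sum(split_rows.values())

# Fraction of all rows in each split, rounded to three decimal places.
split_fractions = {name: round(n / total_rows, 3) for name, n in split_rows.items()}

print(total_rows)
print(split_fractions)
```

The rounded fractions should agree with the nominal 70/15/15 split; small deviations are expected because whole scaffold groups are assigned to a single split.
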
Here is the training log:

<img src="training-log.png" width=450>

And these are the evaluation scores.

Train (6963 rows):

```json
{
  "Pearson r": 0.5503574788961993,
  "RMSE": 0.053899530321359634,
  "Spearman rho": 0.9346845985202912
}
```

Validation (1491 rows):

```json
{
  "Pearson r": 0.4224614855497885,
  "RMSE": 0.06130664795637131,
  "Spearman rho": 0.88412899537977
}
```

Test (1492 rows):

```json
{
  "Pearson r": 0.4829411208985374,
  "RMSE": 0.06526102125644684,
  "Spearman rho": 0.9303547327209174
}
```

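The three scores reported for each split can be computed from paired observed and predicted values. The sketch below is a plain-Python illustration, not Duvida's implementation; the function names and the toy data are assumptions made for this example. It also shows one common way for Spearman's rho to be much higher than Pearson's r: predictions that are monotonic but nonlinear in the observed values. Whether that explains the gap seen for this model is not established here.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data (not from this model): predictions are monotonic but nonlinear.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [v ** 4 for v in observed]

print(rmse(observed, observed))
print(pearson_r(observed, predicted))
print(spearman_rho(observed, predicted))
```

Because Spearman's rho depends only on rank order, it rewards a model that orders compounds correctly even when the predicted values are not linearly related to the measurements.
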
## Training data details

The training data were collated by the authors of:

> Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
> Shared Platform for Antibiotic Research and Knowledge: A Collaborative Tool to SPARK Antibiotic Discovery
> ACS Infectious Diseases 2018 4 (11), 1536-1539
> DOI: 10.1021/acsinfecdis.8b00193

We cleaned the original SPARK dataset to subset the most relevant columns, remove empty values,
give succinct column titles, and split by species.

This particular dataset retains only measurements on bacteria with wild-type accumulation phenotypes.

### Dataset Sources

- **Repository:** https://www.collaborativedrug.com/spark-data-downloads
- **Paper:** https://doi.org/10.1021/acsinfecdis.8b00193

### Data Collection and Processing

Data were processed using [schemist](https://github.com/scbirlab/schemist), a tool for processing chemical datasets.

The SMILES strings have been canonicalized, and the data have been split into training (70%), validation (15%), and test (15%) sets
by Murcko scaffold for each species with more than 1000 entries. Additional features like molecular weight and
topological polar surface area have also been calculated.

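The idea behind a scaffold split is that all molecules sharing a scaffold end up in the same split, so the validation and test sets contain chemotypes unseen in training. The following is a minimal sketch of one greedy way to do this, not schemist's actual algorithm. The function `scaffold_split`, its `scaffold_of` argument, and the toy data are hypothetical; in practice `scaffold_of` could wrap a cheminformatics toolkit's Murcko scaffold routine.

```python
from collections import defaultdict

def scaffold_split(smiles_list, scaffold_of, percentages=(70, 15, 15)):
    """Assign whole scaffold groups to train/validation/test index lists.

    `scaffold_of` is a caller-supplied function mapping a SMILES string to a
    scaffold key (hypothetical here; supplied by the caller).
    """
    groups = defaultdict(list)
    for index, smiles in enumerate(smiles_list):
        groups[scaffold_of(smiles)].append(index)

    # Largest groups first, so big scaffolds are placed while there is room.
    ordered = sorted(groups.values(), key=len, reverse=True)

    n_total = len(smiles_list)
    train_cap = percentages[0] * n_total / 100
    val_cap = percentages[1] * n_total / 100

    train, validation, test = [], [], []
    for members in ordered:
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        elif len(validation) + len(members) <= val_cap:
            validation.extend(members)
        else:
            test.extend(members)
    return train, validation, test

# Toy stand-ins for SMILES strings: the first character plays the role of the scaffold.
toy_molecules = (
    [f"A{i}" for i in range(8)]
    + [f"B{i}" for i in range(6)]
    + [f"C{i}" for i in range(3)]
    + [f"D{i}" for i in range(3)]
)
train_idx, val_idx, test_idx = scaffold_split(toy_molecules, scaffold_of=lambda s: s[0])
print(len(train_idx), len(val_idx), len(test_idx))
```

Because groups are never divided, the achieved proportions only approximate the targets, which is consistent with the slightly uneven row counts reported for this dataset.
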
### Who are the source data producers?

Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell

data-config.json
ADDED
@@ -0,0 +1,17 @@
{
  "_default_cache": "cache/duvida/data",
  "_in_key": "inputs",
  "_input_cols": [
    "smiles"
  ],
  "_label_cols": [
    "pmic"
  ],
  "_out_key": "labels",
  "input_shape": [
    2248
  ],
  "output_shape": [
    1
  ]
}
data-load-args.json
ADDED
@@ -0,0 +1,10 @@
{
  "cache": "/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Brucella-abortus/71/cache",
  "features": [
    "smiles"
  ],
  "filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-train.csv.gz",
  "labels": [
    "pmic"
  ]
}
eval-metrics_test.json
ADDED
@@ -0,0 +1,5 @@
{
  "Pearson r": 0.4829411208985374,
  "RMSE": 0.06526102125644684,
  "Spearman rho": 0.9303547327209174
}
eval-metrics_train.json
ADDED
@@ -0,0 +1,5 @@
{
  "Pearson r": 0.5503574788961993,
  "RMSE": 0.053899530321359634,
  "Spearman rho": 0.9346845985202912
}
eval-metrics_validation.json
ADDED
@@ -0,0 +1,5 @@
{
  "Pearson r": 0.4224614855497885,
  "RMSE": 0.06130664795637131,
  "Spearman rho": 0.88412899537977
}
input-data.hf/data-00000-of-00001.arrow
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2b2162606a61d1e5e86ade4fc3596b9999f29262a4e77274d0df178b4a116944
size 709264
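Large binary files in this commit appear as Git LFS pointer files: three `key value` lines giving the spec version, the content hash (`oid`), and the size in bytes. As a small illustration, a pointer can be parsed as follows; the function name `parse_lfs_pointer` is hypothetical, and the sample text is the pointer shown for `input-data.hf/data-00000-of-00001.arrow`.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }

pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2b2162606a61d1e5e86ade4fc3596b9999f29262a4e77274d0df178b4a116944
size 709264
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```

The `size` field is the size of the real file, so it gives a rough idea of download cost before fetching anything.
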
input-data.hf/dataset_info.json
ADDED
@@ -0,0 +1,52 @@
{
  "builder_name": "csv",
  "citation": "",
  "config_name": "default",
  "dataset_name": "csv",
  "dataset_size": 2728879,
  "description": "",
  "download_checksums": {
    "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-train.csv.gz": {
      "num_bytes": 557523,
      "checksum": null
    }
  },
  "download_size": 557523,
  "features": {
    "smiles": {
      "dtype": "string",
      "_type": "Value"
    },
    "inputs": {
      "feature": {
        "dtype": "string",
        "_type": "Value"
      },
      "_type": "Sequence"
    },
    "labels": {
      "feature": {
        "dtype": "float64",
        "_type": "Value"
      },
      "_type": "Sequence"
    }
  },
  "homepage": "",
  "license": "",
  "size_in_bytes": 3286402,
  "splits": {
    "train": {
      "name": "train",
      "num_bytes": 2728879,
      "num_examples": 6963,
      "dataset_name": "csv"
    }
  },
  "version": {
    "version_str": "0.0.0",
    "major": 0,
    "minor": 0,
    "patch": 0
  }
}
input-data.hf/state.json
ADDED
@@ -0,0 +1,13 @@
{
  "_data_files": [
    {
      "filename": "data-00000-of-00001.arrow"
    }
  ],
  "_fingerprint": "e067f72bd4fd2f97",
  "_format_columns": null,
  "_format_kwargs": {},
  "_format_type": null,
  "_output_all_columns": false,
  "_split": "train"
}
logs-csv/lightning_logs/version_0/hparams.yaml
ADDED
@@ -0,0 +1 @@
{}
logs-csv/lightning_logs/version_0/metrics.csv
ADDED
@@ -0,0 +1,83 @@
epoch,loss,step,val_loss
0,,435,0.4943719804286957
0,5.289660930633545,435,
1,,871,0.19863000512123108
1,1.7327244281768799,871,
2,,1307,0.07188103348016739
2,1.091246247291565,1307,
3,,1743,0.04106524959206581
3,0.7714532017707825,1743,
4,,2179,0.033127326518297195
4,0.6212546229362488,2179,
5,,2615,0.029247038066387177
5,0.5265188813209534,2615,
6,,3051,0.02840065397322178
6,0.4711878001689911,3051,
7,,3487,0.02671700343489647
7,0.43018320202827454,3487,
8,,3923,0.02705136500298977
8,0.3966166377067566,3923,
9,,4359,0.026753852143883705
9,0.36803075671195984,4359,
10,,4795,0.026545658707618713
10,0.34880682826042175,4795,
11,,5231,0.02747528813779354
11,0.33344268798828125,5231,
12,,5667,0.026356253772974014
12,0.31779828667640686,5667,
13,,6103,0.026483017951250076
13,0.30868658423423767,6103,
14,,6539,0.026650499552488327
14,0.2972111701965332,6539,
15,,6975,0.02774786204099655
15,0.289717435836792,6975,
16,,7411,0.026358287781476974
16,0.2818793058395386,7411,
17,,7847,0.026366928592324257
17,0.27312374114990234,7847,
18,,8283,0.02649846486747265
18,0.2667596936225891,8283,
19,,8719,0.02613169141113758
19,0.261917382478714,8719,
20,,9155,0.025241071358323097
20,0.256716251373291,9155,
21,,9591,0.02763586863875389
21,0.2512047290802002,9591,
22,,10027,0.0261844489723444
22,0.24682091176509857,10027,
23,,10463,0.026615232229232788
23,0.2391820251941681,10463,
24,,10899,0.02532847970724106
24,0.23734702169895172,10899,
25,,11335,0.027400750666856766
25,0.23324531316757202,11335,
26,,11771,0.029379431158304214
26,0.2280033677816391,11771,
27,,12207,0.025781521573662758
27,0.22546932101249695,12207,
28,,12643,0.02702564373612404
28,0.2223566323518753,12643,
29,,13079,0.027757002040743828
29,0.21767555177211761,13079,
30,,13515,0.02584170363843441
30,0.21601980924606323,13515,
31,,13951,0.02662491984665394
31,0.2132291942834854,13951,
32,,14387,0.025853078812360764
32,0.20955024659633636,14387,
33,,14823,0.02659512124955654
33,0.20862342417240143,14823,
34,,15259,0.02577393874526024
34,0.20273007452487946,15259,
35,,15695,0.025896569713950157
35,0.2000226080417633,15695,
36,,16131,0.025927668437361717
36,0.19752368330955505,16131,
37,,16567,0.025407912209630013
37,0.19396986067295074,16567,
38,,17003,0.025420965626835823
38,0.19176311790943146,17003,
39,,17439,0.026712685823440552
39,0.18767289817333221,17439,
40,,17875,0.02653094381093979
40,0.18428951501846313,17875,
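In this CSV log, each epoch appears on two rows: one carrying `val_loss` and one carrying `loss`, with the other column left empty. A minimal sketch of collapsing them into one row per epoch, which is the layout `training-log.csv` uses in this commit, is shown below. How Duvida actually produces `training-log.csv` is not shown in this commit, so `merge_by_epoch` is a hypothetical helper; the sample rows are the first two epochs of the log.

```python
import csv
import io

# First two epochs copied from logs-csv/lightning_logs/version_0/metrics.csv.
raw_log = """\
epoch,loss,step,val_loss
0,,435,0.4943719804286957
0,5.289660930633545,435,
1,,871,0.19863000512123108
1,1.7327244281768799,871,
"""

def merge_by_epoch(csv_text):
    """Collapse rows sharing an epoch, keeping the non-empty value of each column."""
    merged = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = merged.setdefault(int(row["epoch"]), {})
        for column, value in row.items():
            if value != "":
                record[column] = value
    return [merged[epoch] for epoch in sorted(merged)]

rows = merge_by_epoch(raw_log)
for row in rows:
    print(row["epoch"], row["step"], float(row["loss"]), float(row["val_loss"]))
```
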
logs/lightning_logs/version_0/events.out.tfevents.1743096520.cn042.1346636.0
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6f81996990bc04dc3d008961fe95941dbd7a212e6da76a42e689364428f60a40
size 14156
logs/lightning_logs/version_0/events.out.tfevents.1743096673.cn090.4018455.0
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a6753e9d0e4d8bfeba7e4e2ef97f30dd1edaa280ed52b40081c9b701c1f1c85
size 7402
logs/lightning_logs/version_0/hparams.yaml
ADDED
@@ -0,0 +1 @@
{}
metrics.csv
ADDED
@@ -0,0 +1,4 @@
split,split_filename,config_i,model_class,n_parameters,filename,features,labels,cache,extra_featurizers,use_2d,use_fp,dropout,ensemble_size,learning_rate,n_hidden,n_units,val_filename,epochs,batch_size,RMSE,Pearson r,Spearman rho
train,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-train.csv.gz,71,ChempropModelBox,2695890,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Brucella-abortus/71/cache,,True,True,0.2,10,0.0001,5,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-validation.csv.gz,2000,16,0.053899530321359634,0.5503574788961993,0.9346845985202912
validation,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-validation.csv.gz,71,ChempropModelBox,2695890,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Brucella-abortus/71/cache,,True,True,0.2,10,0.0001,5,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-validation.csv.gz,2000,16,0.06130664795637131,0.4224614855497885,0.88412899537977
test,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-test.csv.gz,71,ChempropModelBox,2695890,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Brucella-abortus/71/cache,,True,True,0.2,10,0.0001,5,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-validation.csv.gz,2000,16,0.06526102125644684,0.4829411208985374,0.9303547327209174
modelbox-config.json
ADDED
@@ -0,0 +1,11 @@
{
  "dropout": 0.2,
  "ensemble_size": 10,
  "extra_featurizers": null,
  "learning_rate": 0.0001,
  "model_class": "ChempropModelBox",
  "n_hidden": 5,
  "n_units": 16,
  "use_2d": true,
  "use_fp": true
}
params.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7ab362f69333085d64960bc0b48bd9fb29fe987884a6ccef59a7c72d6e0dd550
size 10927220
predictions_test.csv.gz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d026f37e4b55e96a28897ae16c3a7ed4f4579e8c51866a310bfffee0766938fc
size 2056858
predictions_train.csv.gz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:648df99aa91bdbba32e734700c6f7242cfad451a9a3320b2778c489a1ef80597
size 8503320
predictions_validation.csv.gz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7cb855c204b06eddd16a83ef94589f0e4dd402680b5a5af22c847e9901ba93c9
size 1969416
repo-name.txt
ADDED
@@ -0,0 +1 @@
scbirlab/spark-dv-2503-babo
training-args.json
ADDED
@@ -0,0 +1,5 @@
{
  "batch_size": 16,
  "epochs": 2000,
  "val_filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-validation.csv.gz"
}
training-data.hf/cache-319d4048b18fe78a.arrow
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:73280046b294b8449b20dd5566352e92cb89e8873df20791ce445e4575a71fa3
size 195669256
training-data.hf/cache-cb09ba82b7909a2f.arrow
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c864bf6b6d49c7429f49954a87409379a8c493a963a82a65c661165714f15ac1
size 126155960
training-data.hf/data-00000-of-00001.arrow
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4b51eda13ecc9862377bf422df07279fa7312eda97175de202f86c8be05f5bd8
size 194716136
training-data.hf/dataset_info.json
ADDED
@@ -0,0 +1,126 @@
{
  "builder_name": "csv",
  "citation": "",
  "config_name": "default",
  "dataset_name": "csv",
  "dataset_size": 2728879,
  "description": "",
  "download_checksums": {
    "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Brucella-abortus/scaffold-split-train.csv.gz": {
      "num_bytes": 557523,
      "checksum": null
    }
  },
  "download_size": 557523,
  "features": {
    "smiles": {
      "feature": {
        "dtype": "string",
        "_type": "Value"
      },
      "_type": "Sequence"
    },
    "inputs": {
      "V_d": {
        "dtype": "null",
        "_type": "Value"
      },
      "gt_mask": {
        "dtype": "null",
        "_type": "Value"
      },
      "lt_mask": {
        "dtype": "null",
        "_type": "Value"
      },
      "mg": {
        "E": {
          "feature": {
            "feature": {
              "dtype": "float32",
              "_type": "Value"
            },
            "_type": "Sequence"
          },
          "_type": "Sequence"
        },
        "V": {
          "feature": {
            "feature": {
              "dtype": "float32",
              "_type": "Value"
            },
            "_type": "Sequence"
          },
          "_type": "Sequence"
        },
        "edge_index": {
          "feature": {
            "feature": {
              "dtype": "float32",
              "_type": "Value"
            },
            "_type": "Sequence"
          },
          "_type": "Sequence"
        },
        "rev_edge_index": {
          "feature": {
            "dtype": "float32",
            "_type": "Value"
          },
          "_type": "Sequence"
        }
      },
      "weight": {
        "dtype": "float32",
        "_type": "Value"
      },
      "x_d": {
        "feature": {
          "dtype": "float32",
          "_type": "Value"
        },
        "_type": "Sequence"
      },
      "y": {
        "feature": {
          "dtype": "float32",
          "_type": "Value"
        },
        "_type": "Sequence"
      }
    },
    "labels": {
      "feature": {
        "dtype": "float64",
        "_type": "Value"
      },
      "_type": "Sequence"
    },
    "extra_features": {
      "feature": {
        "dtype": "float32",
        "_type": "Value"
      },
      "_type": "Sequence"
    }
  },
  "homepage": "",
  "license": "",
  "size_in_bytes": 3286402,
  "splits": {
    "train": {
      "name": "train",
      "num_bytes": 2728879,
      "num_examples": 6963,
      "dataset_name": "csv"
    }
  },
  "version": {
    "version_str": "0.0.0",
    "major": 0,
    "minor": 0,
    "patch": 0
  }
}
training-data.hf/state.json
ADDED
@@ -0,0 +1,15 @@
{
  "_data_files": [
    {
      "filename": "data-00000-of-00001.arrow"
    }
  ],
  "_fingerprint": "6ed781d8d198507b",
  "_format_columns": null,
  "_format_kwargs": {
    "dtype": "float"
  },
  "_format_type": "numpy",
  "_output_all_columns": false,
  "_split": "train"
}
training-log.csv
ADDED
@@ -0,0 +1,42 @@
epoch,step,loss,val_loss
0,435,5.289660930633545,0.4943719804286957
1,871,1.73272442817688,0.198630005121231
2,1307,1.091246247291565,0.0718810334801673
3,1743,0.7714532017707825,0.0410652495920658
4,2179,0.6212546229362488,0.0331273265182971
5,2615,0.5265188813209534,0.0292470380663871
6,3051,0.4711878001689911,0.0284006539732217
7,3487,0.4301832020282745,0.0267170034348964
8,3923,0.3966166377067566,0.0270513650029897
9,4359,0.3680307567119598,0.0267538521438837
10,4795,0.3488068282604217,0.0265456587076187
11,5231,0.3334426879882812,0.0274752881377935
12,5667,0.3177982866764068,0.026356253772974
13,6103,0.3086865842342376,0.02648301795125
14,6539,0.2972111701965332,0.0266504995524883
15,6975,0.289717435836792,0.0277478620409965
16,7411,0.2818793058395386,0.0263582877814769
17,7847,0.2731237411499023,0.0263669285923242
18,8283,0.2667596936225891,0.0264984648674726
19,8719,0.261917382478714,0.0261316914111375
20,9155,0.256716251373291,0.025241071358323
21,9591,0.2512047290802002,0.0276358686387538
22,10027,0.2468209117650985,0.0261844489723444
23,10463,0.2391820251941681,0.0266152322292327
24,10899,0.2373470216989517,0.025328479707241
25,11335,0.233245313167572,0.0274007506668567
26,11771,0.2280033677816391,0.0293794311583042
27,12207,0.2254693210124969,0.0257815215736627
28,12643,0.2223566323518753,0.027025643736124
29,13079,0.2176755517721176,0.0277570020407438
30,13515,0.2160198092460632,0.0258417036384344
31,13951,0.2132291942834854,0.0266249198466539
32,14387,0.2095502465963363,0.0258530788123607
33,14823,0.2086234241724014,0.0265951212495565
34,15259,0.2027300745248794,0.0257739387452602
35,15695,0.2000226080417633,0.0258965697139501
36,16131,0.197523683309555,0.0259276684373617
37,16567,0.1939698606729507,0.02540791220963
38,17003,0.1917631179094314,0.0254209656268358
39,17439,0.1876728981733322,0.0267126858234405
40,17875,0.1842895150184631,0.0265309438109397
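A log in this one-row-per-epoch layout makes it easy to locate the epoch with the lowest validation loss. The sketch below does this on a few rows copied from the log (not the full file); the variable names are illustrative. In the full log the lowest `val_loss` is at epoch 20 and the last logged epoch is 40, which would be consistent with patience-based early stopping, though the patience setting is not stated in this commit.

```python
import csv
import io

# A few rows copied from training-log.csv (not the full file).
log_excerpt = """\
epoch,step,loss,val_loss
19,8719,0.261917382478714,0.0261316914111375
20,9155,0.256716251373291,0.025241071358323
21,9591,0.2512047290802002,0.0276358686387538
24,10899,0.2373470216989517,0.025328479707241
37,16567,0.1939698606729507,0.02540791220963
40,17875,0.1842895150184631,0.0265309438109397
"""

rows = list(csv.DictReader(io.StringIO(log_excerpt)))

# Row with the smallest validation loss.
best = min(rows, key=lambda row: float(row["val_loss"]))
best_epoch = int(best["epoch"])
last_epoch = int(rows[-1]["epoch"])

print(best_epoch, float(best["val_loss"]), last_epoch - best_epoch)
```
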
training-log.png
ADDED