Upload folder using huggingface_hub
- README.md +140 -0
- data-config.json +17 -0
- data-load-args.json +10 -0
- eval-metrics_test.json +5 -0
- eval-metrics_train.json +5 -0
- eval-metrics_validation.json +5 -0
- input-data.hf/data-00000-of-00001.arrow +3 -0
- input-data.hf/dataset_info.json +52 -0
- input-data.hf/state.json +13 -0
- logs-csv/lightning_logs/version_0/hparams.yaml +1 -0
- logs-csv/lightning_logs/version_0/metrics.csv +107 -0
- logs/lightning_logs/version_0/events.out.tfevents.1743098122.cn039.615154.0 +3 -0
- logs/lightning_logs/version_0/hparams.yaml +1 -0
- metrics.csv +4 -0
- modelbox-config.json +11 -0
- params.pt +3 -0
- predictions_test.csv.gz +3 -0
- predictions_train.csv.gz +3 -0
- predictions_validation.csv.gz +3 -0
- repo-name.txt +1 -0
- training-args.json +5 -0
- training-data.hf/cache-6e45fb8cb7c74a86.arrow +3 -0
- training-data.hf/data-00000-of-00001.arrow +3 -0
- training-data.hf/dataset_info.json +126 -0
- training-data.hf/state.json +15 -0
- training-log.csv +54 -0
- training-log.png +0 -0
README.md
ADDED
@@ -0,0 +1,140 @@
+---
+license: mit
+pipeline_tag: tabular-regression
+tags:
+- chemistry
+- microbiology
+- antibiotics
+library_name: duvida
+datasets:
+- scbirlab/thomas-2018-spark-wt
+---
+
+# Predictor of _Streptococcus pneumoniae_ MICs
+
+_Updated:_ Fri 28 Mar 18:11:42 GMT 2025
+
+Trained on the _Streptococcus pneumoniae_, WT accumulator phenotype subset of the [human-curated SPARK dataset](https://doi.org/10.1021/acsinfecdis.8b00193) ( rows in total for _Streptococcus pneumoniae_).
+
+## Model details
+
+This model was trained using [our Duvida framework](https://github.com/scbirlab/duvida),
+as a result of hyperparameter searches and selecting the model that performs best on unseen test data
+(from a scaffold split).
+
+Duvida also saves the training data in this checkpoint to allow the calculation of uncertainty metrics
+based on that training data.
+
+This model is the best regression model from a hyperparameter search, determined
+by Spearman's $\rho$ on a held-out test set not used in training or early stopping.
+
+### Model architecture
+
+- **Regression**
+
+```json
+
+{
+  "dropout": 0.2,
+  "ensemble_size": 10,
+  "extra_featurizers": null,
+  "learning_rate": 0.0001,
+  "model_class": "ChempropModelBox",
+  "n_hidden": 5,
+  "n_units": 256,
+  "use_2d": true,
+  "use_fp": true
+}
+```
+
+### Model usage
+
+You can use this model with:
+
+```python
+from duvida.autoclasses import AutoModelBox
+modelbox = AutoModelBox.from_pretrained("hf://scbirlab/spark-dv-2503-spne")
+modelbox.predict(filename=..., inputs=[...], columns=[...]) # make predictions on your own data
+```
+
+## Training details
+
+- **Dataset:** [SPARK, WT accumulator, _Streptococcus pneumoniae_ subset](https://huggingface.co/datasets/scbirlab/thomas-2018-spark-wt)
+- **Input column:** smiles
+- **Output column:** pmic
+- **Split type:** Murcko scaffold
+- **Split proportions:**
+  - 70% training (6 rows)
+  - 15% validation (for early stopping) (17 rows)
+  - 15% test (for selecting hyperparameters) (5 rows)
+
+Here is the training log:
+
+<img src="training-log.png" width=450>
+
+And these are the evaluation scores.
+
+Train (6 rows):
+
+```json
+
+{
+  "Pearson r": 0.4138370404282536,
+  "RMSE": 0.3910311758518219,
+  "Spearman rho": 0.818181818181818
+}
+```
+
+Validation (17 rows):
+
+```json
+
+{
+  "Pearson r": 0.954468386191219,
+  "RMSE": 1.128600001335144,
+  "Spearman rho": 0.8730841616511641
+}
+```
+
+
+Test (5 rows):
+
+```json
+
+{
+  "Pearson r": 0.3867792333436302,
+  "RMSE": 0.8993263244628906,
+  "Spearman rho": 0.09999999999999999
+}
+```
+
+## Training data details
+
+The training data were collated by the authors of:
+
+> Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
+> Shared Platform for Antibiotic Research and Knowledge: A Collaborative Tool to SPARK Antibiotic Discovery
+> ACS Infectious Diseases 2018 4 (11), 1536-1539
+> DOI: 10.1021/acsinfecdis.8b00193
+
+We cleaned the original SPARK dataset to subset the most relevant columns, remove empty values,
+give succinct column titles, and split by species.
+
+This particular dataset retains only measurements on bacteria with wild-type accumulation phenotypes.
+
+### Dataset Sources
+
+- **Repository:** https://www.collaborativedrug.com/spark-data-downloads
+- **Paper:** https://doi.org/10.1021/acsinfecdis.8b00193
+
+### Data Collection and Processing
+
+Data were processed using [schemist](https://github.com/scbirlab/schemist), a tool for processing chemical datasets.
+
+The SMILES strings have been canonicalized, and split into training (70%), validation (15%), and test (15%) sets
+by Murcko scaffold for each species with more than 1000 entries. Additional features like molecular weight and
+topological polar surface area have also been calculated.
+
+### Who are the source data producers?
+
+Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
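Note on the split sizes in the README above: it lists nominal proportions of 70/15/15 next to row counts of 6, 17, and 5. A scaffold split keeps every molecule with a given scaffold in the same split, so the realized fractions can differ from the nominal ones, particularly on a small dataset. The sketch below only redoes that arithmetic from the reported row counts; the variable names are illustrative and not part of Duvida or schemist.

```python
# Row counts and nominal proportions exactly as reported in the README.
split_rows = {"train": 6, "validation": 17, "test": 5}
nominal = {"train": 0.70, "validation": 0.15, "test": 0.15}

# Realized fraction of each split, relative to the sum of the three splits.
total = sum(split_rows.values())
realized = {name: n / total for name, n in split_rows.items()}

for name, n in split_rows.items():
    print(
        f"{name:<10} rows={n:>2}  "
        f"nominal={nominal[name]:.0%}  realized={realized[name]:.1%}"
    )
```

With these counts the training split holds roughly a fifth of the rows rather than 70%, which is worth keeping in mind when reading the evaluation scores.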
data-config.json
ADDED
@@ -0,0 +1,17 @@
+{
+  "_default_cache": "cache/duvida/data",
+  "_in_key": "inputs",
+  "_input_cols": [
+    "smiles"
+  ],
+  "_label_cols": [
+    "pmic"
+  ],
+  "_out_key": "labels",
+  "input_shape": [
+    2248
+  ],
+  "output_shape": [
+    1
+  ]
+}
data-load-args.json
ADDED
@@ -0,0 +1,10 @@
+{
+  "cache": "/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Streptococcus-pneumoniae/79/cache",
+  "features": [
+    "smiles"
+  ],
+  "filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-train.csv.gz",
+  "labels": [
+    "pmic"
+  ]
+}
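Note on `scaffold-split-train.csv.gz`: the `filename` above points at the training portion of a Murcko scaffold split. As a rough illustration of how a group-wise split behaves, here is a minimal greedy sketch. It assumes the scaffold key of each row has already been computed (for example with a cheminformatics toolkit), and it is a generic technique, not necessarily the procedure schemist uses. The function name and the toy scaffold keys are hypothetical.

```python
from collections import defaultdict


def scaffold_group_split(scaffolds, fractions=(0.70, 0.15, 0.15)):
    """Assign row indices to train/validation/test so that rows sharing a
    scaffold key always land in the same split.

    `scaffolds` holds one precomputed scaffold key per row (an assumption
    of this sketch; no chemistry is done here).
    """
    groups = defaultdict(list)
    for row, key in enumerate(scaffolds):
        groups[key].append(row)

    # Largest groups first; ties are broken by key so the result is deterministic.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    n = len(scaffolds)
    train_cap = fractions[0] * n
    valid_cap = (fractions[0] + fractions[1]) * n

    splits = {"train": [], "validation": [], "test": []}
    for _, rows in ordered:
        n_train = len(splits["train"])
        n_valid = len(splits["validation"])
        if n_train + len(rows) <= train_cap:
            splits["train"].extend(rows)
        elif n_train + n_valid + len(rows) <= valid_cap:
            splits["validation"].extend(rows)
        else:
            splits["test"].extend(rows)
    return splits


# Toy input: 12 rows spread over five hypothetical scaffold keys.
toy = ["A"] * 5 + ["B"] * 3 + ["C"] * 2 + ["D", "E"]
toy_splits = scaffold_group_split(toy)
print({name: len(rows) for name, rows in toy_splits.items()})
```

Because whole groups move together, the realized sizes on the toy input are 8/2/2 rather than an exact 70/15/15 division.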
eval-metrics_test.json
ADDED
@@ -0,0 +1,5 @@
+{
+  "Pearson r": 0.3867792333436302,
+  "RMSE": 0.8993263244628906,
+  "Spearman rho": 0.09999999999999999
+}
eval-metrics_train.json
ADDED
@@ -0,0 +1,5 @@
+{
+  "Pearson r": 0.4138370404282536,
+  "RMSE": 0.3910311758518219,
+  "Spearman rho": 0.818181818181818
+}
eval-metrics_validation.json
ADDED
@@ -0,0 +1,5 @@
+{
+  "Pearson r": 0.954468386191219,
+  "RMSE": 1.128600001335144,
+  "Spearman rho": 0.8730841616511641
+}
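Note on the three `eval-metrics_*.json` files: each reports Pearson r, RMSE, and Spearman rho. The sketch below gives the standard textbook definitions of these metrics in plain Python, so the numbers can be sanity-checked against the `predictions_*.csv.gz` files. It is not Duvida's implementation, and the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Five untied rows whose squared rank differences sum to 18.
demo_rho = spearman_rho([1, 2, 3, 4, 5], [1, 5, 4, 2, 3])
print(round(demo_rho, 6))
```

For five rows without ties, Spearman's rho reduces to 1 − Σd²/20, so it can only take a coarse grid of values. If the five test rows have no tied ranks, the reported test value of about 0.1 would correspond to Σd² = 18, as in the demo above.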
input-data.hf/data-00000-of-00001.arrow
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:664224d98a51c44eb85b3aa3c2ee0ac092ef6ec90caa9f2d0b6358e75ec94af3
+size 1944
input-data.hf/dataset_info.json
ADDED
@@ -0,0 +1,52 @@
+{
+  "builder_name": "csv",
+  "citation": "",
+  "config_name": "default",
+  "dataset_name": "csv",
+  "dataset_size": 2735,
+  "description": "",
+  "download_checksums": {
+    "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-train.csv.gz": {
+      "num_bytes": 1012,
+      "checksum": null
+    }
+  },
+  "download_size": 1012,
+  "features": {
+    "smiles": {
+      "dtype": "string",
+      "_type": "Value"
+    },
+    "inputs": {
+      "feature": {
+        "dtype": "string",
+        "_type": "Value"
+      },
+      "_type": "Sequence"
+    },
+    "labels": {
+      "feature": {
+        "dtype": "float64",
+        "_type": "Value"
+      },
+      "_type": "Sequence"
+    }
+  },
+  "homepage": "",
+  "license": "",
+  "size_in_bytes": 3747,
+  "splits": {
+    "train": {
+      "name": "train",
+      "num_bytes": 2735,
+      "num_examples": 6,
+      "dataset_name": "csv"
+    }
+  },
+  "version": {
+    "version_str": "0.0.0",
+    "major": 0,
+    "minor": 0,
+    "patch": 0
+  }
+}
input-data.hf/state.json
ADDED
@@ -0,0 +1,13 @@
+{
+  "_data_files": [
+    {
+      "filename": "data-00000-of-00001.arrow"
+    }
+  ],
+  "_fingerprint": "cc1fa05b3dc69da9",
+  "_format_columns": null,
+  "_format_kwargs": {},
+  "_format_type": null,
+  "_output_all_columns": false,
+  "_split": "train"
+}
logs-csv/lightning_logs/version_0/hparams.yaml
ADDED
@@ -0,0 +1 @@
+{}
logs-csv/lightning_logs/version_0/metrics.csv
ADDED
@@ -0,0 +1,107 @@
+epoch,loss,step,val_loss
+0,,0,28.506013870239258
+0,35.035179138183594,0,
+1,,1,28.133272171020508
+1,34.59953308105469,1,
+2,,2,27.75455093383789
+2,34.28255081176758,2,
+3,,3,27.360851287841797
+3,33.872459411621094,3,
+4,,4,26.94652557373047
+4,33.49507141113281,4,
+5,,5,26.503360748291016
+5,32.932044982910156,5,
+6,,6,26.024642944335938
+6,32.40605163574219,6,
+7,,7,25.50464630126953
+7,31.995607376098633,7,
+8,,8,24.93709945678711
+8,31.35414695739746,8,
+9,,9,24.317028045654297
+9,30.852384567260742,9,
+10,,10,23.639673233032227
+10,30.293041229248047,10,
+11,,11,22.898895263671875
+11,29.3199462890625,11,
+12,,12,22.09198760986328
+12,28.761568069458008,12,
+13,,13,21.21438980102539
+13,27.734697341918945,13,
+14,,14,20.263683319091797
+14,26.935657501220703,14,
+15,,15,19.238662719726562
+15,25.863750457763672,15,
+16,,16,18.138511657714844
+16,24.692529678344727,16,
+17,,17,16.965312957763672
+17,23.58732795715332,17,
+18,,18,15.723073959350586
+18,22.249080657958984,18,
+19,,19,14.418327331542969
+19,20.945390701293945,19,
+20,,20,13.059042930603027
+20,19.49625587463379,20,
+21,,21,11.65770435333252
+21,18.035572052001953,21,
+22,,22,10.229310035705566
+22,16.04208755493164,22,
+23,,23,8.793594360351562
+23,14.609708786010742,23,
+24,,24,7.373780250549316
+24,12.870007514953613,24,
+25,,25,5.998498916625977
+25,11.573694229125977,25,
+26,,26,4.699665546417236
+26,9.815820693969727,26,
+27,,27,3.513622283935547
+27,7.948325157165527,27,
+28,,28,2.480156421661377
+28,6.660716533660889,28,
+29,,29,1.6397855281829834
+29,5.119527339935303,29,
+30,,30,1.0356911420822144
+30,3.504617214202881,30,
+31,,31,0.7050375938415527
+31,2.7646963596343994,31,
+32,,32,0.6781078577041626
+32,1.6973930597305298,32,
+33,,33,0.9667624831199646
+33,1.075992226600647,33,
+34,,34,1.556015133857727
+34,0.6845546960830688,34,
+35,,35,2.3837873935699463
+35,0.3879012167453766,35,
+36,,36,3.3532373905181885
+36,0.4765850305557251,36,
+37,,37,4.322829723358154
+37,0.8820050954818726,37,
+38,,38,5.122341632843018
+38,1.508744478225708,38,
+39,,39,5.632863998413086
+39,2.0902042388916016,39,
+40,,40,5.850592136383057
+40,1.9949707984924316,40,
+41,,41,5.758612632751465
+41,2.3712332248687744,41,
+42,,42,5.433818817138672
+42,2.0827255249023438,42,
+43,,43,4.966310977935791
+43,1.8089426755905151,43,
+44,,44,4.401313781738281
+44,1.557389497756958,44,
+45,,45,3.8155436515808105
+45,1.2402079105377197,45,
+46,,46,3.2567148208618164
+46,0.9037799835205078,46,
+47,,47,2.732806444168091
+47,0.9471979141235352,47,
+48,,48,2.27205228805542
+48,0.7469801306724548,48,
+49,,49,1.8864364624023438
+49,0.53239905834198,49,
+50,,50,1.578837513923645
+50,0.5718014240264893,50,
+51,,51,1.3402189016342163
+51,0.43722790479660034,51,
+52,,52,1.1567370891571045
+52,0.531734824180603,52,
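Note on the row layout of this logger file: each epoch appears twice, once with only `val_loss` filled in and once with only `loss`. The `training-log.csv` file later in this commit holds one row per epoch with both values. A minimal sketch of how such rows can be collapsed is shown below, using only the standard library and the first two epochs from the file as embedded sample data. It is an illustration, not the code Duvida runs, and `merge_by_epoch` is a hypothetical helper name.

```python
import csv
import io

# First two epochs of the logger output, embedded so the example is self-contained.
RAW = """epoch,loss,step,val_loss
0,,0,28.506013870239258
0,35.035179138183594,0,
1,,1,28.133272171020508
1,34.59953308105469,1,
"""


def merge_by_epoch(text):
    """Collapse rows that share (epoch, step) into one record per epoch."""
    merged = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (int(row["epoch"]), int(row["step"]))
        record = merged.setdefault(key, {"loss": None, "val_loss": None})
        for column in ("loss", "val_loss"):
            if row[column] != "":  # empty cell: the metric was not logged in this row
                record[column] = float(row[column])
    return [
        {"epoch": epoch, "step": step, **values}
        for (epoch, step), values in sorted(merged.items())
    ]


rows = merge_by_epoch(RAW)
for r in rows:
    print(r)
```

Keying on `(epoch, step)` rather than on row order keeps the merge correct even if the training and validation rows arrive in a different sequence.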
logs/lightning_logs/version_0/events.out.tfevents.1743098122.cn039.615154.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:01026284649bd26dad5e642d94360445033c0f20a21b44c3fda6d3ddc3b61108
+size 9302
logs/lightning_logs/version_0/hparams.yaml
ADDED
@@ -0,0 +1 @@
+{}
metrics.csv
ADDED
@@ -0,0 +1,4 @@
+split,split_filename,config_i,model_class,n_parameters,filename,features,labels,cache,extra_featurizers,use_2d,use_fp,dropout,ensemble_size,learning_rate,n_hidden,n_units,val_filename,epochs,batch_size,RMSE,Pearson r,Spearman rho
+train,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-train.csv.gz,79,ChempropModelBox,11436690,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Streptococcus-pneumoniae/79/cache,,True,True,0.2,10,0.0001,5,256,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-validation.csv.gz,2000,16,0.3910311758518219,0.4138370404282536,0.818181818181818
+validation,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-validation.csv.gz,79,ChempropModelBox,11436690,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Streptococcus-pneumoniae/79/cache,,True,True,0.2,10,0.0001,5,256,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-validation.csv.gz,2000,16,1.128600001335144,0.954468386191219,0.8730841616511641
+test,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-test.csv.gz,79,ChempropModelBox,11436690,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Streptococcus-pneumoniae/79/cache,,True,True,0.2,10,0.0001,5,256,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-validation.csv.gz,2000,16,0.8993263244628906,0.3867792333436302,0.09999999999999999
modelbox-config.json
ADDED
@@ -0,0 +1,11 @@
+{
+  "dropout": 0.2,
+  "ensemble_size": 10,
+  "extra_featurizers": null,
+  "learning_rate": 0.0001,
+  "model_class": "ChempropModelBox",
+  "n_hidden": 5,
+  "n_units": 256,
+  "use_2d": true,
+  "use_fp": true
+}
params.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6ac12ca4b6def3792ab54b181a5f8b19a4c632e58463c984653d1737eb073574
+size 45891124
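Note on the Git LFS pointer for `params.pt`: the three lines above are a pointer file, not the weights themselves, and the `size` field gives the byte size of the real object. The sketch below parses the pointer and compares that size with the `n_parameters` value from `metrics.csv`. The comparison assumes 32-bit floats, which is not stated anywhere in this commit, and `parse_lfs_pointer` is a hypothetical helper name.

```python
def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the params.pt entry in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6ac12ca4b6def3792ab54b181a5f8b19a4c632e58463c984653d1737eb073574
size 45891124
"""

pointer = parse_lfs_pointer(POINTER)

n_parameters = 11436690             # `n_parameters` column of metrics.csv
float32_bytes = 4 * n_parameters    # assumption: weights stored as 32-bit floats
remainder = pointer["size"] - float32_bytes

print(pointer["size"], float32_bytes, remainder)
```

Under that assumption the parameters account for 45,746,760 of the 45,891,124 bytes, leaving about 144 kB that would have to come from other saved state or from serialization overhead.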
predictions_test.csv.gz
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3aed6f1c8276839685b16d13f7d6f571fcd17fb157ad34851ef81aa34106cc5e
+size 7574
predictions_train.csv.gz
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a231424702e72c000ec5b88a63f57267b8841098c66b4b951711051bb7076d4d
+size 7686
predictions_validation.csv.gz
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a54bf2abbe83b4390bf1b89e15f6c33c316a18cfad25b37f3d23a383c00c6ac4
+size 24216
repo-name.txt
ADDED
@@ -0,0 +1 @@
+scbirlab/spark-dv-2503-spne
training-args.json
ADDED
@@ -0,0 +1,5 @@
+{
+  "batch_size": 16,
+  "epochs": 2000,
+  "val_filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-validation.csv.gz"
+}
training-data.hf/cache-6e45fb8cb7c74a86.arrow
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3574722ef381aa5d70726ae87ce52b80ff89880f28716357175dfbee702afb94
+size 199416
training-data.hf/data-00000-of-00001.arrow
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0fde13bc580827883581c9f2d2800e155336859943053dc80c7135280c5bd409
+size 198824
training-data.hf/dataset_info.json
ADDED
@@ -0,0 +1,126 @@
+{
+  "builder_name": "csv",
+  "citation": "",
+  "config_name": "default",
+  "dataset_name": "csv",
+  "dataset_size": 2735,
+  "description": "",
+  "download_checksums": {
+    "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Streptococcus-pneumoniae/scaffold-split-train.csv.gz": {
+      "num_bytes": 1012,
+      "checksum": null
+    }
+  },
+  "download_size": 1012,
+  "features": {
+    "smiles": {
+      "feature": {
+        "dtype": "string",
+        "_type": "Value"
+      },
+      "_type": "Sequence"
+    },
+    "inputs": {
+      "V_d": {
+        "dtype": "null",
+        "_type": "Value"
+      },
+      "gt_mask": {
+        "dtype": "null",
+        "_type": "Value"
+      },
+      "lt_mask": {
+        "dtype": "null",
+        "_type": "Value"
+      },
+      "mg": {
+        "E": {
+          "feature": {
+            "feature": {
+              "dtype": "float32",
+              "_type": "Value"
+            },
+            "_type": "Sequence"
+          },
+          "_type": "Sequence"
+        },
+        "V": {
+          "feature": {
+            "feature": {
+              "dtype": "float32",
+              "_type": "Value"
+            },
+            "_type": "Sequence"
+          },
+          "_type": "Sequence"
+        },
+        "edge_index": {
+          "feature": {
+            "feature": {
+              "dtype": "float32",
+              "_type": "Value"
+            },
+            "_type": "Sequence"
+          },
+          "_type": "Sequence"
+        },
+        "rev_edge_index": {
+          "feature": {
+            "dtype": "float32",
+            "_type": "Value"
+          },
+          "_type": "Sequence"
+        }
+      },
+      "weight": {
+        "dtype": "float32",
+        "_type": "Value"
+      },
+      "x_d": {
+        "feature": {
+          "dtype": "float32",
+          "_type": "Value"
+        },
+        "_type": "Sequence"
+      },
+      "y": {
+        "feature": {
+          "dtype": "float32",
+          "_type": "Value"
+        },
+        "_type": "Sequence"
+      }
+    },
+    "labels": {
+      "feature": {
+        "dtype": "float64",
+        "_type": "Value"
+      },
+      "_type": "Sequence"
+    },
+    "extra_features": {
+      "feature": {
+        "dtype": "float32",
+        "_type": "Value"
+      },
+      "_type": "Sequence"
+    }
+  },
+  "homepage": "",
+  "license": "",
+  "size_in_bytes": 3747,
+  "splits": {
+    "train": {
+      "name": "train",
+      "num_bytes": 2735,
+      "num_examples": 6,
+      "dataset_name": "csv"
+    }
+  },
+  "version": {
+    "version_str": "0.0.0",
+    "major": 0,
+    "minor": 0,
+    "patch": 0
+  }
+}
training-data.hf/state.json
ADDED
@@ -0,0 +1,15 @@
+{
+  "_data_files": [
+    {
+      "filename": "data-00000-of-00001.arrow"
+    }
+  ],
+  "_fingerprint": "518d6c3dd41538bb",
+  "_format_columns": null,
+  "_format_kwargs": {
+    "dtype": "float"
+  },
+  "_format_type": "numpy",
+  "_output_all_columns": false,
+  "_split": "train"
+}
training-log.csv
ADDED
@@ -0,0 +1,54 @@
+epoch,step,loss,val_loss
+0,0,35.035179138183594,28.506013870239254
+1,1,34.59953308105469,28.133272171020508
+2,2,34.28255081176758,27.75455093383789
+3,3,33.872459411621094,27.3608512878418
+4,4,33.49507141113281,26.94652557373047
+5,5,32.93204498291016,26.503360748291016
+6,6,32.40605163574219,26.024642944335938
+7,7,31.995607376098636,25.50464630126953
+8,8,31.35414695739746,24.93709945678711
+9,9,30.852384567260746,24.317028045654297
+10,10,30.293041229248047,23.639673233032227
+11,11,29.3199462890625,22.898895263671875
+12,12,28.761568069458008,22.09198760986328
+13,13,27.734697341918945,21.21438980102539
+14,14,26.935657501220703,20.2636833190918
+15,15,25.863750457763672,19.23866271972656
+16,16,24.692529678344727,18.138511657714844
+17,17,23.58732795715332,16.965312957763672
+18,18,22.249080657958984,15.723073959350586
+19,19,20.945390701293945,14.418327331542969
+20,20,19.49625587463379,13.059042930603027
+21,21,18.035572052001957,11.65770435333252
+22,22,16.04208755493164,10.229310035705566
+23,23,14.609708786010742,8.793594360351562
+24,24,12.870007514953612,7.373780250549316
+25,25,11.573694229125977,5.998498916625977
+26,26,9.815820693969728,4.699665546417236
+27,27,7.948325157165527,3.513622283935547
+28,28,6.660716533660889,2.480156421661377
+29,29,5.119527339935303,1.6397855281829834
+30,30,3.504617214202881,1.0356911420822144
+31,31,2.7646963596343994,0.7050375938415527
+32,32,1.6973930597305298,0.6781078577041626
+33,33,1.075992226600647,0.9667624831199646
+34,34,0.6845546960830688,1.556015133857727
+35,35,0.3879012167453766,2.3837873935699463
+36,36,0.4765850305557251,3.3532373905181885
+37,37,0.8820050954818726,4.322829723358154
+38,38,1.508744478225708,5.122341632843018
+39,39,2.0902042388916016,5.632863998413086
+40,40,1.994970798492432,5.850592136383057
+41,41,2.3712332248687744,5.758612632751465
+42,42,2.082725524902344,5.433818817138672
+43,43,1.8089426755905151,4.966310977935791
+44,44,1.557389497756958,4.401313781738281
+45,45,1.2402079105377195,3.8155436515808105
+46,46,0.9037799835205078,3.2567148208618164
+47,47,0.9471979141235352,2.732806444168091
+48,48,0.7469801306724548,2.27205228805542
+49,49,0.53239905834198,1.886436462402344
+50,50,0.5718014240264893,1.578837513923645
+51,51,0.4372279047966003,1.3402189016342163
+52,52,0.531734824180603,1.1567370891571045
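Note on where training stopped: `training-args.json` sets `epochs` to 2000, yet the log ends at epoch 52, and the lowest `val_loss` (about 0.678) occurs at epoch 32. That pattern is what patience-based early stopping would produce. The sketch below applies a generic patience rule to the validation losses from epoch 30 onward, rounded to four decimals. The patience value of 20 is inferred from the log and is not documented anywhere in this commit, and `early_stopping_epoch` is a hypothetical helper, not Duvida's or Lightning's API.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_index, best_index) under a simple patience rule.

    Training stops at the first index that lies `patience` or more epochs
    after the most recent strict improvement of the validation loss.
    """
    best_index = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_index]:
            best_index = i
        elif i - best_index >= patience:
            return i, best_index
    return len(val_losses) - 1, best_index


# val_loss for epochs 30..52 from training-log.csv, rounded to 4 decimals.
FIRST_EPOCH = 30
VAL_LOSS = [
    1.0357, 0.7050, 0.6781, 0.9668, 1.5560, 2.3838, 3.3532, 4.3228,
    5.1223, 5.6329, 5.8506, 5.7586, 5.4338, 4.9663, 4.4013, 3.8155,
    3.2567, 2.7328, 2.2721, 1.8864, 1.5788, 1.3402, 1.1567,
]

stop_index, best_index = early_stopping_epoch(VAL_LOSS, patience=20)  # patience is an assumption
stop_epoch = FIRST_EPOCH + stop_index
best_epoch = FIRST_EPOCH + best_index
print(f"best epoch {best_epoch}, stopped at epoch {stop_epoch}")
```

Starting the list at epoch 30 is safe here because every earlier `val_loss` in the log is above 1.03, so none of them could be the minimum.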
training-log.png
ADDED