arer90
readme_fix2
bb2b709
---
language: en
tags:
- solubility-prediction
- machine-learning
- chemistry
- drug-discovery
license: mit
datasets:
- custom
metrics:
- rmse
- r2
library_name: tensorflow
---
# Automated Network Optimizer (ANO) for Enhanced Prediction of Intrinsic Solubility in Drug-like Organic Compounds: A Comprehensive Machine Learning Approach
## Overview
This repository presents a novel approach to predicting aqueous solubility of drug-like organic compounds using our Automated Network Optimizer (ANO) framework. By integrating advanced machine learning techniques with automated feature selection and hyperparameter optimization, we achieve state-of-the-art prediction accuracy for intrinsic solubility (logS).
<div align="center">
<a href="https://huggingface.co/arer90/ANO_solubility_prediction/resolve/main/result_prior/res1.png" target="_blank">
<img src="https://huggingface.co/arer90/ANO_solubility_prediction/resolve/main/result_prior/res1.png" alt="Result 1" width="400"/>
</a>
</div>
<div align="center">
<a href="https://huggingface.co/arer90/ANO_solubility_prediction/resolve/main/result_prior/res2.png" target="_blank">
<img src="https://huggingface.co/arer90/ANO_solubility_prediction/resolve/main/result_prior/res2.png" alt="Result 2" width="400"/>
</a>
</div>
<div align="center">
<a href="https://huggingface.co/arer90/ANO_solubility_prediction/resolve/main/result_prior/res3.png" target="_blank">
<img src="https://huggingface.co/arer90/ANO_solubility_prediction/resolve/main/result_prior/res3.png" alt="Result 3" width="400"/>
</a>
</div>
## System Requirements
### Dependencies
- Python 3.12 or later
- TensorFlow 2.15.0 (Linux/MacOS/WSL)
- TensorFlow 2.15.0-GPU (Windows)
- RDKit 2024.3.1
- pandas 2.2.1
- scikit-learn 1.4.1.post1
- seaborn 0.13.2
- matplotlib 3.8.3
- optuna 3.5.0
## Repository Structure
### Jupyter Notebooks
1. **1_standard_ML.ipynb**
- Comprehensive evaluation of traditional ML approaches
- Random Forest, XGBoost, and SVM implementations
- Baseline performance metrics and comparative analysis
2. **2_solubility_fingerprint_comparison.ipynb**
- Detailed analysis of molecular fingerprint methods
- Evaluation of ECFP, MACCS, and custom fingerprints
- Performance comparison across fingerprint types
3. **3_ANO_with_feature_checker.ipynb**
- Implementation of ANO framework
- Automated feature importance analysis
- Real-time feature selection optimization
4. **4_ANO_feature.ipynb**
- Optimal physicochemical feature search using ANO
5. **5_ANO_structure.ipynb**
- Hyperparameter optimization using ANO
6. **6_ANO_network_[fea_struc].ipynb**
- Network architecture optimization based on optimal physicochemical features
7. **7_ANO_network_[struc_fea].ipynb**
- Network architecture optimization based on optimal hyperparameters
8. **7_Solubility_final_HPO_proving.ipynb** (Bug fixing...)
- Performance validation of final ANO model
9. **8_solubility_xai.ipynb**
- Model explainability analysis
- Permutation importance and SHAP evaluation
- Correlation analysis between physicochemical features and logS
- Implementation of Lipinski's Rule of 5
### Core Python Modules
- **basic_model.py**
- Foundation architecture for fingerprint analysis
- Modular design for easy extension
- Comprehensive validation methods
- **feature_search.py**
- Feature search implementation for ANO
(used in 4_ANO_feature.ipynb)
- **feature_selection.py**
- Feature selection implementation for ANO
(used in 5_ANO_structure.ipynb, 6_ANO_network_[fea_struc].ipynb, 7_ANO_network_[struc_fea].ipynb)
- **learning_model.py**
- ANO learning model implementation
- Used in deep learning and feature optimization notebooks
(used in 3_ANO_with_feature_checker, 3_solubility_descriptor_deeplearning, 4_ANO_feature, 5_ANO_structure.ipynb, 6_ANO_network_[fea_struc].ipynb, 7_ANO_network_[struc_fea].ipynb)
## Key Innovations
- 49 carefully selected chemical descriptors for target dataset
- Fast and efficient selections of chemical descriptors and hyperparameters in machine learning models
<div align="center">
<a href="https://huggingface.co/arer90/ANO_solubility_prediction/resolve/main/result_prior/descriptors_list.png" target="_blank">
<img src="https://huggingface.co/arer90/ANO_solubility_prediction/resolve/main/result_prior/descriptors_list.png" alt="Chemical Descriptors List" width="400"/>
</a>
</div>
## Model Availability
Pre-trained models and complete results are available at:
https://huggingface.co/arer90/ANO_solubility_prediction/tree/main
## Version
Current Version: 1.0.2 (2024.11)
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Citation
If you use this work in your research, please cite:
```bibtex
@article{ANO2024solubility,
title={Prediction of intrinsic solubility for drug-like organic compounds using Automated Network Optimizer (ANO) for physicochemical feature and hyperparameter optimization},
author={Chung, Young Kyu and Lee, Seung Jun and Lee, Jonggeun and Cho, Hyunwoo and Kim, Sung-Jin and Huh, June},
journal={ChemRxiv},
year={2024},
doi={10.26434/chemrxiv-2024-mp291}
}
```