Abstract
Contaminant property data sets are typically small, which makes it challenging to develop accurate deep learning (DL) models. In this study, we pretrained ResNet18 models on the PubChem data set (∼10 million molecules), using molecular RGB images as inputs and their MACCS fingerprints as labels. This produced five models (Chemage1 to Chemage5) with varying pretraining accuracies, which we then fine-tuned on 10 MoleculeNet and 12 contaminant property data sets. We found that appropriate model architectures and fine-tuning techniques significantly improve transfer learning efficacy. We then developed an ensemble model, Ens-Chemage, to combine the strengths of these individual models. Ens-Chemage outperformed conventional machine learning (ML) models and ImageMol on almost all tested data sets. Model interpretation showed that Ens-Chemage learned more accurate and distinct features than the other models. Additionally, we defined its applicability domain (AD) using an uncertainty-based approach. Finally, Ens-Chemage has been deployed for free public use at https://ens-chemage.streamlit.app/. This study presents significant advancements in the application of DL to small contaminant property data sets.
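The abstract does not state how Ens-Chemage combines its members or how its uncertainty-based AD is computed. The sketch below is one possible illustration under stated assumptions: it averages the members' predictions for a molecule, treats the standard deviation across members as the uncertainty score, and flags a molecule as inside the AD when that score does not exceed a percentile of the scores seen on reference molecules. The function names `ensemble_predict`, `calibrate_threshold` and `in_domain`, the simple averaging, and the 95th-percentile cut-off are all hypothetical and are not taken from the paper.

```python
import statistics
from typing import Sequence, Tuple


def ensemble_predict(member_preds: Sequence[float]) -> Tuple[float, float]:
    """Combine one molecule's predictions from several fine-tuned members.

    Assumption: plain averaging. Returns the ensemble mean and the population
    standard deviation across members, used here as the uncertainty score.
    """
    if not member_preds:
        raise ValueError("need at least one member prediction")
    return statistics.fmean(member_preds), statistics.pstdev(member_preds)


def calibrate_threshold(reference_spreads: Sequence[float], coverage: int = 95) -> float:
    """Hypothetical calibration: return the `coverage`-th percentile of the
    uncertainty scores observed on reference (e.g. training) molecules."""
    if not 1 <= coverage <= 99:
        raise ValueError("coverage must be between 1 and 99")
    # n=100 yields 99 cut points; index coverage-1 is the coverage-th percentile.
    cut_points = statistics.quantiles(reference_spreads, n=100, method="inclusive")
    return cut_points[coverage - 1]


def in_domain(spread: float, threshold: float) -> bool:
    """Hypothetical AD rule: inside the domain when member disagreement is low."""
    return spread <= threshold


if __name__ == "__main__":
    # Illustrative numbers only: one prediction from each of five members.
    members = [0.80, 0.82, 0.78, 0.81, 0.79]
    mean, spread = ensemble_predict(members)
    threshold = calibrate_threshold([0.005, 0.010, 0.012, 0.020, 0.030, 0.050])
    print(round(mean, 3), round(spread, 4), in_domain(spread, threshold))
```

Using the spread across members as the uncertainty score needs no extra model, which is one reason disagreement-based scores are a common choice for ensembles. A real AD definition may instead rely on a different uncertainty measure or a validation-based cut-off.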
| Source language | English |
|---|---|
| Pages (from-to) | 1200-1206 |
| Number of pages | 7 |
| Journal | Environmental Science and Technology Letters |
| Volume | 11 |
| Issue | 11 |
| DOI | |
| Publication status | Published - 12 Nov 2024 |
Fingerprint
Dive into the research topics of 'Ens-Chemage: Robust Molecular Image-Based Ensemble Transfer Learning Framework for Small Contaminant Property Data Sets'. Together they form a unique fingerprint.