Ens-Chemage: Robust Molecular Image-Based Ensemble Transfer Learning Framework for Small Contaminant Property Data Sets

  • Shifa Zhong*
  • Jian Guan
  • Zhenhua Dai
  • Jibai Li
  • Xuanying Cai
  • Xintong Qu
  • Xiaohong Guan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Contaminant property data sets are typically small, posing challenges for developing accurate deep learning (DL) models. In this study, we pretrained ResNet18 models on the PubChem data set (∼10 million molecules) using molecular RGB images as inputs and their MACCS fingerprints as labels, generating five models (Chemage1 to Chemage5) with varying pretraining accuracies, and fine-tuned them on 10 MoleculeNet and 12 contaminant property data sets. We found that appropriate model architectures and fine-tuning techniques significantly improve the transfer learning efficacy. We then developed an ensemble model, Ens-Chemage, to combine the strengths of these individual models. Ens-Chemage outperformed conventional machine learning (ML) models and ImageMol on almost all tested data sets. Through model interpretation, we found that Ens-Chemage learned more accurate and distinct features than the other models. Additionally, we defined its applicability domain (AD) by using an uncertainty-based approach. Finally, Ens-Chemage has been deployed for free public use at https://ens-chemage.streamlit.app/. This study presents significant advancements in the application of DL for small contaminant property data sets.
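The abstract does not say how the five fine-tuned models are combined or how the uncertainty behind the applicability domain is computed. The sketch below illustrates one common pattern under stated assumptions: the ensemble prediction is the mean of the member predictions, the uncertainty is their sample standard deviation, and a molecule counts as inside the AD when that deviation is at or below a cutoff. The function name `ensemble_predict`, the `max_std` parameter, and all numeric values are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def ensemble_predict(member_predictions, max_std):
    """Combine per-model predictions for a single molecule.

    member_predictions: one predicted value per fine-tuned model
        (for example, five values for five ensemble members).
    max_std: hypothetical uncertainty cutoff; predictions whose spread
        exceeds it are flagged as outside the applicability domain.
    """
    if len(member_predictions) < 2:
        raise ValueError("need at least two ensemble members")
    prediction = mean(member_predictions)    # assumed: simple averaging
    uncertainty = stdev(member_predictions)  # assumed: member disagreement
    return {
        "prediction": prediction,
        "uncertainty": uncertainty,
        "in_domain": uncertainty <= max_std,
    }


# Made-up member outputs: the first set agrees closely, the second does not.
agreeing = ensemble_predict([2.0, 2.1, 1.9, 2.0, 2.0], max_std=0.5)
disagreeing = ensemble_predict([0.5, 3.5, 1.0, 4.0, 2.0], max_std=0.5)
print(agreeing["in_domain"], disagreeing["in_domain"])
```

Using member disagreement as the uncertainty signal keeps the AD check independent of any particular descriptor space; the actual criterion used for Ens-Chemage may differ.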

Original language: English
Pages (from-to): 1200-1206
Number of pages: 7
Journal: Environmental Science and Technology Letters
Volume: 11
Issue number: 11
State: Published - 12 Nov 2024

Keywords

  • Deep learning
  • Ensemble learning
  • Molecular image
  • Molecular property prediction
  • Transfer learning
