
Ens-Chemage: Robust Molecular Image-Based Ensemble Transfer Learning Framework for Small Contaminant Property Data Sets

  • Shifa Zhong*
  • Jian Guan
  • Zhenhua Dai
  • Jibai Li
  • Xuanying Cai
  • Xintong Qu
  • Xiaohong Guan*
  • *Corresponding author for this work
  • East China Normal University

Research output: Contribution to journal › Article › peer-review

Abstract

Contaminant property data sets are typically small, which makes it challenging to develop accurate deep learning (DL) models. In this study, we pretrained ResNet18 models on the PubChem data set (∼10 million molecules), using molecular RGB images as inputs and their MACCS fingerprints as labels. This produced five models (Chemage1 to Chemage5) with varying pretraining accuracies, which we then fine-tuned on 10 MoleculeNet and 12 contaminant property data sets. We found that appropriate model architectures and fine-tuning techniques significantly improve transfer learning efficacy. We then developed an ensemble model, Ens-Chemage, to combine the strengths of these individual models. Ens-Chemage outperformed conventional machine learning (ML) models and ImageMol on almost all tested data sets. Model interpretation showed that Ens-Chemage learned more accurate and distinct features than the other models. Additionally, we defined its applicability domain (AD) using an uncertainty-based approach. Finally, Ens-Chemage has been deployed for free public use at https://ens-chemage.streamlit.app/. This study presents significant advancements in the application of DL to small contaminant property data sets.
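The abstract does not state how Ens-Chemage combines its member models or how its uncertainty-based AD is computed. The following is only a minimal sketch under stated assumptions: it averages one prediction per member model (for example, the five fine-tuned Chemage models), uses the standard deviation across members as the uncertainty, and treats a prediction as outside the AD when that spread exceeds a threshold. The function names, the threshold, and the example values are hypothetical and are not taken from the paper or its web app.

```python
"""Hypothetical sketch of ensemble averaging with an uncertainty-based AD check.

Assumptions (not from the paper): mean aggregation of member predictions,
population standard deviation as the uncertainty, and a user-chosen threshold.
"""
from statistics import mean, pstdev
from typing import Sequence, Tuple


def ensemble_predict(member_preds: Sequence[float]) -> Tuple[float, float]:
    """Return (mean prediction, spread across members) for one molecule.

    member_preds holds one prediction per fine-tuned member model,
    e.g. five values standing in for hypothetical Chemage1..Chemage5 outputs.
    """
    if not member_preds:
        raise ValueError("need at least one member prediction")
    return mean(member_preds), pstdev(member_preds)


def in_applicability_domain(spread: float, threshold: float) -> bool:
    """Treat a prediction as inside the AD when member spread is at or below threshold."""
    return spread <= threshold


if __name__ == "__main__":
    # Made-up member outputs for two molecules, for illustration only.
    molecules = {
        "mol_A": [1.0, 1.2, 0.8, 1.1, 0.9],   # members agree closely
        "mol_B": [0.2, 2.5, 1.0, 3.1, -0.4],  # members disagree strongly
    }
    threshold = 0.5  # hypothetical cutoff; in practice it would be calibrated
    for name, preds in molecules.items():
        mu, sigma = ensemble_predict(preds)
        print(name, round(mu, 3), round(sigma, 3), in_applicability_domain(sigma, threshold))
```

Member disagreement is a common, model-agnostic proxy for predictive uncertainty; a real deployment would calibrate the threshold on held-out data rather than fix it by hand.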

Original language: English
Pages (from-to): 1200-1206
Number of pages: 7
Journal: Environmental Science and Technology Letters
Volume: 11
Issue number: 11
DOIs
State: Published - 12 Nov 2024

Keywords

  • Deep learning
  • Ensemble learning
  • Molecular image
  • Molecular property prediction
  • Transfer learning

