Molecular image-convolutional neural network (CNN) assisted QSAR models for predicting contaminant reactivity toward OH radicals: Transfer learning, data augmentation and model interpretation

Shifa Zhong, Jiajie Hu, Xiong Yu, Huichun Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

100 Scopus citations

Abstract

In this study, we used molecular images as a representation for organic compounds and combined them with a convolutional neural network (CNN) to develop quantitative structure-activity relationships (QSARs) for predicting the rate constants of compounds toward OH radicals. We applied transfer learning and data augmentation to train the molecular image-CNN models and the Gradient-weighted Class Activation Mapping (Grad-CAM) method to interpret them. Results showed that data augmentation and transfer learning can effectively enhance the robustness and predictive performance of the models: the root-mean-square error (RMSE) on the test dataset (RMSEtest) decreased from 0.395–0.45 to 0.284–0.339 after applying data augmentation, and the RMSE on the training dataset (RMSEtrain) decreased from 0.452–0.592 to 0.123–0.151 after applying transfer learning. The resulting molecular image-CNN models showed predictive performance (RMSEtest 0.284–0.339) comparable to that of the molecular fingerprint-based models (RMSEtest 0.30–0.35). Grad-CAM interpretation showed that the molecular image-CNN models correctly chose the molecular features in the images and identified key functional groups that influenced the reactivity. The applicability domain analysis showed that the molecular image-CNN models have a broader applicability domain than the molecular fingerprint-based models. The reactivity of any new compound with a maximum similarity of over 0.85 to the compounds in the training dataset can be reliably predicted. This study demonstrated that molecular image-CNN is a new tool for developing QSARs for environmental applications and can be used to build trustworthy models that make meaningful predictions.
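The two quantities the abstract reports, RMSE and the maximum-similarity criterion for the applicability domain, can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not name the similarity metric, so Tanimoto similarity over sets of "on" fingerprint bits is assumed here, and the function names and toy bit sets are hypothetical. Only the 0.85 threshold comes from the abstract.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits.

    Assumption: the abstract does not state which similarity metric was used.
    Two empty sets are treated as having similarity 0.0.
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity(query_bits, training_bits):
    """Highest similarity between a query compound and any training compound."""
    return max(tanimoto(query_bits, t) for t in training_bits)


def in_applicability_domain(query_bits, training_bits, threshold=0.85):
    """True if the maximum similarity exceeds the threshold (0.85 in the abstract)."""
    return max_similarity(query_bits, training_bits) > threshold


# Toy, hypothetical fingerprints: each set lists the indices of bits that are on.
training = [{1, 2, 3, 4, 5, 6, 7}, {10, 11, 12}]
close_query = {1, 2, 3, 4, 5, 6, 7, 8}   # shares 7 of 8 bits with the first compound
distant_query = {1, 2, 3, 20}            # shares 3 of 8 bits with the first compound

print(max_similarity(close_query, training))
print(in_applicability_domain(close_query, training))
print(in_applicability_domain(distant_query, training))
```

In practice the bit sets would come from a cheminformatics toolkit rather than being written by hand. The toy sets only show how the threshold separates a near neighbor of the training data from a dissimilar compound.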

Original language: English
Article number: 127998
Journal: Chemical Engineering Journal
Volume: 408
State: Published - 15 Mar 2021
Externally published: Yes

Keywords

  • Convolutional neural network (CNN)
  • Hydroxyl radical
  • Machine learning
  • Model interpretation
  • Molecular images
  • QSARs
