TY - JOUR
T1 - Deep Learning for Bidirectional Translation between Molecular Structures and Vibrational Spectra
AU - Hu, Tianqing
AU - Zou, Zihan
AU - Li, Bo
AU - Zhu, Tong
AU - Gu, Shaonan
AU - Jiang, Jun
AU - Luo, Yi
AU - Hu, Wei
N1 - Publisher Copyright: © 2025 American Chemical Society
PY - 2025/8/6
Y1 - 2025/8/6
N2 - Two deep learning models, TranSpec and SpecGNN, were developed to establish a bidirectional mapping between molecular vibrational spectra and simplified molecular input line entry system (SMILES) representations, akin to a “translation” between the language of spectra and the language of molecular structures. Initially, TranSpec achieved accuracy rates of 55 and 63% for quantum chemistry (QC)-calculated IR and Raman spectral data sets, respectively, but its performance dropped to 11% for the NIST experimental IR data set. To address this, we combined IR and Raman spectra as input; augmented the data set; employed model fusion, transfer learning, and multisource learning; applied molecular mass filtering; and leveraged SpecGNN for spectral simulation and candidate reordering. These improvements boosted TranSpec’s accuracy to 53.6% for the experimental IR data set. Notably, SpecGNN outperformed traditional QC methods in terms of both spectral accuracy and computational efficiency. Finally, we demonstrated TranSpec’s ability to recognize functional groups and distinguish isomers or homologues. Together, TranSpec and SpecGNN models provide an efficient and accurate AI-driven framework for interpreting molecular structures and spectra, advancing applications in spectroscopy and cheminformatics.
UR - https://www.scopus.com/pages/publications/105013041517
U2 - 10.1021/jacs.5c05010
DO - 10.1021/jacs.5c05010
M3 - Article
C2 - 40700648
AN - SCOPUS:105013041517
SN - 0002-7863
VL - 147
SP - 27525
EP - 27536
JO - Journal of the American Chemical Society
JF - Journal of the American Chemical Society
IS - 31
ER -