TY - GEN
T1 - Alternate Geometric and Semantic Denoising Diffusion for Protein Inverse Folding
AU - Wang, Chenglin
AU - Zhou, Yucheng
AU - Wang, Zhe
AU - Zhai, Zijie
AU - Shen, Jianbing
AU - Zhang, Kai
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - Protein inverse folding is a fundamental problem in bioinformatics, aiming to recover the amino acid sequences from a given protein backbone structure. Despite the success of existing methods, they still have two limitations: (1) widely used topological modeling via GNNs may not effectively integrate geometric context of the entire protein 3D structure by focusing on only local residue message passing, and (2) current denoising processes primarily rely on geometric relations to update residue representations, while neglecting the semantic and functional correlations between different amino acid types. In this work, we propose an Alternate Geometric and Semantic Denoising Diffusion (AGSDD) that performs two types of denoising, i.e., geometric denoising and semantic denoising in turn, in the joint Geo-semantic residue representation space: (1) the geometric denoising module uses a geometric contextual aggregator to encode global contextual information from the entire protein structure and selectively distributes information to each residue; and (2) the semantic denoising module uses a learnable key-value dictionary of residue-types to facilitate communication between them so that learned residue features can be more accurately aligned to proper residue types. In experiments, we conduct extensive evaluations on the CATH4.2, TS50 and TS500 datasets, and observe that even without using any pre-trained protein language models, AGSDD still outperforms leading methods, achieving state-of-the-art performance and exhibiting strong generalization capabilities.
KW - Alternate Denoising
KW - Diffusion Model
KW - Protein Inverse Folding
UR - https://www.scopus.com/pages/publications/105020023674
U2 - 10.1007/978-3-032-06066-2_21
DO - 10.1007/978-3-032-06066-2_21
M3 - Conference contribution
AN - SCOPUS:105020023674
SN - 9783032060655
T3 - Lecture Notes in Computer Science
SP - 350
EP - 366
BT - Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2025, Proceedings
A2 - Ribeiro, Rita P.
A2 - Jorge, Alípio M.
A2 - Pfahringer, Bernhard
A2 - Japkowicz, Nathalie
A2 - Larrañaga, Pedro
A2 - Soares, Carlos
A2 - Abreu, Pedro H.
A2 - Gama, João
PB - Springer Science and Business Media Deutschland GmbH
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025
Y2 - 15 September 2025 through 19 September 2025
ER -