TY - GEN
T1 - Multi modal architecture based on dual attention improvement to assist in glaucoma grading challenges
AU - Li, Zehao
AU - Dong, Qiwen
AU - Wang, Lin
AU - Wang, Ye
N1 - Publisher Copyright:
© 2025 SPIE.
PY - 2025
Y1 - 2025
N2 - Diabetic glaucoma, a prevalent complication of diabetes, seriously threatens patients' visual health and ranks as a leading cause of irreversible blindness among diabetic individuals. In clinical practice, accurate grading of diabetic glaucoma is crucial for formulating personalized treatment strategies and predicting disease progression. However, existing methods often rely on single-modality data or lack effective integration of multi-modal information, leading to suboptimal grading performance. This study addresses these issues by presenting two multi-modal disease grading networks enhanced by the cross-attention mechanism for grading diabetic glaucoma using OCT and color fundus images. Two novel cross-attention-based image fusion strategies are developed: one employs the multi-head cross-attention mechanism for better inter-modal information fusion, and the other combines self-attention and cross-attention mechanisms. Experimental results show that, using the proposed cross-attention-based method, the Kappa value for multi-modal grading in the GAMMA challenge reaches 84.4% and the F1 score reaches 81%, exceeding the performance of the champion and runner-up network models in that year's competition. Moreover, multi-modal grading accuracy shows a 1%-4% increase compared to single-modality grading. This research not only improves the accuracy of diabetic glaucoma grading but also provides valuable feature support for future multi-task learning models.
AB - Diabetic glaucoma, a prevalent complication of diabetes, seriously threatens patients' visual health and ranks as a leading cause of irreversible blindness among diabetic individuals. In clinical practice, accurate grading of diabetic glaucoma is crucial for formulating personalized treatment strategies and predicting disease progression. However, existing methods often rely on single-modality data or lack effective integration of multi-modal information, leading to suboptimal grading performance. This study addresses these issues by presenting two multi-modal disease grading networks enhanced by the cross-attention mechanism for grading diabetic glaucoma using OCT and color fundus images. Two novel cross-attention-based image fusion strategies are developed: one employs the multi-head cross-attention mechanism for better inter-modal information fusion, and the other combines self-attention and cross-attention mechanisms. Experimental results show that, using the proposed cross-attention-based method, the Kappa value for multi-modal grading in the GAMMA challenge reaches 84.4% and the F1 score reaches 81%, exceeding the performance of the champion and runner-up network models in that year's competition. Moreover, multi-modal grading accuracy shows a 1%-4% increase compared to single-modality grading. This research not only improves the accuracy of diabetic glaucoma grading but also provides valuable feature support for future multi-task learning models.
KW - Cross-Attention mechanism
KW - Diabetic glaucoma
KW - Multi-modal disease grading
KW - Self-attention mechanism
UR - https://www.scopus.com/pages/publications/105014298833
U2 - 10.1117/12.3069152
DO - 10.1117/12.3069152
M3 - Conference contribution
AN - SCOPUS:105014298833
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - Fourth International Conference on Electronics Technology and Artificial Intelligence, ETAI 2025
A2 - Luo, Shaohua
A2 - Saxena, Akash
PB - SPIE
T2 - 4th International Conference on Electronics Technology and Artificial Intelligence, ETAI 2025
Y2 - 21 February 2025 through 23 February 2025
ER -