TY - JOUR
T1 - TGF: Multiscale transformer graph attention network for multi-sensor image fusion
T2 - Expert Systems with Applications
AU - Mustafa, Hafiz Tayyab
AU - Shamsolmoali, Pourya
AU - Lee, Ik Hyun
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2024/3/15
Y1 - 2024/3/15
N2 - Multisensor image fusion is a challenging task that aims to produce a composite image by fusing visible (VI) and infrared (IR) images. Deep neural networks have shown impressive performance on VI and IR image fusion; however, the majority of them overlook the internal patch-recurrence property of the source images, which limits their ability to learn diverse features. To address this issue, we propose a novel fusion framework based on a vision transformer and graph attention that exploits local patch repetition to enhance feature representation and texture recovery. In particular, the proposed transformer blocks learn high-frequency, domain-specific information from the source images. The graph attention mechanism provides additional guidance for the features by exploiting similarity and symmetry information across patches. Furthermore, the proposed graph attention fusion block (GAFB) improves the selectivity and effectiveness of feature learning and identifies significant corresponding local and global details of the source images. Complementary information comprising long-range and local symmetric details across domains is combined while preserving appropriate apparent intensity to generate the fused image. Extensive evaluations on benchmark datasets demonstrate the superior performance of the proposed technique. Notably, our approach achieves SSIM scores of 0.7552 on the TNO dataset and 0.7673 on the RoadScene dataset, surpassing the state-of-the-art techniques used for evaluation.
KW - Deep learning
KW - Graph attention
KW - Image fusion
KW - Transformer
UR - https://www.scopus.com/pages/publications/85173627691
U2 - 10.1016/j.eswa.2023.121789
DO - 10.1016/j.eswa.2023.121789
M3 - Article
AN - SCOPUS:85173627691
SN - 0957-4174
VL - 238
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 121789
ER -