TY - JOUR
T1 - FrDiff
T2 - Framelet-Based Conditional Diffusion Model for Multispectral and Panchromatic Image Fusion
AU - Zhang, Junkang
AU - Fang, Faming
AU - Wang, Tingting
AU - Zhang, Guixu
AU - Song, Haichuan
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - The process of fusing low-resolution multispectral (LRMS) and high-resolution panchromatic (PAN) imagery, commonly referred to as pansharpening, is intended to generate high-resolution multispectral (HRMS) imagery. Typically, most pre-existing pansharpening frameworks mainly emphasize the straightforward learning of the mapping relationship among PAN and LRMS images to HRMS images. However, a key limitation of these frameworks is their potential overemphasis on spatial information, particularly the enhancement of low-frequency components. As a result, such an oversight potentially hinders the model's ability to simultaneously restore both spectral and spatial details. To address this issue, we propose a novel pansharpening model based on the denoising diffusion probabilistic model (DDPM), dubbed FrDiff. Specifically, we build a framelet-based conditional diffusion model that leverages the generative power of diffusion models to produce more refine results. Different from conventional methods directly inferring HRMS images, our strategy is designed to project their framelet coefficients, utilizing the available PAN and LRMS images as resources. This approach enables the separation of high-frequency and low-frequency components through framelet transformation, which are subsequently recombined to create a novel set of conditional embeddings that feed into the diffusion process. At the same time, the powerful predictive power of the diffusion model is exploited to simultaneously recover the high-frequency and low-frequency components of the HRMS. Moreover, we introduce a framelet-oriented cross-attention module dedicated to honing spectral fidelity. This module is crucial for improving the spectral precision of the HRMS images, ensuring a balanced emphasis on both spatial and spectral enhancements. Quantitative and qualitative experiments on multiple benchmark datasets demonstrate that the proposed method achieves more robustness and high-quality results than other state-of-the-art pansharpening methods.
AB - The process of fusing low-resolution multispectral (LRMS) and high-resolution panchromatic (PAN) imagery, commonly referred to as pansharpening, is intended to generate high-resolution multispectral (HRMS) imagery. Typically, most pre-existing pansharpening frameworks mainly emphasize the straightforward learning of the mapping relationship among PAN and LRMS images to HRMS images. However, a key limitation of these frameworks is their potential overemphasis on spatial information, particularly the enhancement of low-frequency components. As a result, such an oversight potentially hinders the model's ability to simultaneously restore both spectral and spatial details. To address this issue, we propose a novel pansharpening model based on the denoising diffusion probabilistic model (DDPM), dubbed FrDiff. Specifically, we build a framelet-based conditional diffusion model that leverages the generative power of diffusion models to produce more refine results. Different from conventional methods directly inferring HRMS images, our strategy is designed to project their framelet coefficients, utilizing the available PAN and LRMS images as resources. This approach enables the separation of high-frequency and low-frequency components through framelet transformation, which are subsequently recombined to create a novel set of conditional embeddings that feed into the diffusion process. At the same time, the powerful predictive power of the diffusion model is exploited to simultaneously recover the high-frequency and low-frequency components of the HRMS. Moreover, we introduce a framelet-oriented cross-attention module dedicated to honing spectral fidelity. This module is crucial for improving the spectral precision of the HRMS images, ensuring a balanced emphasis on both spatial and spectral enhancements. Quantitative and qualitative experiments on multiple benchmark datasets demonstrate that the proposed method achieves more robustness and high-quality results than other state-of-the-art pansharpening methods.
KW - Convolutional neural network
KW - diffusion model
KW - framelet transform
KW - pansharpening
UR - https://www.scopus.com/pages/publications/105004040373
U2 - 10.1109/TMM.2025.3565985
DO - 10.1109/TMM.2025.3565985
M3 - 文章
AN - SCOPUS:105004040373
SN - 1520-9210
VL - 27
SP - 5989
EP - 6002
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -