TY - JOUR
T1 - PromptIF
T2 - A prompt-based general image fusion framework
AU - Liu, Yijie
AU - Lei, Pengcheng
AU - Wang, Tingting
AU - Fang, Faming
AU - Zhang, Guixu
N1 - Publisher Copyright:
© 2026 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
PY - 2026/7
Y1 - 2026/7
N2 - Multimodal image fusion is a challenging task, involving areas such as visible-infrared fusion, multi-exposure fusion, and multi-focus fusion. These tasks require merging images from different modalities, each with unique characteristics, making it difficult to develop a unified model that can handle all of them effectively. While deep learning has made significant progress in these areas, the inherent differences between image types still present challenges in achieving optimal fusion performance. A unified model could simplify processing and improve results in downstream tasks, such as object detection, semantic segmentation, and scene analysis. Inspired by the success of prompt-based techniques in large models and natural language processing (NLP), we introduce PromptIF, a lightweight and efficient fusion model based on prompts. PromptIF is designed to adapt to different fusion tasks by using minimal extra parameters, which allows it to effectively preserve important image details while also differentiating between tasks. Our results demonstrate that PromptIF not only outperforms both traditional and recent fusion methods but also achieves strong results across various benchmarks and downstream applications. This shows that our approach is both flexible and effective in real-world scenarios. We will release the code to encourage further exploration and development in the field of multimodal image fusion.
KW - Computer vision
UR - https://www.scopus.com/pages/publications/105033146713
U2 - 10.1016/j.displa.2026.103386
DO - 10.1016/j.displa.2026.103386
M3 - Article
AN - SCOPUS:105033146713
SN - 0141-9382
VL - 93
JO - Displays
JF - Displays
M1 - 103386
ER -