Abstract
Visible–infrared person re-identification (VI-ReID) aims to match person identities across visible and infrared images by learning identity-aware features. However, most works rely on two publicly available datasets, i.e., SYSU-MM01 and RegDB, which are constrained by their small amount of training data and their lack of diverse scenes and viewpoints. In this paper, we propose a controllable diffusion framework for infrared person image generation and re-identification. Our approach goes beyond existing diffusion models in two respects: (1) we use LoRA to fine-tune existing diffusion models on a VI-ReID dataset, which helps the diffusion model capture the infrared modality. A text adapter is then utilized to transfer the semantic understanding ability of large language models (LLMs) to our generation models; (2) we design a controllable generation module that makes person images generated from the same textual description identity-aware. After meticulous post-processing operations, our approach is capable of producing diverse visible and infrared person images, improving the discrimination of existing VI-ReID models without any additional annotations. We expand the VI-ReID datasets with our generated images and conduct extensive experiments on VI-ReID models. Experimental results demonstrate the effectiveness of our method.
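As a rough illustration of the first component, the sketch below shows how a LoRA adapter might be attached to a diffusion model's denoising UNet and trained with the standard noise-prediction loss on infrared person images. It assumes a Stable Diffusion backbone and the Hugging Face `diffusers`/`peft` libraries; the backbone choice, LoRA rank, target modules, and data preparation are illustrative assumptions, since the abstract does not specify the paper's actual setup.

```python
# Hypothetical sketch: attaching LoRA to a diffusion UNet and training it
# on infrared person crops with the standard noise-prediction objective.
# Backbone, LoRA rank, and target modules are assumptions, not the paper's setup.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline, DDPMScheduler
from peft import LoraConfig, get_peft_model

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

# Wrap only the UNet's attention projections with low-rank adapters;
# the base diffusion weights stay frozen.
lora_cfg = LoraConfig(r=8, lora_alpha=16,
                      target_modules=["to_q", "to_k", "to_v", "to_out.0"])
unet = get_peft_model(pipe.unet, lora_cfg)
optimizer = torch.optim.AdamW(
    (p for p in unet.parameters() if p.requires_grad), lr=1e-4)

def train_step(latents, text_embeds):
    """One denoising step on VAE latents of (infrared) person crops.

    `latents` and `text_embeds` are assumed to come from the pipeline's
    VAE and text encoder, respectively.
    """
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy = scheduler.add_noise(latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=text_embeds).sample
    loss = F.mse_loss(pred, noise)  # predict the injected noise (epsilon)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper's full pipeline this step would be complemented by the text adapter and the identity-aware controllable generation module, neither of which is sketched here.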
| Original language | English |
|---|---|
| Article number | 111561 |
| Journal | Pattern Recognition |
| Volume | 165 |
| State | Published - Sep 2025 |
Keywords
- Cross-modality person re-identification
- Image generation