TY - JOUR
T1 - Two-stage knowledge distillation for visible-infrared person re-identification
AU - Shi, Jiangming
AU - Yin, Xiangbo
AU - Zhang, Demao
AU - Zhang, Zhizhong
AU - Xie, Yuan
AU - Qu, Yanyun
N1 - Publisher Copyright:
© 2025
PY - 2026/1
Y1 - 2026/1
N2 - Visible-infrared person re-identification (VI-ReID) is an important retrieval task that has recently attracted interest due to the demand for continuous 24-hour surveillance. VI-ReID aims to retrieve person images in one modality (visible or infrared) based on a query from the other modality. Visible and infrared images have different spectra, leading to a large modality gap that is the major challenge for VI-ReID. Recent methods reduce this gap but ignore the intra-modality discrepancy. Moreover, these methods require well-annotated cross-modality data, which are time-consuming and labor-intensive to gather. In this paper, we propose a novel Two-Stage Knowledge Distillation method (TSKD) for VI-ReID, which adopts a simple-to-difficult strategy for cross-modality feature alignment and reduces annotation costs by using only a small amount of labeled data. TSKD consists of three novel components: soft-identity learning (SI), self-mimic learning (SM), and mutual-distillation learning (MD). SI first generates pseudo-labels with confidence scores for unlabeled data, thereby decreasing the annotation cost. SM then learns a prototype for each person within each modality and minimizes the intra-modality discrepancy. Finally, MD performs mutual distillation for cross-modality feature alignment using a set-level measurement for each person rather than an instance-level one. Importantly, we demonstrate that TSKD achieves stronger robustness under weak supervision. Experimental results on two VI-ReID benchmarks demonstrate the effectiveness of TSKD under both full-supervision and weak-supervision settings. The code is released at https://github.com/shijiangming1/TSKD.
AB - Visible-infrared person re-identification (VI-ReID) is an important retrieval task that has recently attracted interest due to the demand for continuous 24-hour surveillance. VI-ReID aims to retrieve person images in one modality (visible or infrared) based on a query from the other modality. Visible and infrared images have different spectra, leading to a large modality gap that is the major challenge for VI-ReID. Recent methods reduce this gap but ignore the intra-modality discrepancy. Moreover, these methods require well-annotated cross-modality data, which are time-consuming and labor-intensive to gather. In this paper, we propose a novel Two-Stage Knowledge Distillation method (TSKD) for VI-ReID, which adopts a simple-to-difficult strategy for cross-modality feature alignment and reduces annotation costs by using only a small amount of labeled data. TSKD consists of three novel components: soft-identity learning (SI), self-mimic learning (SM), and mutual-distillation learning (MD). SI first generates pseudo-labels with confidence scores for unlabeled data, thereby decreasing the annotation cost. SM then learns a prototype for each person within each modality and minimizes the intra-modality discrepancy. Finally, MD performs mutual distillation for cross-modality feature alignment using a set-level measurement for each person rather than an instance-level one. Importantly, we demonstrate that TSKD achieves stronger robustness under weak supervision. Experimental results on two VI-ReID benchmarks demonstrate the effectiveness of TSKD under both full-supervision and weak-supervision settings. The code is released at https://github.com/shijiangming1/TSKD.
KW - Knowledge distillation
KW - Re-identification
KW - Visible-infrared person
UR - https://www.scopus.com/pages/publications/105007142188
U2 - 10.1016/j.patcog.2025.111850
DO - 10.1016/j.patcog.2025.111850
M3 - Article
AN - SCOPUS:105007142188
SN - 0031-3203
VL - 169
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 111850
ER -