TY - JOUR
T1 - Residual memory inference network for regression tracking with weighted gradient harmonized loss
AU - Zhang, Huanlong
AU - Zhang, Jiapeng
AU - Nie, Guohao
AU - Hu, Jilin
AU - Zhang, W. J. (Chris)
N1 - Publisher Copyright: © 2022 Elsevier Inc.
PY - 2022/6
Y1 - 2022/6
N2 - Memory mechanisms have recently been widely adopted in target tracking. However, existing trackers struggle to balance the stability of long-term memory with the plasticity of short-term memory through an elegant and efficient mechanism. A residual memory inference network (RMIT) is proposed to exploit both the history of target states and the most recent visual features. Specifically, RMIT consists of a base layer and a residual memory layer that together synergize short- and long-term memories. The base layer can be regarded as a reformulation of the Discriminative Correlation Filter (DCF) that maintains short-term memory to accommodate rapid appearance changes. The residual memory layer extends residual learning from the spatial domain to the spatio-temporal domain via ConvLSTM to obtain long-term memory of the target appearance. To avoid model degradation due to sample imbalance, we introduce a weighted gradient harmonized loss that improves the discrimination of the tracker. Response scores then serve as the basis of an adaptive learning strategy to ensure the reliability of memory updates. The proposed method has been extensively validated on six benchmark datasets (OTB-50/100, TC-128, UAV-123, and VOT-2016/2018) and performs favorably against several advanced methods.
AB - Memory mechanisms have recently been widely adopted in target tracking. However, existing trackers struggle to balance the stability of long-term memory with the plasticity of short-term memory through an elegant and efficient mechanism. A residual memory inference network (RMIT) is proposed to exploit both the history of target states and the most recent visual features. Specifically, RMIT consists of a base layer and a residual memory layer that together synergize short- and long-term memories. The base layer can be regarded as a reformulation of the Discriminative Correlation Filter (DCF) that maintains short-term memory to accommodate rapid appearance changes. The residual memory layer extends residual learning from the spatial domain to the spatio-temporal domain via ConvLSTM to obtain long-term memory of the target appearance. To avoid model degradation due to sample imbalance, we introduce a weighted gradient harmonized loss that improves the discrimination of the tracker. Response scores then serve as the basis of an adaptive learning strategy to ensure the reliability of memory updates. The proposed method has been extensively validated on six benchmark datasets (OTB-50/100, TC-128, UAV-123, and VOT-2016/2018) and performs favorably against several advanced methods.
KW - Long-short term memory
KW - Residual network
KW - Visual tracking
UR - https://www.scopus.com/pages/publications/85126537045
U2 - 10.1016/j.ins.2022.03.047
DO - 10.1016/j.ins.2022.03.047
M3 - Article
AN - SCOPUS:85126537045
SN - 0020-0255
VL - 597
SP - 105
EP - 124
JO - Information Sciences
JF - Information Sciences
ER -