TY - GEN
T1 - Hierarchical Walking Transformer for Object Re-Identification
AU - Tian, Xudong
AU - Liu, Jun
AU - Zhang, Zhizhong
AU - Wang, Chengjie
AU - Qu, Yanyun
AU - Xie, Yuan
AU - Ma, Lizhuang
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/10/10
Y1 - 2022/10/10
N2 - Recently, the transformer, based purely on the attention mechanism, has been applied to a wide range of tasks and achieved impressive performance. Though extensive efforts have been made, drawbacks of the transformer architecture still hinder its further application: (i) the quadratic complexity brought by the attention mechanism; (ii) barely incorporated inductive bias. In this paper, we present a new hierarchical walking attention, which provides a scalable, flexible, and interpretable sparsification strategy that reduces the complexity from quadratic to linear while evidently boosting performance. Specifically, we learn a hierarchical structure by splitting an image with different receptive fields. We associate each high-level region with a supernode and inject supervision with prior knowledge into this node. The supernode then acts as an indicator to decide whether the area should be skipped, so that massive unnecessary dot-product terms in attention can be avoided. Two sparsification phases are finally introduced, allowing the transformer to achieve strictly linear complexity. Extensive experiments are conducted to demonstrate the superior performance and efficiency against state-of-the-art methods. Significantly, our method sharply reduces the inference time and the total number of tokens by 28% and 94% respectively, and brings a 2.6% Rank-1 improvement on MSMT17.
AB - Recently, the transformer, based purely on the attention mechanism, has been applied to a wide range of tasks and achieved impressive performance. Though extensive efforts have been made, drawbacks of the transformer architecture still hinder its further application: (i) the quadratic complexity brought by the attention mechanism; (ii) barely incorporated inductive bias. In this paper, we present a new hierarchical walking attention, which provides a scalable, flexible, and interpretable sparsification strategy that reduces the complexity from quadratic to linear while evidently boosting performance. Specifically, we learn a hierarchical structure by splitting an image with different receptive fields. We associate each high-level region with a supernode and inject supervision with prior knowledge into this node. The supernode then acts as an indicator to decide whether the area should be skipped, so that massive unnecessary dot-product terms in attention can be avoided. Two sparsification phases are finally introduced, allowing the transformer to achieve strictly linear complexity. Extensive experiments are conducted to demonstrate the superior performance and efficiency against state-of-the-art methods. Significantly, our method sharply reduces the inference time and the total number of tokens by 28% and 94% respectively, and brings a 2.6% Rank-1 improvement on MSMT17.
KW - hierarchical features
KW - linear transformer
KW - random walking
UR - https://www.scopus.com/pages/publications/85151144298
U2 - 10.1145/3503161.3548401
DO - 10.1145/3503161.3548401
M3 - Conference contribution
AN - SCOPUS:85151144298
T3 - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
SP - 4224
EP - 4232
BT - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 30th ACM International Conference on Multimedia, MM 2022
Y2 - 10 October 2022 through 14 October 2022
ER -