TY - JOUR
T1 - FuseFormer: A Manifold Metric Fusing Attention for Pedestrian Trajectory Prediction
AU - Zou, Yi
AU - Ko, Kohsin
AU - Yang, Jian
AU - Liu, Yingjie
AU - Li, Ke
AU - You, Xiong
AU - Mi, Jinpeng
AU - Tang, Xuan
AU - Chen, Mingsong
AU - Wei, Xian
N1 - Publisher Copyright: © IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies.
PY - 2025
Y1 - 2025
N2 - Accurate pedestrian trajectory prediction is critical for ensuring the safety of autonomous vehicles and advancing higher levels of driving automation. However, the complex interpersonal interactions and highly dynamic trajectory patterns in real-world scenarios pose significant challenges to achieving precise predictions. Recently, Transformers have shown remarkable success in pedestrian trajectory prediction, primarily due to their effective modeling of temporal and spatial dependencies via Multi-Head Self-Attention (MHA) mechanisms. Despite these advancements, existing self-attention methods often rely on Euclidean distance-based metrics and dot-product operations, which are inadequate for capturing interaction-induced trajectory curvatures. To address this limitation, we propose a novel hybrid Transformer architecture, FuseFormer, that incorporates Geodesic Self-Attention (GSA) mechanisms. GSA utilizes geodesic distances to characterize interaction features effectively, complementing MHA, which excels in capturing local features and maintaining temporal correlations. FuseFormer employs a gating network to adaptively combine GSA and MHA embeddings, leveraging their complementary strengths. Additionally, FuseFormer integrates a Transformer-based Neural Ordinary Differential Equation (ODE) decoder to model trajectory temporal dynamics. This design enables the generation of future trajectories that align closely with motion trends while adapting the network depth to input sequence lengths. Experimental results demonstrate that FuseFormer achieves state-of-the-art performance across widely used pedestrian trajectory prediction datasets, including ETH/UCY, SDD, and NBA. These results underscore the model's effectiveness and generalization capability in capturing complex interaction patterns and handling diverse scenarios.
KW - Pedestrian trajectory prediction
KW - geodesic distance
KW - manifold
KW - neural ODE
KW - non-Euclidean
UR - https://www.scopus.com/pages/publications/105006737043
U2 - 10.1109/TITS.2025.3565990
DO - 10.1109/TITS.2025.3565990
M3 - Article
AN - SCOPUS:105006737043
SN - 1524-9050
VL - 26
SP - 12372
EP - 12386
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
IS - 8
ER -