TY - JOUR
T1 - Leveraging Predictions of Task-Related Latents for Interactive Visual Navigation
AU - Shen, Jiwei
AU - Yuan, Liang
AU - Lu, Yue
AU - Lyu, Shujing
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Interactive visual navigation (IVN) involves tasks where embodied agents learn to interact with the objects in the environment to reach the goals. Current approaches exploit visual features to train a reinforcement learning (RL) navigation control policy network. However, RL-based methods continue to struggle at the IVN tasks as they are inefficient in learning a good representation of the unknown environment in partially observable settings. In this work, we introduce predictions of task-related latents (PTRLs), a flexible self-supervised RL framework for IVN tasks. PTRL learns the latent structured information about environment dynamics and leverages multistep representations of the sequential observations. Specifically, PTRL trains its representation by explicitly predicting the next pose of the agent conditioned on the actions. Moreover, an attention and memory module is employed to associate the learned representation to each action and exploit spatiotemporal dependencies. Furthermore, a state value boost module is introduced to adapt the model to previously unseen environments by leveraging input perturbations and regularizing the value function. Sample efficiency in the training of RL networks is enhanced by modular training and hierarchical decomposition. Extensive evaluations have proved the superiority of the proposed method in increasing the accuracy and generalization capacity.
AB - Interactive visual navigation (IVN) involves tasks where embodied agents learn to interact with the objects in the environment to reach the goals. Current approaches exploit visual features to train a reinforcement learning (RL) navigation control policy network. However, RL-based methods continue to struggle at the IVN tasks as they are inefficient in learning a good representation of the unknown environment in partially observable settings. In this work, we introduce predictions of task-related latents (PTRLs), a flexible self-supervised RL framework for IVN tasks. PTRL learns the latent structured information about environment dynamics and leverages multistep representations of the sequential observations. Specifically, PTRL trains its representation by explicitly predicting the next pose of the agent conditioned on the actions. Moreover, an attention and memory module is employed to associate the learned representation to each action and exploit spatiotemporal dependencies. Furthermore, a state value boost module is introduced to adapt the model to previously unseen environments by leveraging input perturbations and regularizing the value function. Sample efficiency in the training of RL networks is enhanced by modular training and hierarchical decomposition. Extensive evaluations have proved the superiority of the proposed method in increasing the accuracy and generalization capacity.
KW - Embodied artificial intelligence (AI)
KW - interactive visual navigation (IVN)
KW - reinforcement learning (RL)
KW - representation learning
KW - self-attention
UR - https://www.scopus.com/pages/publications/85184498430
U2 - 10.1109/TNNLS.2023.3335416
DO - 10.1109/TNNLS.2023.3335416
M3 - 文章
C2 - 38039173
AN - SCOPUS:85184498430
SN - 2162-237X
VL - 36
SP - 704
EP - 717
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 1
ER -