TY - GEN
T1 - Sequential Viewpoint Selection and Grasping with Partial Observability Reinforcement Learning
AU - Chen, Weiwen
AU - Hua, Yun
AU - Jin, Bo
AU - Zhu, Jun
AU - Ge, Quanbo
AU - Wang, Xiangfeng
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Despite the success of vision-based object grasping driven by advances in deep learning, fixed-view grasping methods still suffer from information loss, which limits their performance. Recently, some rule-based or heuristic-based methods have begun to consider multiple views sequentially to improve perception of the environment, and they show better performance. However, their sequence lengths are too short, or their viewpoint selection is myopic and ignores long-term effects. This paper models sequential viewpoint selection as a Markov Decision Process. The Sequential Decided Multi-View Grasping (SDMVG) method is proposed based on reinforcement learning, and an RNN-based policy is introduced. By considering long-term return, SDMVG can generate a viewpoint sequence that achieves the most information gain. Numerical experiments show that SDMVG can achieve a 10% accuracy improvement over rule- or heuristic-based baselines on the Multi-View GraspNet Benchmark. Moreover, SDMVG approaches the global optimum with only 1/40 of the wall time of the brute-force method.
AB - Despite the success of vision-based object grasping driven by advances in deep learning, fixed-view grasping methods still suffer from information loss, which limits their performance. Recently, some rule-based or heuristic-based methods have begun to consider multiple views sequentially to improve perception of the environment, and they show better performance. However, their sequence lengths are too short, or their viewpoint selection is myopic and ignores long-term effects. This paper models sequential viewpoint selection as a Markov Decision Process. The Sequential Decided Multi-View Grasping (SDMVG) method is proposed based on reinforcement learning, and an RNN-based policy is introduced. By considering long-term return, SDMVG can generate a viewpoint sequence that achieves the most information gain. Numerical experiments show that SDMVG can achieve a 10% accuracy improvement over rule- or heuristic-based baselines on the Multi-View GraspNet Benchmark. Moreover, SDMVG approaches the global optimum with only 1/40 of the wall time of the brute-force method.
KW - Object Grasping
KW - Reinforcement
KW - Robotic
UR - https://www.scopus.com/pages/publications/85147968333
U2 - 10.1109/YAC57282.2022.10023914
DO - 10.1109/YAC57282.2022.10023914
M3 - Conference contribution
AN - SCOPUS:85147968333
T3 - Proceedings - 2022 37th Youth Academic Annual Conference of Chinese Association of Automation, YAC 2022
SP - 1125
EP - 1129
BT - Proceedings - 2022 37th Youth Academic Annual Conference of Chinese Association of Automation, YAC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 37th Youth Academic Annual Conference of Chinese Association of Automation, YAC 2022
Y2 - 19 November 2022 through 20 November 2022
ER -