TY - GEN
T1 - ClothPPO
T2 - 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
AU - Yang, Libing
AU - Li, Yang
AU - Chen, Long
N1 - Publisher Copyright:
© 2024 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Vision-based robotic cloth unfolding has made great progress recently. However, prior works predominantly rely on value learning and have not fully explored policy-based techniques. Recently, the success of reinforcement learning on the large language model has shown that the policy gradient algorithm can enhance policy with huge action space. In this paper, we introduce ClothPPO, a framework that employs a policy gradient algorithm based on actor-critic architecture to enhance a pre-trained model with huge 106 action spaces aligned with observation in the task of unfolding clothes. To this end, we redefine the cloth manipulation problem as a partially observable Markov decision process. A supervised pretraining stage is employed to train a baseline model of our policy. In the second stage, the Proximal Policy Optimization (PPO) is utilized to guide the supervised model within the observation-aligned action space. By optimizing and updating the strategy, our proposed method increases the garment's surface area for cloth unfolding under the soft-body manipulation task. Experimental results show that our proposed framework can further improve the unfolding performance of other state-of-the-art methods. Our project is available at https://vpxecnu.github.io/ClothPPO-website/.
AB - Vision-based robotic cloth unfolding has made great progress recently. However, prior works predominantly rely on value learning and have not fully explored policy-based techniques. Recently, the success of reinforcement learning on the large language model has shown that the policy gradient algorithm can enhance policy with huge action space. In this paper, we introduce ClothPPO, a framework that employs a policy gradient algorithm based on actor-critic architecture to enhance a pre-trained model with huge 106 action spaces aligned with observation in the task of unfolding clothes. To this end, we redefine the cloth manipulation problem as a partially observable Markov decision process. A supervised pretraining stage is employed to train a baseline model of our policy. In the second stage, the Proximal Policy Optimization (PPO) is utilized to guide the supervised model within the observation-aligned action space. By optimizing and updating the strategy, our proposed method increases the garment's surface area for cloth unfolding under the soft-body manipulation task. Experimental results show that our proposed framework can further improve the unfolding performance of other state-of-the-art methods. Our project is available at https://vpxecnu.github.io/ClothPPO-website/.
UR - https://www.scopus.com/pages/publications/85204309174
M3 - 会议稿件
AN - SCOPUS:85204309174
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 6895
EP - 6903
BT - Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
A2 - Larson, Kate
PB - International Joint Conferences on Artificial Intelligence
Y2 - 3 August 2024 through 9 August 2024
ER -