TY - JOUR
T1 - Enhancing parcel singulation efficiency through transformer-based position attention and state space augmentation
AU - Shen, Jiwei
AU - Lu, Hu
AU - Lyu, Shujing
AU - Lu, Yue
N1 - Publisher Copyright: © 2024
PY - 2024/8/15
Y1 - 2024/8/15
AB - Parcel singulation has emerged as a critical bottleneck in rapidly advancing logistics processes. To balance cost-effectiveness and singulation efficiency, an automated parcel singulator using a sparse actuator array is widely regarded as the optimal solution to this challenge. However, its successful operation requires a sophisticated control policy. In this work, we formulate parcel singulation as a Markov Decision Process (MDP) with a variable-dimensional state space. Traditional deep reinforcement learning (DRL) struggles with variable state dimensions and task-specific priority learning, which calls for adaptable state representations and more advanced learning algorithms. We introduce a novel DRL algorithm, Transformer-based Position Attention and State Space Augmentation Soft Actor–Critic (TPASSA-SAC). It incorporates Transformer-based attention mechanisms tailored to prioritize parcels according to their spatial positions. In addition, TPASSA-SAC improves Q-value estimation through State Space Augmentation, which refines decision-making and yields a more robust and accurate learning process. We have also developed a simulation environment grounded in real-world data distributions specific to parcel singulation. Our experiments show that TPASSA-SAC outperforms existing DRL-based models and conventional singulation techniques, achieving the highest parcel pass rates observed (99.62% to 99.96%) and the highest throughput, processing more than 5036 parcels per hour across a variety of scenarios.
KW - Deep reinforcement learning
KW - Non-stationary environment
KW - Parcel singulation
KW - Soft actor–critic
KW - State space augmentation
UR - https://www.scopus.com/pages/publications/85184484906
U2 - 10.1016/j.eswa.2024.123393
DO - 10.1016/j.eswa.2024.123393
M3 - Article
AN - SCOPUS:85184484906
SN - 0957-4174
VL - 248
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 123393
ER -