TY - JOUR
T1 - Deep Learning-based Moving Object Segmentation
T2 - Recent Progress and Research Prospects
AU - Jiang, Rui
AU - Zhu, Ruixiang
AU - Su, Hu
AU - Li, Yinlin
AU - Xie, Yuan
AU - Zou, Wei
N1 - Publisher Copyright:
© 2023, Institute of Automation, Chinese Academy of Sciences and Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2023/6
Y1 - 2023/6
N2 - Moving object segmentation (MOS), aiming at segmenting moving objects from video frames, is an important and challenging task in computer vision and with various applications. With the development of deep learning (DL), MOS has also entered the era of deep models toward spatiotemporal feature learning. This paper aims to provide the latest review of recent DL-based MOS methods proposed during the past three years. Specifically, we present a more up-to-date categorization based on model characteristics, then compare and discuss each category from feature learning (FL), and model training and evaluation perspectives. For FL, the methods reviewed are divided into three types: spatial FL, temporal FL, and spatiotemporal FL, then analyzed from input and model architectures aspects, three input types, and four typical preprocessing subnetworks are summarized. In terms of training, we discuss ideas for enhancing model transferability. In terms of evaluation, based on a previous categorization of scene dependent evaluation and scene independent evaluation, and combined with whether used videos are recorded with static or moving cameras, we further provide four subdivided evaluation setups and analyze that of reviewed methods. We also show performance comparisons of some reviewed MOS methods and analyze the advantages and disadvantages of reviewed MOS methods in terms of technology. Finally, based on the above comparisons and discussions, we present research prospects and future directions.
AB - Moving object segmentation (MOS), aiming at segmenting moving objects from video frames, is an important and challenging task in computer vision and with various applications. With the development of deep learning (DL), MOS has also entered the era of deep models toward spatiotemporal feature learning. This paper aims to provide the latest review of recent DL-based MOS methods proposed during the past three years. Specifically, we present a more up-to-date categorization based on model characteristics, then compare and discuss each category from feature learning (FL), and model training and evaluation perspectives. For FL, the methods reviewed are divided into three types: spatial FL, temporal FL, and spatiotemporal FL, then analyzed from input and model architectures aspects, three input types, and four typical preprocessing subnetworks are summarized. In terms of training, we discuss ideas for enhancing model transferability. In terms of evaluation, based on a previous categorization of scene dependent evaluation and scene independent evaluation, and combined with whether used videos are recorded with static or moving cameras, we further provide four subdivided evaluation setups and analyze that of reviewed methods. We also show performance comparisons of some reviewed MOS methods and analyze the advantages and disadvantages of reviewed MOS methods in terms of technology. Finally, based on the above comparisons and discussions, we present research prospects and future directions.
KW - Moving object segmentation (MOS)
KW - background subtraction
KW - change detection
KW - deep learning (DL)
KW - video understanding
UR - https://www.scopus.com/pages/publications/85153034634
U2 - 10.1007/s11633-022-1378-4
DO - 10.1007/s11633-022-1378-4
M3 - 文献综述
AN - SCOPUS:85153034634
SN - 2731-538X
VL - 20
SP - 335
EP - 369
JO - Machine Intelligence Research
JF - Machine Intelligence Research
IS - 3
ER -