TY - GEN
T1 - Ego-Deliver
T2 - 29th ACM International Conference on Multimedia, MM 2021
AU - Qiu, Haonan
AU - He, Pan
AU - Liu, Shuchun
AU - Shao, Weiyuan
AU - Zhang, Feiyun
AU - Wang, Jiajun
AU - He, Liang
AU - Wang, Feng
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/10/17
Y1 - 2021/10/17
N2 - Egocentric video provides a unique view of event participants, revealing their attention, vision, and interaction with objects. In this paper, we introduce Ego-Deliver, a new large-scale egocentric video benchmark recorded by takeaway riders during their daily work. To the best of our knowledge, Ego-Deliver presents the first attempt at understanding activities in the takeaway delivery process, while also being one of the largest egocentric video action datasets to date. Our dataset provides a total of 5,360 videos with more than 139,000 multi-track annotations and 45 different attributes, which we believe is pivotal to future research in this area. We introduce FS-Net, a new anchor-free action detection architecture that handles extreme variations in action duration. We partition videos into fragments and build dynamic graphs over them, aggregating multi-fragment context information to boost fragment classification. A splicing and scoring module then produces the final action proposals. Our experimental evaluation confirms that the proposed framework outperforms existing approaches on the Ego-Deliver benchmark and is competitive on other popular benchmarks. In its current version, Ego-Deliver is used for a comprehensive comparison of activity detection algorithms. We also show its application to action recognition with promising results. The dataset, toolkits, and baseline results will be made available at: https://egodeliver.github.io/EgoDeliver_Dataset/
AB - Egocentric video provides a unique view of event participants, revealing their attention, vision, and interaction with objects. In this paper, we introduce Ego-Deliver, a new large-scale egocentric video benchmark recorded by takeaway riders during their daily work. To the best of our knowledge, Ego-Deliver presents the first attempt at understanding activities in the takeaway delivery process, while also being one of the largest egocentric video action datasets to date. Our dataset provides a total of 5,360 videos with more than 139,000 multi-track annotations and 45 different attributes, which we believe is pivotal to future research in this area. We introduce FS-Net, a new anchor-free action detection architecture that handles extreme variations in action duration. We partition videos into fragments and build dynamic graphs over them, aggregating multi-fragment context information to boost fragment classification. A splicing and scoring module then produces the final action proposals. Our experimental evaluation confirms that the proposed framework outperforms existing approaches on the Ego-Deliver benchmark and is competitive on other popular benchmarks. In its current version, Ego-Deliver is used for a comprehensive comparison of activity detection algorithms. We also show its application to action recognition with promising results. The dataset, toolkits, and baseline results will be made available at: https://egodeliver.github.io/EgoDeliver_Dataset/
KW - egocentric vision
KW - food delivery
KW - video action localization
UR - https://www.scopus.com/pages/publications/85119333004
U2 - 10.1145/3474085.3475336
DO - 10.1145/3474085.3475336
M3 - Conference contribution
AN - SCOPUS:85119333004
T3 - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
SP - 1847
EP - 1855
BT - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 20 October 2021 through 24 October 2021
ER -