TY - JOUR
T1 - Active learning with effective scoring functions for semi-supervised temporal action localization
AU - Li, Ding
AU - Yang, Xuebing
AU - Tang, Yongqiang
AU - Zhang, Chenyang
AU - Zhang, Wensheng
AU - Ma, Lizhuang
N1 - Publisher Copyright: © 2023 Elsevier B.V.
PY - 2023/7
Y1 - 2023/7
N2 - Temporal Action Localization (TAL) aims to predict both the action category and the temporal boundaries (i.e., start and end times) of action instances in untrimmed videos. Existing works usually adopt fully supervised solutions; however, a practical bottleneck of these solutions is the large amount of labeled training data they require. To reduce the expensive cost of human labeling, this paper focuses on a rarely investigated yet practical task, semi-supervised TAL, and proposes an effective active learning method named AL-STAL. AL-STAL uses four steps, named Train, Query, Annotate, and Append, to actively select highly informative video samples and train the localization model. AL-STAL is equipped with two scoring functions that consider the uncertainty of the localization model, which facilitate ranking and selecting video samples. The first, Temporal Proposal Entropy (TPE), takes the entropy of the predicted label distribution as the measure of uncertainty. The second, Temporal Context Inconsistency (TCI), introduces a new metric based on the mutual information between adjacent action proposals. To validate the effectiveness of the proposed method, we conduct extensive experiments on three benchmark datasets: THUMOS’14, ActivityNet 1.3, and ActivityNet 1.2. Experimental results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully supervised learning.
AB - Temporal Action Localization (TAL) aims to predict both the action category and the temporal boundaries (i.e., start and end times) of action instances in untrimmed videos. Existing works usually adopt fully supervised solutions; however, a practical bottleneck of these solutions is the large amount of labeled training data they require. To reduce the expensive cost of human labeling, this paper focuses on a rarely investigated yet practical task, semi-supervised TAL, and proposes an effective active learning method named AL-STAL. AL-STAL uses four steps, named Train, Query, Annotate, and Append, to actively select highly informative video samples and train the localization model. AL-STAL is equipped with two scoring functions that consider the uncertainty of the localization model, which facilitate ranking and selecting video samples. The first, Temporal Proposal Entropy (TPE), takes the entropy of the predicted label distribution as the measure of uncertainty. The second, Temporal Context Inconsistency (TCI), introduces a new metric based on the mutual information between adjacent action proposals. To validate the effectiveness of the proposed method, we conduct extensive experiments on three benchmark datasets: THUMOS’14, ActivityNet 1.3, and ActivityNet 1.2. Experimental results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully supervised learning.
KW - Active learning
KW - Scoring function
KW - Temporal action localization
UR - https://www.scopus.com/pages/publications/85151787079
U2 - 10.1016/j.displa.2023.102434
DO - 10.1016/j.displa.2023.102434
M3 - Article
AN - SCOPUS:85151787079
SN - 0141-9382
VL - 78
JO - Displays
JF - Displays
M1 - 102434
ER -