TY - GEN
T1 - Delving into the Local
T2 - 36th AAAI Conference on Artificial Intelligence, AAAI 2022
AU - Gu, Zhihao
AU - Chen, Yang
AU - Yao, Taiping
AU - Ding, Shouhong
AU - Li, Jilin
AU - Ma, Lizhuang
N1 - Publisher Copyright:
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2022/6/30
Y1 - 2022/6/30
N2 - The rapid development of facial manipulation techniques has aroused public concerns in recent years. Existing deepfake video detection approaches attempt to capture the discriminative features between real and fake faces based on temporal modelling. However, these works impose supervision on sparsely sampled video frames but overlook the local motions among adjacent frames, which instead encode rich inconsistency information that can serve as an efficient indicator for DeepFake video detection. To mitigate this issue, we delve into the local motion and propose a novel sampling unit named snippet, which contains a few successive video frames, for local temporal inconsistency learning. Moreover, we elaborately design an Intra-Snippet Inconsistency Module (Intra-SIM) and an Inter-Snippet Interaction Module (Inter-SIM) to establish a dynamic inconsistency modelling framework. Specifically, the Intra-SIM applies bi-directional temporal difference operations and a learnable convolution kernel to mine the short-term motions within each snippet. The Inter-SIM is then devised to promote cross-snippet information interaction to form global representations. The Intra-SIM and Inter-SIM work in an alternate manner and can be plugged into existing 2D CNNs. Our method outperforms state-of-the-art competitors on four popular benchmark datasets, i.e., FaceForensics++, Celeb-DF, DFDC and WildDeepfake. Besides, extensive experiments and visualizations are also presented to further illustrate its effectiveness.
AB - The rapid development of facial manipulation techniques has aroused public concerns in recent years. Existing deepfake video detection approaches attempt to capture the discriminative features between real and fake faces based on temporal modelling. However, these works impose supervision on sparsely sampled video frames but overlook the local motions among adjacent frames, which instead encode rich inconsistency information that can serve as an efficient indicator for DeepFake video detection. To mitigate this issue, we delve into the local motion and propose a novel sampling unit named snippet, which contains a few successive video frames, for local temporal inconsistency learning. Moreover, we elaborately design an Intra-Snippet Inconsistency Module (Intra-SIM) and an Inter-Snippet Interaction Module (Inter-SIM) to establish a dynamic inconsistency modelling framework. Specifically, the Intra-SIM applies bi-directional temporal difference operations and a learnable convolution kernel to mine the short-term motions within each snippet. The Inter-SIM is then devised to promote cross-snippet information interaction to form global representations. The Intra-SIM and Inter-SIM work in an alternate manner and can be plugged into existing 2D CNNs. Our method outperforms state-of-the-art competitors on four popular benchmark datasets, i.e., FaceForensics++, Celeb-DF, DFDC and WildDeepfake. Besides, extensive experiments and visualizations are also presented to further illustrate its effectiveness.
UR - https://www.scopus.com/pages/publications/85128015537
U2 - 10.1609/aaai.v36i1.19955
DO - 10.1609/aaai.v36i1.19955
M3 - Conference contribution
AN - SCOPUS:85128015537
T3 - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
SP - 1069
EP - 1077
BT - AAAI-22 Technical Tracks 1
PB - Association for the Advancement of Artificial Intelligence
Y2 - 22 February 2022 through 1 March 2022
ER -