TY - JOUR
T1 - RIVIE
T2 - Robust Inherent Video Information Embedding
AU - Jia, Jun
AU - Gao, Zhongpai
AU - Zhu, Dandan
AU - Min, Xiongkuo
AU - Hu, Menghan
AU - Zhai, Guangtao
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2023
Y1 - 2023
AB - Imagine an interesting situation: while watching a movie, we can scan the screen with our smartphones to get extra information about the movie, such as the cast, the release date, and the movie's homepage. We envision a world where each video contains invisible information that can be delivered to us through mobile devices with cameras. This paper proposes the first deep learning-based information hiding method for videos to achieve information transmission from screens to cameras. Compared with hiding information in single images, methods for videos need to maintain visual quality in both the spatial and temporal domains. Furthermore, training video models relies on a large video dataset, which requires far more computational resources than training models for images. To reduce the computational complexity, we propose generating simulated sequences from single images on the fly. We then use the simulated data to train a spatio-temporal generator that hides information in videos while maintaining visual quality. During training, a temporal loss function based on the simulated data is used to ensure the temporal consistency of the generated videos. After embedding, we use a decoder to recover the hidden information. To simulate the real-world imaging pipeline from screens to cameras, we insert a distortion network between the generator and the decoder. The distortion network is based on differentiable 3D rendering to cover the distortions that may be introduced during camera imaging. Experimental results show that the hidden information in videos can be extracted by cameras without impacting visual quality. Our work can be applied to many fields, such as advertising, entertainment, and education.
KW - 3D rendering
KW - Data hiding
KW - adversarial training
KW - display-camera communication
KW - temporal consistency
UR - https://www.scopus.com/pages/publications/85142810583
U2 - 10.1109/TMM.2022.3221894
DO - 10.1109/TMM.2022.3221894
M3 - Article
AN - SCOPUS:85142810583
SN - 1520-9210
VL - 25
SP - 7364
EP - 7377
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -