TY - GEN
T1 - Three-Stage Temporal Deformable Network for Blurry Video Frame Interpolation
AU - Lei, Pengcheng
AU - Yan, Zaoming
AU - Wang, Tingting
AU - Fang, Faming
AU - Zhang, Guixu
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Blurry video frame interpolation (BVFI), which aims to generate high-frame-rate clear videos from low-frame-rate blurry videos, is a challenging but important topic in the computer vision community. Blurry videos not only provide spatial and temporal information like clear videos, but also contain additional motion information hidden in each blurry frame. However, existing BVFI methods usually fail to fully leverage all of this valuable information, which ultimately hinders their performance. In this paper, we propose a simple three-stage temporal deformable network to fully exploit the useful information in blurry videos. The frame interpolation stage employs a deformable network to directly sample useful information from the blurry inputs and synthesize an intermediate frame at an arbitrary time interval. The temporal feature fusion stage explores the long-term temporal information for each target frame through a bi-directional recurrent deformable alignment network. Finally, the deblurring stage applies a transformer-empowered Taylor approximation network to recursively recover the high-frequency details. Quantitative and qualitative results indicate that our model outperforms existing SOTA methods.
AB - Blurry video frame interpolation (BVFI), which aims to generate high-frame-rate clear videos from low-frame-rate blurry videos, is a challenging but important topic in the computer vision community. Blurry videos not only provide spatial and temporal information like clear videos, but also contain additional motion information hidden in each blurry frame. However, existing BVFI methods usually fail to fully leverage all of this valuable information, which ultimately hinders their performance. In this paper, we propose a simple three-stage temporal deformable network to fully exploit the useful information in blurry videos. The frame interpolation stage employs a deformable network to directly sample useful information from the blurry inputs and synthesize an intermediate frame at an arbitrary time interval. The temporal feature fusion stage explores the long-term temporal information for each target frame through a bi-directional recurrent deformable alignment network. Finally, the deblurring stage applies a transformer-empowered Taylor approximation network to recursively recover the high-frequency details. Quantitative and qualitative results indicate that our model outperforms existing SOTA methods.
KW - Video frame interpolation
KW - deformable convolution
KW - video deblurring
KW - vision transformer
UR - https://www.scopus.com/pages/publications/85206582399
U2 - 10.1109/ICME57554.2024.10687742
DO - 10.1109/ICME57554.2024.10687742
M3 - Conference contribution
AN - SCOPUS:85206582399
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2024 IEEE International Conference on Multimedia and Expo, ICME 2024
PB - IEEE Computer Society
T2 - 2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Y2 - 15 July 2024 through 19 July 2024
ER -