TY - JOUR
T1 - FFFN
T2 - Frame-By-Frame Feedback Fusion Network for Video Super-Resolution
AU - Zhu, Jian
AU - Zhang, Qingwu
AU - Fei, Lunke
AU - Cai, Ruichu
AU - Xie, Yuan
AU - Sheng, Bin
AU - Yang, Xiaokang
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2023
Y1 - 2023
N2 - Video super-resolution (VSR) is a fundamental and challenging task in computer vision. Many existing VSR works focus on effectively aligning neighboring frames to better incorporate temporal information, while little work is devoted to the important subsequent step of inter-frame information fusion; existing frame-fusion methods fail to make full use of spatio-temporal information. In this work, we propose a Frame-by-frame Feedback Fusion Network (FFFN) for VSR tasks. By applying the feedback learning mechanism common in the human cognitive system to the frame fusion stage, FFFN refines low-level representations of the fused frames with high-level information in a coarse-to-fine manner. Specifically, after the neighboring frames are aligned, we first rearrange them from near to far according to their temporal distance from the reference frame, and then feed them one by one into a proposed recurrent structure called the Feedback Fusion Module (FFM), which iteratively generates high-level representations of the fused frames with several Feature Refinement Groups (FRGs) and feedback connections. Finally, we design a Dual-path Residual Reconstruction Module (DRRM) to reconstruct the final high-resolution image. The proposed FFFN has strong frame fusion and reconstruction ability, and extensive experiments on several benchmark datasets show that it achieves favorable performance against state-of-the-art methods.
AB - Video super-resolution (VSR) is a fundamental and challenging task in computer vision. Many existing VSR works focus on effectively aligning neighboring frames to better incorporate temporal information, while little work is devoted to the important subsequent step of inter-frame information fusion; existing frame-fusion methods fail to make full use of spatio-temporal information. In this work, we propose a Frame-by-frame Feedback Fusion Network (FFFN) for VSR tasks. By applying the feedback learning mechanism common in the human cognitive system to the frame fusion stage, FFFN refines low-level representations of the fused frames with high-level information in a coarse-to-fine manner. Specifically, after the neighboring frames are aligned, we first rearrange them from near to far according to their temporal distance from the reference frame, and then feed them one by one into a proposed recurrent structure called the Feedback Fusion Module (FFM), which iteratively generates high-level representations of the fused frames with several Feature Refinement Groups (FRGs) and feedback connections. Finally, we design a Dual-path Residual Reconstruction Module (DRRM) to reconstruct the final high-resolution image. The proposed FFFN has strong frame fusion and reconstruction ability, and extensive experiments on several benchmark datasets show that it achieves favorable performance against state-of-the-art methods.
KW - Dual-path reconstruction
KW - Feedback mechanism
KW - Frame fusion
KW - Video super-resolution
UR - https://www.scopus.com/pages/publications/85140791116
U2 - 10.1109/TMM.2022.3214776
DO - 10.1109/TMM.2022.3214776
M3 - Article
AN - SCOPUS:85140791116
SN - 1520-9210
VL - 25
SP - 6821
EP - 6835
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -