Flow Guidance Deformable Compensation Network for Video Frame Interpolation

Research output: Contribution to journal / Article / peer-review

4 Scopus citations

Abstract

Flow-based and deformable convolution (DConv)-based methods are the two mainstream approaches to video frame interpolation (VFI), and both have made remarkable progress with the development of deep convolutional networks in recent years. However, flow-based VFI methods often suffer from inaccurate flow estimation, especially when dealing with complex and irregular real-world motion. DConv-based VFI methods handle complex motion better, but their increased degrees of freedom make DConv models difficult to train. To address these problems, this article proposes a flow guidance deformable compensation network (FGDCN) for the VFI task. FGDCN decomposes the frame sampling process into two steps: a flow step and a deformation step. Specifically, the flow step uses a coarse-to-fine flow estimation network to directly estimate the intermediate flows and, at the same time, synthesize an anchor frame. To ensure the accuracy of the estimated flows, a distillation loss and a task-oriented loss are jointly employed in this step. Guided by the flow priors learned in the first step, the deformation step uses a new pyramid deformable compensation network to compensate for details missed by the flow step. In addition, a pyramid loss is proposed to supervise the model in both the image and frequency domains. Experimental results show that the proposed algorithm achieves excellent performance on various datasets while using fewer parameters.
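The flow step and the pyramid loss described in the abstract can be illustrated with a minimal NumPy sketch. It assumes single-channel frames, a RIFE-style fusion in which the anchor frame is a mask-weighted blend of the two input frames backward-warped by the intermediate flows, and a pyramid loss made of an image-domain L1 term plus an L1 term on FFT amplitudes, computed over 2x average-pooled levels. The function names (`backward_warp`, `anchor_frame`, `pyramid_loss`), the mask fusion, and the `freq_weight` parameter are hypothetical illustrations, not FGDCN's published formulation. The flow estimator, the distillation and task-oriented losses, and the deformable compensation network are not modeled.

```python
import numpy as np


def backward_warp(img, flow):
    """Bilinearly sample img (H, W) at pixel positions shifted by flow (H, W, 2).

    flow[..., 0] is the horizontal (x) offset and flow[..., 1] the vertical (y)
    offset, in pixels. Sample positions outside the image are clamped to the border.
    """
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0  # fractional horizontal weight
    wy = y - y0  # fractional vertical weight
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom


def anchor_frame(frame0, frame1, flow_t0, flow_t1, mask):
    """Hypothetical anchor synthesis: blend the two warped inputs with a mask in [0, 1]."""
    warped0 = backward_warp(frame0, flow_t0)
    warped1 = backward_warp(frame1, flow_t1)
    return mask * warped0 + (1 - mask) * warped1


def downsample(img):
    """2x average pooling; assumes even height and width."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def pyramid_loss(pred, target, levels=3, freq_weight=0.1):
    """Assumed pyramid loss: image-domain L1 plus weighted L1 on FFT amplitudes.

    Summed over `levels` resolutions; H and W must be divisible by 2**levels.
    """
    total = 0.0
    for _ in range(levels):
        image_term = np.abs(pred - target).mean()
        amp_pred = np.abs(np.fft.fft2(pred))
        amp_target = np.abs(np.fft.fft2(target))
        freq_term = np.abs(amp_pred - amp_target).mean()
        total += image_term + freq_weight * freq_term
        pred, target = downsample(pred), downsample(target)
    return total
```

In a trained model, `flow_t0`, `flow_t1`, and `mask` would be predicted by the flow estimation network, and the deformation step would then refine the anchor frame. Here they are supplied by hand: for example, zero flow with `mask = 0.5` simply averages the two input frames.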

Original language: English
Pages (from-to): 1801-1812
Number of pages: 12
Journal: IEEE Transactions on Multimedia
Volume: 26
State: Published - 2024

Keywords

  • Video frame interpolation
  • deformable convolution
  • distillation learning
  • motion compensation
  • motion estimation
