Lifting Deep Image Denoisers to Video with Frame Interpolation Pre-training

Author(s) - Piotr Kopa Ostrowski, Daniel Wesierski, Anna Jezierska, Tomasz Stefanski

Open Access

Publication year - 2025

Publication title - IEEE Transactions on Circuits and Systems for Video Technology

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.873

H-Index - 168

eISSN - 1558-2205

pISSN - 1051-8215

DOI - 10.1109/tcsvt.2025.3575717

Subject(s) - components, circuits, devices and systems; communication, networking and broadcast technologies; computing and processing; signal processing and analysis

We introduce Frame Interpolation Pre-training (FIP), a simple learning technique for lifting deep image denoisers to video denoising with improved implicit temporal alignment. Modern video denoising networks typically rely on explicit motion estimation and alignment, which are computationally intensive and harder to re-design and re-train, restricting their application scope and usability. Conversely, stacking frames and image denoisers, without incorporating explicit motion estimation modules, improves speed and benefits from a simpler design, thereby facilitating their generalizability to the video domain. However, this approach leads to lower accuracy due to suboptimal capture of temporal dependencies. To better leverage the adjacent frames in this setting and reduce the accuracy gap, we propose a novel training regime that divides the standard supervised training of the denoising task into two phases. In the initial phase, FIP guides the network to interpolate a fully masked central frame using only adjacent noisy input frames. In the subsequent phase, the pre-trained network is fine-tuned on denoising the central frame, now using all noisy input frames. Extensive diagnostics indicate that FIP-based networks provide better implicit motion estimation and temporal alignment. In effect, qualitative and quantitative evaluations on standard video denoising datasets with synthetic and real noise demonstrate that FIP consistently improves the video denoising accuracy of motion-aware, video-lifted image denoisers without additional computational overhead at training or test time. Our code is available at https://github.com/camalab-ai/FIP.
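
The two training phases described in the abstract differ only in what the network sees at its input. The following is a minimal NumPy sketch of that input construction, not the authors' implementation (see the repository linked above for that). The function name `build_fip_input`, the `(T, C, H, W)` layout, the channel-wise stacking, and the use of zero filling as the masking value are all assumptions made for illustration. The sketch also assumes that the training target in both phases is the clean central frame.

```python
import numpy as np


def build_fip_input(noisy_frames: np.ndarray, phase: str) -> np.ndarray:
    """Stack T noisy frames along the channel axis for a video-lifted image denoiser.

    noisy_frames: array of shape (T, C, H, W) with odd T (hypothetical layout).
    phase: "interpolate" masks the central frame (phase 1);
           "denoise" keeps all noisy frames (phase 2).
    """
    if noisy_frames.ndim != 4 or noisy_frames.shape[0] % 2 == 0:
        raise ValueError("expected shape (T, C, H, W) with odd T")
    if phase not in ("interpolate", "denoise"):
        raise ValueError("phase must be 'interpolate' or 'denoise'")

    frames = noisy_frames.copy()  # leave the caller's array untouched
    center = frames.shape[0] // 2
    if phase == "interpolate":
        # Assumption: "fully masked" is modelled here as zero filling.
        frames[center] = 0.0

    t, c, h, w = frames.shape
    # Frame i occupies channels [i * c, (i + 1) * c) of the stacked input.
    return frames.reshape(t * c, h, w)


# Usage: five RGB frames with synthetic Gaussian noise.
rng = np.random.default_rng(0)
clean = rng.random((5, 3, 8, 8))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

x_phase1 = build_fip_input(noisy, "interpolate")  # central frame hidden
x_phase2 = build_fip_input(noisy, "denoise")      # all noisy frames visible
target = clean[2]                                 # assumed target in both phases
```

Because both phases produce inputs of the same shape, the same network can be pre-trained on `x_phase1` and then fine-tuned on `x_phase2` without any architectural change, which is consistent with the abstract's claim of no added computational overhead.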
