Open Access
Dual-Branch Neural Network-Based In-Loop Filter for VVC Intra Coding Using Spatial-Frequency Feature Fusion
Author(s) - Zhen Feng, Xu Liu, Cheolkon Jung
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3573831
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Along with the reconstructed frame, the quantization parameter (QP) map, the predicted frame, and the partition frame are widely used as auxiliary inputs for neural network-based in-loop filters (NNLF). The QP map provides quantization information for the reconstructed frame, while the predicted frame and the partition frame characterize the compression artifacts introduced by VTM. Since each input carries a different type of information, directly concatenating all of them for the network may cause mutual interference between the input data and degrade model performance. In this paper, we propose a dual-branch NNLF for VVC intra coding using spatial-frequency feature fusion. We design a dual-branch network for NNLF according to input type: one branch takes the reconstructed frame and the QP map, while the other branch takes the reconstructed frame, the predicted frame, and the partition frame. The dual-branch architecture processes the quantization information from the QP map and the compression artifacts represented by the predicted frame and partition frame separately. Thus, it handles each input efficiently according to its unique characteristics while reducing mutual interference. Moreover, we adopt the fast Fourier transform (FFT) to capture global context in a frame instead of a Transformer, which has relatively high complexity. The spatial-frequency feature fusion combines features in the spatial and frequency domains, which enhances feature representation capability and learns both local and long-range feature correlations. Furthermore, we introduce patch size-considered incremental learning based on QP distance, which combines patch size and QP distance for network training. This training strategy encourages the proposed network to extend its receptive field and to learn both local features and the global structure of a frame, thus enhancing model adaptability. Experimental results show that the proposed NNLF achieves average BD-rate savings of {8.55% (Y), 20.48% (U), 21.44% (V)} over VTM-11.0_NNVC-3.0 in the All Intra (AI) configuration.
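
As a rough illustration of the ideas summarized in the abstract, the PyTorch sketch below shows one plausible way to realize the dual-branch input split and an FFT-based spatial-frequency fusion block. It is a minimal sketch inferred from the abstract only, not the authors' implementation; the module names, channel widths, and the fusion rule (1x1 convolutions on the real and imaginary FFT components followed by a residual merge) are assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of a dual-branch NNLF with
# spatial-frequency feature fusion.
import torch
import torch.nn as nn


class SpatialFrequencyFusion(nn.Module):
    """Fuses a spatial path (local features) with a frequency path
    (global context via 2D FFT), as hinted at in the abstract."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Spatial path: a 3x3 convolution captures local correlations.
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Frequency path: 1x1 convolutions on the real/imaginary FFT parts
        # approximate global (long-range) interactions.
        self.freq = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)
        self.merge = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.spatial(x)
        f = torch.fft.rfft2(x, norm="ortho")          # complex spectrum
        f = self.freq(torch.cat([f.real, f.imag], dim=1))
        real, imag = torch.chunk(f, 2, dim=1)
        f = torch.fft.irfft2(torch.complex(real, imag),
                             s=x.shape[-2:], norm="ortho")
        return self.merge(torch.cat([s, f], dim=1)) + x  # residual connection


class DualBranchNNLF(nn.Module):
    """Toy dual-branch filter: branch A takes (reconstruction, QP map),
    branch B takes (reconstruction, prediction, partition map)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branch_a = nn.Sequential(nn.Conv2d(2, channels, 3, padding=1),
                                      SpatialFrequencyFusion(channels))
        self.branch_b = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1),
                                      SpatialFrequencyFusion(channels))
        self.tail = nn.Conv2d(channels * 2, 1, kernel_size=3, padding=1)

    def forward(self, rec, qp_map, pred, part):
        a = self.branch_a(torch.cat([rec, qp_map], dim=1))
        b = self.branch_b(torch.cat([rec, pred, part], dim=1))
        # Residual filtering: predict a correction to the reconstruction.
        return rec + self.tail(torch.cat([a, b], dim=1))


if __name__ == "__main__":
    rec = torch.rand(1, 1, 64, 64)
    qp = torch.full_like(rec, 32.0 / 63.0)   # normalized QP map (assumption)
    pred, part = torch.rand_like(rec), torch.rand_like(rec)
    print(DualBranchNNLF()(rec, qp, pred, part).shape)  # torch.Size([1, 1, 64, 64])
```

Keeping the QP-map branch separate from the prediction/partition branch mirrors the abstract's motivation: each branch sees only one type of side information, so the quantization cue and the artifact cues are not mixed before fusion.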
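
The "patch size-considered incremental learning based on QP distance" is only named in the abstract, so the following schedule builder is a hypothetical interpretation: it assumes training proceeds in stages that simultaneously enlarge the training patch size and widen the admitted QP range around a base QP. The stage ordering, patch sizes, QP set, and the distance rule are all assumptions for illustration.

```python
# Hypothetical incremental-learning schedule combining patch size and QP distance.
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    patch_size: int       # training crop size in pixels (assumed values)
    qp_list: List[int]    # QPs whose samples are included in this stage


def build_schedule(base_qp: int = 32,
                   all_qps=(22, 27, 32, 37, 42),
                   patch_sizes=(64, 128, 256)) -> List[Stage]:
    """Each stage grows the receptive field (larger patches) and widens the
    QP range around base_qp (larger QP distance)."""
    qps_by_distance = sorted(all_qps, key=lambda q: abs(q - base_qp))
    stages = []
    for i, ps in enumerate(patch_sizes):
        # Stage i admits the 1 + 2*i QPs closest to the base QP (assumed rule).
        n = min(len(qps_by_distance), 1 + 2 * i)
        stages.append(Stage(patch_size=ps, qp_list=sorted(qps_by_distance[:n])))
    return stages


for st in build_schedule():
    print(st)
# Stage(patch_size=64, qp_list=[32])
# Stage(patch_size=128, qp_list=[27, 32, 37])
# Stage(patch_size=256, qp_list=[22, 27, 32, 37, 42])
```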
