
Deep Residual Transform for Multi-scale Image Decomposition
Author(s) -
Yuhao Chen,
Alexander Wong,
Yuan Fang,
Yifan Wu,
Linlin Xu
Publication year - 2021
Publication title -
Journal of Computational Vision and Imaging Systems
Language(s) - English
Resource type - Journals
ISSN - 2562-0444
DOI - 10.15353/jcvis.v6i1.3537
Subject(s) - computer science, artificial intelligence, computer vision, pattern recognition, residual, granularity, representation, hierarchy, image, decomposition, algorithm
Abstract - Multi-scale image decomposition (MID) is a fundamental task in computer vision and image processing that involves transforming an image into a hierarchical representation comprising different levels of visual granularity, from coarse structures to fine details. A well-engineered MID disentangles the image signal into meaningful components that can be used in a variety of applications such as image denoising, image compression, and object classification. Traditional MID approaches such as wavelet transforms tackle the problem through carefully designed basis functions under rigid decomposition structure assumptions. However, because the information distribution varies from one type of image content to another, rigid decomposition assumptions lead to inefficient representations, i.e., some scales may contain little to no information. To address this issue, we present the Deep Residual Transform (DRT), a data-driven MID strategy in which the input signal is transformed into a hierarchy of non-linear representations at different scales, with each representation independently learned as the representational residual of the previous scales at a user-controlled detail level. As such, the proposed DRT progressively disentangles scale information from the original signal by sequentially learning residual representations. The decomposition flexibility of this approach allows for representations highly tailored to specific types of image content, resulting in greater representational efficiency and compactness. In this study, we realize the proposed transform by leveraging a hierarchy of sequentially trained autoencoders. To explore the efficacy of the proposed DRT, we use two datasets comprising very different types of image content: 1) CelebFaces and 2) Cityscapes. Experimental results show that the proposed DRT achieves highly efficient information decomposition on both datasets despite their very different visual granularity characteristics.
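The sequential residual scheme described in the abstract — train one autoencoder per scale, subtract its reconstruction, and hand the leftover signal to the next stage — can be sketched as below. This is a minimal illustration only: it substitutes a tiny tied-weight *linear* autoencoder trained by gradient descent for the paper's deep non-linear autoencoders, and all names (`train_linear_autoencoder`, `deep_residual_transform`, the code sizes) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear_autoencoder(X, k, lr=0.02, steps=2000):
    """Fit a tied-weight linear autoencoder X ~= (X W) W^T with code size k
    by plain gradient descent (a simplified stand-in for a deep autoencoder)."""
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, k))
    for _ in range(steps):
        R = X @ W @ W.T - X                        # reconstruction error
        grad = 2.0 * (X.T @ R @ W + R.T @ X @ W) / n
        W -= lr * grad
    return W

def deep_residual_transform(X, code_sizes):
    """Sequentially learn one representation per scale: each autoencoder is
    trained only on the residual left unexplained by all previous scales."""
    residual = X.copy()
    stages = []
    for k in code_sizes:
        W = train_linear_autoencoder(residual, k)
        stages.append(W)
        residual = residual - residual @ W @ W.T   # disentangle this scale
    return stages, residual

# Toy data standing in for image patches; each stage should shrink
# the unexplained energy of the signal.
X = rng.normal(size=(400, 16))
stages, final_residual = deep_residual_transform(X, [4, 4])
```

In this linear sketch each stage removes roughly the top-k principal directions of the current residual, so the residual energy shrinks stage by stage; in the paper, non-linear autoencoders play that role, with the detail level of each scale under user control.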