Open Access
Multi-scale Contextual Coding for Human-Machine Vision of Volumetric Medical Images
Author(s) - Jietao Chen, Weijie Chen, Qianjian Xing, Feng Yu
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Journal
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3597008
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Abstract - In recent years, the continuous advancement of digital technologies such as telemedicine and medical cloud computing has promoted collaborative research and diagnosis across multiple medical centers. However, the timely remote transmission and analysis of large volumetric medical images remain a significant challenge. While classical methods, which predominantly employ lossless compression, are increasingly constrained by the limits of achievable compression ratios, lossy 3D medical image compression is emerging as a promising alternative. Unlike existing 3D convolutional compression algorithms designed solely for human vision, this paper proposes a Multi-scale Contextual Autoencoder (MCAE) architecture that recurrently incorporates anatomical inter-slice context to optimize the compression of the current slice for both human and machine vision. The decoded intermediate features preserve sufficient semantic information to enable high-quality visualization and to allow downstream machine vision tasks (e.g., segmentation and classification) to be performed directly, without pixel-level recovery. To reduce the compression bit cost, we create a Multi-Dimensional Entropy Model that integrates inter-slice latent context with spatial-channel context and hierarchical hypercontext. Experimental results demonstrate that our framework achieves an average 9% BD-Rate reduction over the Versatile Video Coding (VVC) anchor on the MRNet datasets, while delivering better recognition performance on downstream segmentation and classification tasks than pipelines that operate on reconstructed lossy images.
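
As a rough illustration of the slice-recurrent idea described in the abstract, the sketch below conditions each slice's analysis transform on context features carried over from the previously decoded slice. All module names, channel sizes, and the simple tanh context update are hypothetical placeholders under assumed shapes, not the paper's actual MCAE design, which is not specified here.

import torch
import torch.nn as nn

class RecurrentSliceCodec(nn.Module):
    """Minimal sketch (hypothetical layers, not the paper's MCAE): each
    slice is encoded with context features recurrently carried over from
    the previously decoded slice."""
    def __init__(self, ctx_ch: int = 32, latent_ch: int = 64):
        super().__init__()
        self.analysis = nn.Sequential(                 # slice + context -> latent
            nn.Conv2d(1 + ctx_ch, 64, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(64, latent_ch, 5, stride=2, padding=2),
        )
        self.synthesis = nn.Sequential(                # latent -> reconstructed slice
            nn.ConvTranspose2d(latent_ch, 64, 5, stride=2, padding=2, output_padding=1),
            nn.GELU(),
            nn.ConvTranspose2d(64, 1, 5, stride=2, padding=2, output_padding=1),
        )
        self.ctx_update = nn.Conv2d(1, ctx_ch, 3, padding=1)  # context from decoded slice

    def forward(self, volume: torch.Tensor):
        # volume: (B, 1, D, H, W) with H and W divisible by 4
        B, _, D, H, W = volume.shape
        ctx = volume.new_zeros(B, self.ctx_update.out_channels, H, W)
        latents, recons = [], []
        for d in range(D):                             # slice-by-slice recurrence
            s = volume[:, :, d]                        # current slice, (B, 1, H, W)
            y = self.analysis(torch.cat([s, ctx], dim=1))
            x_hat = self.synthesis(y)
            ctx = torch.tanh(self.ctx_update(x_hat))   # carry inter-slice context forward
            latents.append(y)
            recons.append(x_hat)
        return torch.stack(latents, dim=2), torch.stack(recons, dim=2)

# Usage: codec = RecurrentSliceCodec(); y, x_hat = codec(torch.randn(1, 1, 16, 64, 64))

In a full learned codec, the latent y would additionally be quantized and entropy-coded; the abstract's Multi-Dimensional Entropy Model conditions that step on inter-slice latent context, spatial-channel context, and hierarchical hypercontext, none of which is detailed on this page.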
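The reported gain is measured in BD-Rate, the Bjøntegaard delta rate commonly used to compare codecs. A minimal NumPy sketch of the standard computation (cubic fit of log-rate against PSNR, averaged over the overlapping quality range) follows; the conventional input is four rate-distortion points per codec.

import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta rate: average percent bitrate change of the test
    codec relative to the anchor at equal quality. Negative values mean
    the test codec spends fewer bits for the same PSNR."""
    lr_a = np.log(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log(np.asarray(rate_test, dtype=float))
    # Cubic fit of log-rate as a function of quality, one fit per codec.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Average each fit over the overlapping quality interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0

A return value of about -9.0 corresponds to the 9% average bitrate saving over the VVC anchor cited in the abstract.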
