Open Access
Medical Report Generation with Knowledge Distillation and Multi-Stage Hierarchical Attention in Vision Transformer encoder and GPT-2 decoder
Author(s) - Hilya Tsaniya, Chastine Fatichah, Nanik Suciati, Takashi Obi, Joong-sun Lee
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3588344
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Automated medical report generation is a challenging task that involves synthesizing diagnostic findings and clinical observations from medical images. In this study, we propose a novel framework that integrates knowledge distillation and multi-stage hierarchical attention mechanisms to enhance the generation of comprehensive and accurate medical reports. Our approach uses a Vision Transformer (ViT) as the image encoder to capture complex visual features. The ViT is trained with knowledge distillation, which transfers knowledge from an ensemble of Convolutional Neural Networks (CNNs), including VGG16, InceptionV3, and DenseNet121, to ensure rich and diverse feature extraction. GPT-2 is used as the decoder to generate coherent and contextually relevant narratives. The multi-stage hierarchical attention mechanism further refines this process by progressively focusing on key image regions and aligning them with the generated textual content. On the MIMIC-CXR dataset, our model achieved a BLEU score of 0.127 and a precision of 0.8832 for abnormalities, demonstrating notable improvements over previous methods. Further analysis reveals that our approach enhances the generation of detailed and accurate medical reports, as validated by both quantitative metrics and qualitative assessments, reinforcing its effectiveness in capturing critical clinical information.
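The abstract does not say how knowledge is transferred from the CNN ensemble to the ViT. As a minimal sketch of one common formulation, the code below assumes soft-label distillation: the temperature-softened output distributions of the teachers are averaged, and the student is penalized by the KL divergence from that average. The function names, the temperature value, and the averaging rule are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits, temperature=2.0):
    """Average the softened distributions of several teachers.

    teacher_logits: one list of logits per teacher (hypothetically the
    outputs of VGG16, InceptionV3, and DenseNet121 for the same image).
    """
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    count = len(probs)
    return [sum(p[i] for p in probs) / count for i in range(len(probs[0]))]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher average || student) on softened outputs, scaled by T^2."""
    target = ensemble_soft_targets(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    # Made-up logits for a three-class toy problem, one row per teacher.
    teachers = [[2.0, 0.5, -1.0], [1.8, 0.7, -0.9], [2.2, 0.3, -1.2]]
    student = [1.0, 1.0, 0.0]
    print(distillation_loss(student, teachers))
```

In practice a loss of this kind is usually combined with the ordinary task loss on ground-truth labels; distilling intermediate features rather than output distributions is another option that the abstract's wording does not rule out.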
