Open Access
A Dual-Path Self-Supervised Framework for Polyp Segmentation using Vision Transformers and CNNs
Author(s) - P. Lijin, Madhu S. Nair
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/ACCESS.2025.3610453
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
In the field of medical image analysis, accurate polyp segmentation remains a significant challenge due to the limited availability of labeled training data. This paper introduces a novel approach that addresses this issue by leveraging a self-supervised learning framework combined with deep supervision and a dual-path architecture of Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs). Our proposed model consists of two paths: Path 1 employs Vision Transformer blocks, with the initial stage utilizing a pre-trained backbone and subsequent stages further refining these features. Path 2 incorporates self-supervised learning via the Barlow Twins method to capture global context, followed by ResNet blocks and attention-augmented convolution blocks to extract and enhance local features. The outputs from both paths are fused using a cross-attention module, concatenated, and processed through multiple MLP heads. To enhance segmentation accuracy, deep supervision is applied at multiple decoder stages, utilizing skip connections from the MobileNetV2 backbone. Additionally, we leverage a modified Deep Convolutional Generative Adversarial Network (DCGAN) to generate synthetic polyp images for Barlow Twins pre-training. Experimental results show that our approach outperforms existing methods in polyp segmentation, achieving higher accuracy and robustness in challenging scenarios. Specifically, our model achieves an average Dice score of 94% and an average Intersection over Union (IoU) of 89%, significantly surpassing state-of-the-art methods.
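The Barlow Twins pre-training used in Path 2 can be illustrated with a minimal sketch of the published objective: embeddings of two augmented views of the same image are standardized per dimension, their cross-correlation matrix is computed, and the loss pushes that matrix toward the identity (an invariance term on the diagonal plus a redundancy-reduction term off the diagonal). This NumPy version is an assumption-laden illustration, not the authors' implementation; the function name, the `lam` weight of 5e-3, and the embedding shapes are hypothetical choices.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins objective (sketch).

    z1, z2 : (batch, dim) embeddings of two augmented views of the
             same batch of images.
    lam    : weight of the off-diagonal (redundancy-reduction) term;
             5e-3 follows the original Barlow Twins paper, and may
             differ from the value used in this work.
    """
    n, _ = z1.shape
    # Standardize each embedding dimension over the batch.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-8)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-8)
    # (dim, dim) cross-correlation matrix between the two views.
    c = z1.T @ z2 / n
    # Invariance term: diagonal entries should be 1.
    on_diag = ((1.0 - np.diag(c)) ** 2).sum()
    # Redundancy-reduction term: off-diagonal entries should be 0.
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```

As a sanity check, feeding the same embeddings as both views drives the invariance term to zero, so the loss should be far smaller than for two unrelated batches.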
