Open Access
A Dual-Path Self-Supervised Framework for Polyp Segmentation using Vision Transformers and CNNs
Author(s) - P. Lijin, Madhu S. Nair
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/ACCESS.2025.3610453
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
In the field of medical image analysis, accurate polyp segmentation remains a significant challenge due to the limited availability of labeled training data. This paper introduces a novel approach that addresses this issue by leveraging a self-supervised learning framework combined with deep supervision and a dual-path architecture of Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs). Our proposed model consists of two paths: Path 1 employs Vision Transformer blocks, with the initial stage utilizing a pre-trained backbone and subsequent stages further refining these features. Path 2 incorporates self-supervised learning via the Barlow Twins method to capture global context, followed by ResNet blocks and attention-augmented convolution blocks to extract and enhance local features. The outputs from both paths are fused using a cross-attention module, concatenated, and processed through multiple MLP heads. To enhance segmentation accuracy, deep supervision is applied at multiple decoder stages, utilizing skip connections from the MobileNetV2 backbone. Additionally, we leverage a modified Deep Convolutional Generative Adversarial Network (DCGAN) to generate synthetic polyp images for Barlow Twins pre-training. Experimental results show that our approach outperforms existing methods in polyp segmentation, achieving higher accuracy and robustness in challenging scenarios. Specifically, our model achieves an average Dice score of 94% and an average Intersection over Union (IoU) of 89%, significantly surpassing state-of-the-art methods.
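The Barlow Twins pre-training used in Path 2 can be illustrated with a minimal sketch of the published objective: embeddings of two augmented views of the same image are standardized per dimension, their cross-correlation matrix is computed, and the loss pushes that matrix toward the identity (an invariance term on the diagonal plus a redundancy-reduction term off the diagonal). This NumPy version is an assumption-laden illustration, not the authors' implementation; the function name, the `lam` weight of 5e-3, and the embedding shapes are hypothetical choices.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins objective (sketch).

    z1, z2 : (batch, dim) embeddings of two augmented views of the
             same batch of images.
    lam    : weight of the off-diagonal (redundancy-reduction) term;
             5e-3 follows the original Barlow Twins paper, and may
             differ from the value used in this work.
    """
    n, _ = z1.shape
    # Standardize each embedding dimension over the batch.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-8)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-8)
    # (dim, dim) cross-correlation matrix between the two views.
    c = z1.T @ z2 / n
    # Invariance term: diagonal entries should be 1.
    on_diag = ((1.0 - np.diag(c)) ** 2).sum()
    # Redundancy-reduction term: off-diagonal entries should be 0.
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```

As a sanity check, feeding the same embeddings as both views drives the invariance term to zero, so the loss should be far smaller than for two unrelated batches.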
