Self-Supervised Depth Estimation and 3D Reconstruction with Layer-Wise LoRA of Foundation Model in Endoscopy
Author(s) - Saad Khalil, Sol Kim, Bo-In Lee, Youngbae Hwang
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Journal
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/ACCESS.2025.3617567
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Depth estimation is crucial for 3D reconstruction and surgical navigation, providing critical insights for endoscopic procedures. While foundation models excel at depth estimation on natural images, their performance in the medical domain remains limited, particularly under challenging conditions such as brightness fluctuations. This study develops a robust self-supervised framework for monocular depth estimation to address these challenges. We introduce a layer-wise low-rank adaptation (LW-LoRA) of the Depth-Anything-V2 foundation model, tailored to endoscopic data. Unlike conventional fine-tuning, LW-LoRA adjusts the LoRA rank across encoder layers for efficient training. The method integrates residual convolutional blocks (ResConv) to capture fine-grained details and a multi-head attention-based pose network to improve camera pose estimation, ensuring accurate 3D reconstructions. A multi-scale SSIM-based reprojection loss refines depth predictions, while a brightness calibration module provides robustness against illumination inconsistencies. During training, the backbone encoder is frozen and only the LoRA layers are optimized, keeping training efficient. Extensive evaluations on the SCARED dataset highlight the superior performance of our framework, which offers fast inference and high-quality depth maps. Zero-shot testing on the Hamlyn and clinical datasets confirms generalization across diverse data types. Our framework efficiently adapts the foundation model to depth estimation in the medical domain, addressing challenges of endoscopic imaging such as brightness variations and fine-detail preservation, and it enables accurate, dense 3D point-cloud reconstructions for reliable performance in clinical settings.
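
To make the headline idea concrete, the following is a minimal PyTorch sketch of layer-wise LoRA applied to a frozen transformer encoder. This is an illustrative reconstruction, not the authors' code: the rank schedule, the alpha/rank scaling convention, and the decision to adapt only the attention qkv projection are assumptions, and the vit.blocks / attn.qkv attribute names follow the common timm ViT layout rather than anything specified in the paper.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update: W x + (alpha/r) B A x."""
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # backbone weights stay frozen, as in the paper
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init -> identity at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

def apply_layerwise_lora(encoder_blocks, ranks):
    """Assigns a per-layer LoRA rank; here only the attention qkv projection is adapted."""
    for block, r in zip(encoder_blocks, ranks):
        block.attn.qkv = LoRALinear(block.attn.qkv, rank=r)

# Hypothetical usage with a 12-block ViT encoder; the growing-rank schedule is an assumption:
# ranks = [4] * 4 + [8] * 4 + [16] * 4
# apply_layerwise_lora(vit.blocks, ranks)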
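
The multi-scale SSIM-based reprojection loss can likewise be sketched in the form common to self-supervised depth methods (e.g., Monodepth2). The 3x3 SSIM window, the 0.85 SSIM/L1 mixing weight, and the per-scale weights below are conventional placeholders, not values reported by the paper.

import torch
import torch.nn.functional as F

def ssim_dissimilarity(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified SSIM over 3x3 windows via average pooling; returns (1 - SSIM)/2 in [0, 1]."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return ((1 - num / den) / 2).clamp(0, 1)

def reprojection_loss(warped, target, alpha=0.85):
    """Weighted SSIM + L1 photometric error between a warped source frame and the target."""
    return alpha * ssim_dissimilarity(warped, target).mean() \
        + (1 - alpha) * (warped - target).abs().mean()

def multiscale_reprojection_loss(warped_pyramid, target, scale_weights=(1.0, 0.5, 0.25, 0.125)):
    """Sums the reprojection loss over a pyramid of warped frames; weights are placeholders."""
    loss = 0.0
    for w, warped in zip(scale_weights, warped_pyramid):
        t = F.interpolate(target, size=warped.shape[-2:], mode="bilinear", align_corners=False)
        loss = loss + w * reprojection_loss(warped, t)
    return loss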
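
Finally, one plausible form of the brightness calibration module is a per-frame-pair affine correction I' = a * I + b, so the photometric loss compares illumination-aligned images. The affine parameterization and the small CNN below are assumptions; the abstract does not detail the module's actual design.

import torch
import torch.nn as nn

class BrightnessCalibration(nn.Module):
    """Predicts affine brightness parameters (a, b) from a source/target pair and applies
    a * src + b before the photometric loss. Assumes images normalized to [0, 1]."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2),  # -> (delta_a, b)
        )
        # Zero-init the head so the module starts as the identity mapping (a=1, b=0).
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, src, tgt):
        params = self.net(torch.cat([src, tgt], dim=1))
        a = 1.0 + params[:, 0].view(-1, 1, 1, 1)  # multiplicative gain near 1
        b = params[:, 1].view(-1, 1, 1, 1)        # additive offset near 0
        return (a * src + b).clamp(0, 1)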