ViT-NeBLa: A Hybrid Vision Transformer and Neural Beer–Lambert Framework for Single-View 3D Reconstruction of Oral Anatomy from Panoramic Radiographs
Author(s) -
Bikram Keshari Parida,
Anusree P. Sunilkumar,
Abhijit Sen,
Wonsang You
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3613789
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Panoramic radiography (PX) is widely used in dentistry but provides only a flattened 2D view; Cone Beam Computed Tomography (CBCT) recovers 3D anatomy at higher dose and cost. We tackle full 3D reconstruction from a single real-world PX across varying patients. We propose ViT–NeBLa (Vision-Transformer Neural Beer–Lambert), a physics-guided framework that estimates a continuous 3D dentoalveolar density field directly from a single PX. The design mirrors panoramic acquisition while simplifying the inverse problem: we parameterize projection-ray directions by tangency to a patient-adaptive elliptical path and independently restrict sampling to a jaw-focused horseshoe (focal trough) in which the projection rays do not intersect. This removes the intermediate density-aggregation step used by overlapping-ray methods and reduces per-ray samples by about 52%, lowering memory and compute. A hybrid ViT–CNN backbone extracts global anatomical context and local texture from the PX, and a learnable multi-resolution hash positional encoding maps 3D sample coordinates to expressive features that preserve fine dental and osseous detail. Per-point densities are predicted by a compact MLP and accumulated into a coarse 3D grid, which a lightweight 3D U-Net refines into the final volume. Training is end-to-end, using synthetic PX rendered from CBCT via the Beer–Lambert law together with voxelwise, projection-consistency, and perceptual losses; at inference, a single PX is processed directly, without CBCT flattening or dental-arch priors. Experiments show that ViT–NeBLa outperforms contemporary PX-to-3D baselines both quantitatively and qualitatively, yielding sharper cortical boundaries, clearer trabecular patterns, and fewer artifacts. In sum, ViT–NeBLa provides a radiation-efficient route to clinically informative 3D visualization from routine panoramic radiographs while simplifying geometry, reducing sampling, and preserving high-frequency structure.
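The abstract states that synthetic PX are rendered from CBCT via the Beer–Lambert law. As a minimal sketch of that idea only (not the authors' renderer), the transmitted intensity along a ray can be approximated as I = I0 · exp(−Σ μ_i · Δs), where μ_i are attenuation values sampled along the ray and Δs is the sample spacing. The function name `beer_lambert_intensity`, the uniform spacing, and the toy attenuation values below are assumptions made for illustration; the ray geometry (elliptical path, focal trough) is not modeled here.

```python
import numpy as np


def beer_lambert_intensity(mu_samples, step_length, i0=1.0):
    """Discrete Beer-Lambert law for a batch of rays.

    mu_samples  : array of shape (..., n_samples) holding attenuation
                  coefficients sampled along each ray (hypothetical input,
                  e.g. values interpolated from a CBCT volume).
    step_length : uniform spacing between samples, in the inverse of the
                  units of mu_samples (assumed uniform for simplicity).
    i0          : incident intensity.
    """
    mu = np.asarray(mu_samples, dtype=np.float64)
    # Riemann-sum approximation of the line integral of mu along each ray.
    optical_depth = mu.sum(axis=-1) * step_length
    # Exponential attenuation of the incident beam.
    return i0 * np.exp(-optical_depth)


# Toy example: three rays with four samples each (illustrative values only).
mu = np.array([
    [0.0, 0.0, 0.0, 0.0],   # empty ray: no attenuation
    [0.5, 0.5, 0.5, 0.5],   # uniform material
    [1.0, 2.0, 0.0, 1.0],   # mixed material
])
intensity = beer_lambert_intensity(mu, step_length=0.5)
print(intensity)
```

Because each ray is reduced independently along its last axis, a sketch like this fits the non-intersecting-ray setting described in the abstract, where no aggregation across overlapping rays is needed before projection.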