Open Access

Toward Large-Scale Image Segmentation on Summit

Author(s) - Sudip K. Seal, Seung-Hwan Lim, Dali Wang, Jacob Hinkle, Dalton Lunga, Aristeidis Tsaris

Publication year - 2020

Publication title - OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information)

Language(s) - English

Resource type - Conference proceedings

DOI - 10.1145/3404397.3404468

Subject(s) - computer science, speedup, supercomputer, scalability, deep learning, parallel computing, benchmark (surveying), artificial intelligence, artificial neural network, task (project management), segmentation, computer engineering, management, geodesy, database, economics, geography

Semantic segmentation of images is an important computer vision task that arises in a variety of application domains such as medical imaging, robotic vision and autonomous vehicles. While these domain-specific image analysis tasks involve relatively small image sizes (∼10² × 10²), there are many applications that need to train machine learning models on image data with extents that are orders of magnitude larger (∼10⁴ × 10⁴). Training deep neural network (DNN) models on large-extent images is extremely memory-intensive and often exceeds the memory limits of a single graphics processing unit (GPU), the hardware accelerator of choice for computer vision workloads. Here, an efficient, sample-parallel approach to training U-Net models on large-extent image data sets is presented. Its advantages and limitations are analyzed, and near-linear strong-scaling speedup is demonstrated on 256 nodes (1536 GPUs) of the Summit supercomputer. Using a single node of Summit, an early evaluation of a recently released model-parallel framework called GPipe is shown to deliver ∼2× speedup in executing a U-Net model with an order of magnitude more trainable parameters than reported before. Performance bottlenecks for pipelined training of U-Net models are identified, and mitigation strategies to improve the speedups are discussed. Together, these results open up the possibility of combining both approaches into a unified, scalable, pipelined and data-parallel algorithm to efficiently train U-Net models with very large receptive fields on data sets of ultra-large-extent images.
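The quantities the abstract refers to (activation memory, strong-scaling efficiency, pipelined speedup) can be illustrated with a few lines of arithmetic. The following is a minimal sketch and not the authors' code: all function names are hypothetical, the 64-channel first layer is taken from the original U-Net design rather than from this paper, and the pipeline model assumes equal stage times with no communication cost, which real U-Net pipelines will not meet.

```python
def activation_bytes(height: int, width: int, channels: int,
                     bytes_per_value: int = 4) -> int:
    """Memory needed to hold one feature map (float32 by default)."""
    return height * width * channels * bytes_per_value


def strong_scaling_efficiency(t_base: float, n_base: int,
                              t_n: float, n: int) -> float:
    """Fraction of ideal speedup achieved when going from n_base to n
    devices on a fixed total problem size (1.0 means perfectly linear)."""
    return (t_base * n_base) / (t_n * n)


def ideal_pipeline_speedup(num_stages: int, num_microbatches: int) -> float:
    """Idealized speedup of a pipeline with K stages and M micro-batches
    over running the same stages one after another.

    Sequential cost is M * K time units; a filled pipeline needs
    M + K - 1 units, so the ratio is M * K / (M + K - 1).
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("stages and micro-batches must be positive")
    return (num_stages * num_microbatches) / (num_microbatches + num_stages - 1)


if __name__ == "__main__":
    # One 64-channel float32 feature map at 10^4 x 10^4 resolution.
    gb = activation_bytes(10_000, 10_000, 64) / 1e9
    print(f"single feature map: {gb:.1f} GB")

    # Hypothetical timings: 8x the devices, 1/8 of the time.
    print(strong_scaling_efficiency(100.0, 1, 12.5, 8))

    # Hypothetical pipeline: 2 stages, 8 micro-batches.
    print(ideal_pipeline_speedup(2, 8))
```

Under these assumptions a single first-layer feature map for a 10⁴ × 10⁴ image already takes about 25.6 GB, more than the 16 GB available on each of Summit's V100 GPUs, which is why the work is split across samples or across pipeline stages. The timings and stage counts in the usage section are made-up inputs, not measurements from the paper.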
