TensorLightning: A Traffic-Efficient Distributed Deep Learning on Commodity Spark Clusters

Seil Lee, Hanjoo Kim, Jaehong Park, Jaehee Jang, Chang-Sung Jeong, Sungroh Yoon

Open Access

Publication year - 2018

Publication title - IEEE Access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2018.2842103

Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation

With the recent success of deep learning, the amount of data and computation continues to grow daily. Hence, distributed deep learning systems that share the training workload have been researched extensively. Scale-out distributed environments built from commodity servers are widely used, but they are limited by synchronous operation and communication traffic. In addition, combining deep neural network (DNN) training with existing clusters often demands additional hardware and migration between different cluster frameworks or libraries, which is highly inefficient. Therefore, we propose TensorLightning, which integrates the widely used data pipeline of Apache Spark with the powerful deep learning libraries Caffe and TensorFlow. TensorLightning employs a new parameter aggregation algorithm and parallel asynchronous parameter management schemes to relieve communication discrepancies and overhead. We redesign the elastic averaging stochastic gradient descent algorithm to use pruned, sparse-form parameters. Our approach provides fast and flexible DNN training with high accessibility. We evaluated the proposed framework with convolutional neural network and recurrent neural network models; the framework reduces network traffic by 67% with faster convergence.
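
The abstract does not spell out the redesigned update rule, so the following is only a rough illustration of the general idea, not the paper's algorithm. It assumes the standard asynchronous elastic-averaging step, in which a worker and the shared center each move toward the other by a fraction `alpha` of their difference, and it assumes a simple top-k magnitude pruning rule chosen here for illustration. The function names, `alpha`, and `keep` are all hypothetical.

```python
from typing import Dict, List


def sparse_elastic_difference(worker: List[float], center: List[float],
                              alpha: float, keep: int) -> Dict[int, float]:
    """Return the `keep` largest-magnitude entries of alpha * (worker - center).

    Assumption: top-k magnitude pruning stands in for whatever pruning
    rule the paper actually uses.
    """
    diff = [alpha * (w - c) for w, c in zip(worker, center)]
    # Indices sorted by descending magnitude; entries past `keep` are pruned.
    top = sorted(range(len(diff)), key=lambda i: abs(diff[i]), reverse=True)[:keep]
    # Sparse form: only (index, value) pairs would cross the network.
    return {i: diff[i] for i in top}


def apply_elastic_update(worker: List[float], center: List[float],
                         update: Dict[int, float]) -> None:
    """Move worker and center toward each other on the transmitted entries, in place."""
    for i, delta in update.items():
        worker[i] -= delta
        center[i] += delta


worker = [1.0, 0.0, -2.0, 0.1]
center = [0.0, 0.0, 0.0, 0.0]
update = sparse_elastic_difference(worker, center, alpha=0.5, keep=2)
apply_elastic_update(worker, center, update)

# Fraction of parameter entries actually transmitted in this exchange.
traffic_ratio = len(update) / len(worker)
```

In this toy exchange only two of the four entries are sent, so the worker and center agree on those two coordinates while the pruned coordinates are left for a later round. The 67% traffic reduction reported in the abstract comes from the authors' own scheme and is not reproduced by this sketch.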
