Efficient Scheduling in Training Deep Convolutional Networks at Large Scale
Author(s) -
Can Que,
Xinming Zhang
Publication year - 2018
Publication title -
IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2018.2875407
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
The deep convolutional network is one of the most successful machine learning models of recent years. However, training large deep networks is a time-consuming process. Because these networks contain a large number of parameters, the efficiency of data-parallel methods is usually limited by the communication speed of the network. In this paper, we introduce two new algorithms to speed up the training of large deep networks across multiple machines: (1) a new scheduling algorithm that reduces communication delay in gradient transmission, and (2) a new collective algorithm based on a reverse-reduce tree that reduces link contention. We implement our algorithms in the well-known Caffe library and obtain near-linear scaling on commodity Ethernet networks.
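To make the collective-communication idea concrete, the following is a minimal, illustrative sketch of a tree-shaped gradient reduction (not the paper's reverse-reduce-tree algorithm itself): pairing workers over log2(n) rounds so that each round uses disjoint links, which is one way tree-shaped collectives reduce link contention compared with naive all-to-all exchange. The worker layout and gradient representation here are assumptions for illustration only.

```python
# Illustrative sketch: binary-tree reduction of per-worker gradient
# vectors. In round k, worker i receives from worker i + 2**k and
# accumulates, so the root ends up holding the summed gradient.

def tree_reduce(grads):
    """Sum per-worker gradient lists via pairwise tree rounds."""
    vals = [g[:] for g in grads]          # copy each worker's gradient
    step = 1
    while step < len(vals):
        for i in range(0, len(vals) - step, 2 * step):
            # worker i receives from worker i + step and accumulates
            vals[i] = [a + b for a, b in zip(vals[i], vals[i + step])]
        step *= 2
    return vals[0]                        # root holds the full sum

if __name__ == "__main__":
    workers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(tree_reduce(workers))           # [16.0, 20.0]
```

In a real data-parallel setup each `vals[i]` would live on a separate machine and the pairwise accumulations within a round would proceed in parallel over the network, rather than sequentially in one process as sketched here.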