Pipelined Training with Stale Weights in Deep Convolutional Neural Networks
Author(s) -
Lifu Zhang,
Tarek S. Abdelrahman
Publication year - 2021
Publication title -
applied computational intelligence and soft computing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.371
H-Index - 10
eISSN - 1687-9732
pISSN - 1687-9724
DOI - 10.1155/2021/3839543
Subject(s) - computer science , backpropagation , convolutional neural network , inference , parallel computing , flops , residual neural network , artificial neural network , computer engineering , artificial intelligence
The growth in size and complexity of convolutional neural networks (CNNs) is forcing the partitioning of a network across multiple accelerators during training and pipelining of backpropagation computations over these accelerators. Pipelining results in the use of stale weights. Existing approaches to pipelined training avoid or limit the use of stale weights with techniques that either underutilize accelerators or increase training memory footprint. This paper contributes a pipelined backpropagation scheme that uses stale weights to maximize accelerator utilization and keep memory overhead modest. It explores the impact of stale weights on the statistical efficiency and performance using 4 CNNs (LeNet-5, AlexNet, VGG, and ResNet) and shows that when pipelining is introduced in early layers, training with stale weights converges and results in models with comparable inference accuracies to those resulting from nonpipelined training (a drop in accuracy of 0.4%, 4%, 0.83%, and 1.45% for the 4 networks, respectively). However, when pipelining is deeper in the network, inference accuracies drop significantly (up to 12% for VGG and 8.5% for ResNet-20). The paper also contributes a hybrid training scheme that combines pipelined with nonpipelined training to address this drop. The potential for performance improvement of the proposed scheme is demonstrated with a proof-of-concept pipelined backpropagation implementation in PyTorch on 2 GPUs using ResNet-56/110/224/362, achieving speedups of up to 1.8X over a 1-GPU baseline.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom