Open Access
SingleCaffe: An Efficient Framework for Deep Learning on a Single Node
Author(s) -
Chenxu Wang,
Yixian Shen,
Jia Jia,
Yutong Lu,
Zhiguang Chen,
Bo Wang
Publication year - 2018
Publication title -
IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2018.2879877
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Deep learning (DL) is currently the most promising approach in complex applications such as computer vision and natural language processing. It thrives on large neural networks and large datasets. However, larger models and larger datasets result in longer training times that impede research and development progress. The high computing power and data-parallel nature of modern hardware, such as GPUs, has triggered the widespread adoption of such hardware in DL frameworks, such as Caffe, Torch, and TensorFlow. However, most DL frameworks cannot make full use of this high-performance hardware, and computational efficiency remains low. In this paper, we present SingleCaffe, a DL framework that can make full use of such hardware and improve the computational efficiency of the training process. SingleCaffe spawns multiple threads within a single node and adopts data parallelism across them to speed up training. During the training process, SingleCaffe designates one thread as a parameter server thread and the remaining threads as worker threads. Both data and workloads are distributed across the worker threads, while the server thread maintains the globally shared parameters. The framework also manages memory allocation carefully to reduce memory overhead. The experimental results show that SingleCaffe improves training efficiency substantially; its single-node performance can even match distributed training across a dozen nodes.