The project I intend to work on during the Summer 2018 internship at HPCC Systems is to research and develop distributed deep learning algorithms on HPCC Systems. Training modern deep neural networks requires big data and large computational power. Though HPCC Systems excels at both, its neural network training is currently limited to a single node. This project will greatly enhance HPCC Systems' neural network capabilities.
This project aims to begin development of a software library (consisting of ECL and Python code) that provides HPCC Systems with distributed neural network training, using a popular configuration well suited to cluster computers. This paradigm, called "data parallelism," provides asynchronous training with minimal network overhead and can be combined with different neural network training algorithms. The framework would also serve as a building block for future development of other distributed configurations and distributed neural network algorithms.
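To make the data-parallel idea concrete, the sketch below simulates it in plain Python on a toy linear model: each worker holds its own data shard and computes a gradient on it, and a parameter server applies each worker's update as it arrives, without waiting for the others. All names here are hypothetical; the real implementation would use ECL and TensorFlow, and the workers would run concurrently rather than in a loop.

```python
# Illustrative sketch of asynchronous data-parallel training (hypothetical
# names; the actual project targets ECL + TensorFlow, not plain Python).

def gradient(w, shard):
    """Mean-squared-error gradient for the toy model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train_data_parallel(shards, w=0.0, lr=0.01, steps=50):
    """Each worker owns one shard; the parameter server applies each
    worker's gradient as soon as it arrives (no synchronization barrier)."""
    for _ in range(steps):
        for shard in shards:  # in practice these run concurrently
            w -= lr * gradient(w, shard)
    return w

# Synthetic data for y = 3x, split across two workers.
data = [(x, 3.0 * x) for x in range(1, 9)]
w_final = train_data_parallel([data[:4], data[4:]])
```

Because each worker updates the shared parameters independently, no worker ever blocks on another, which is what keeps network overhead low on a cluster.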
The deliverables within the scope of this internship (roughly 2 weeks each):
- Functions for transferring training data from ECL records to the TensorFlow runtime
- Functions for converting the neural network model between TensorFlow and ECL
- Functions for transferring neural network model parameters between TensorFlow and ECL
- Optimizer: Distributed Batch Gradient Descent
- Statistical performance analysis of the implementation
- Test Cases and Documentation
- Distributed convolutional computation for CNNs
- Optimizer: DOWNPOUR
- Optimizer: Synchronous SGD
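The synchronous SGD deliverable above differs from the asynchronous case in one key step: gradients from all workers are averaged behind a barrier before a single update is applied, so every worker sees identical parameters at each step. A minimal sketch on the same toy linear model, with hypothetical names:

```python
# Hypothetical sketch of synchronous SGD: average all workers' gradients,
# then apply one update. The real deliverable would use ECL + TensorFlow.

def gradient(w, shard):
    """Mean-squared-error gradient for the toy model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def sync_sgd(shards, w=0.0, lr=0.01, steps=200):
    for _ in range(steps):
        grads = [gradient(w, s) for s in shards]  # barrier: wait for all workers
        w -= lr * sum(grads) / len(grads)         # one averaged update
    return w

data = [(x, 3.0 * x) for x in range(1, 9)]
w_final = sync_sgd([data[:4], data[4:]])
```

The trade-off is the usual one: synchronous updates are statistically equivalent to large-batch SGD but pay for the barrier in stragglers, while the asynchronous Downpour-style updates avoid waiting at the cost of stale gradients.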