User: fabian-donat-natterer's Post: Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training. (arXiv:2302.05045v3 [cs.LG] UPDATED)

Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training. (arXiv:2302.05045v3 [cs.LG] UPDATED)

Click here to flash read.

Parallel training of neural networks at scale is challenging due to
significant overheads arising from communication. Recently, deep learning
researchers have developed a variety of pruning algorithms that are capable of
pruning (i.e. setting to zero) 80-90% of the parameters in a neural network to
yield sparse subnetworks that equal the accuracy of the unpruned parent
network. In this work, we propose a novel approach that exploits these sparse
subnetworks to optimize the memory utilization and communication in two popular
algorithms for parallel deep learning namely -- data and inter-layer
parallelism. We integrate our approach into AxoNN, a highly scalable framework
for parallel deep learning that relies on data and inter-layer parallelism, and
demonstrate the reduction in communication time and memory utilization. On 512
NVIDIA V100 GPUs, our optimizations reduce the memory consumption of a 2.7
billion parameter model by 74%, and the total communication time by 40%, thus
providing an overall speedup of 34% over AxoNN, 32% over DeepSpeed-3D and 46%
over Sputnik, a sparse matrix computation baseline.

Click here to read this post out

ID: 129912; Unique Viewers: 0

Voters: 0

Latest Change: May 16, 2023, 7:32 a.m. Changes:

Dictionaries:

Words:

Spaces:

Comments:

Newcom