Entry Date:
August 6, 2018

Deep Compression

Principal Investigator: Song Han


Large deep neural network models improve prediction accuracy but demand heavy memory access, which is roughly 100× more power-hungry than ALU operations. “Deep Compression” introduces a three-stage pipeline of pruning, trained quantization, and Huffman coding that works together to reduce the storage requirements of deep neural networks. In experiments on the ImageNet dataset, AlexNet was compressed by 35×, from 240MB to 6.9MB, and VGGNet by 49×, from 552MB to 11.3MB, without affecting their accuracy. This algorithm helps bring deep learning to mobile apps.
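
To make the three stages concrete, below is a minimal NumPy sketch on a toy weight matrix: magnitude pruning, weight sharing via 1-D k-means with a linearly initialized codebook, and Huffman coding of the resulting cluster indices. The threshold, cluster count, and function names are illustrative assumptions, not the paper's hyperparameters; the actual method also retrains the network after pruning and fine-tunes the shared centroids during training, which this sketch omits.

import heapq
from collections import Counter

import numpy as np


def prune(weights, threshold=0.1):
    # Stage 1: magnitude pruning -- zero out connections below the threshold.
    # (The threshold here is an illustrative value, not the paper's.)
    mask = np.abs(weights) > threshold
    return weights * mask, mask


def quantize(weights, mask, n_clusters=16):
    # Stage 2: weight sharing -- cluster surviving weights with 1-D k-means
    # so each weight is stored as a small index into a centroid codebook.
    survivors = weights[mask]
    # Linear initialization over the weight range, as in the paper.
    centroids = np.linspace(survivors.min(), survivors.max(), n_clusters)
    for _ in range(20):  # plain Lloyd iterations
        idx = np.argmin(np.abs(survivors[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = survivors[idx == k].mean()
    return idx, centroids


def huffman_code(symbols):
    # Stage 3: build a Huffman table so frequent cluster indices get short codes.
    heap = [[freq, [sym, ""]] for sym, freq in Counter(symbols).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}


W = np.random.randn(64, 64).astype(np.float32)
W_pruned, mask = prune(W)
idx, codebook = quantize(W_pruned, mask)
codes = huffman_code(idx.tolist())

dense_bits = W.size * 32
packed_bits = sum(len(codes[i]) for i in idx.tolist()) + codebook.size * 32
print(f"~{dense_bits / packed_bits:.1f}x smaller (ignoring sparse-index overhead)")

Note that the printed ratio ignores the cost of storing the positions of the non-zero weights; the paper keeps this overhead small by encoding relative index differences between non-zeros rather than absolute positions.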