Entry Date:
August 6, 2018

Trained Ternary Quantization (TTQ)

Principal Investigator: Song Han


The deployment of large neural network models can be difficult on mobile devices with limited power budgets. To address this problem, we propose Trained Ternary Quantization (TTQ), a method that reduces the precision of neural network weights to ternary values. TTQ incurs very little accuracy degradation and can even improve the accuracy of some models. A key feature of our trained quantization method is that it learns both the ternary values and the ternary assignments. During inference, the resulting models are nearly 16× smaller than their full-precision counterparts (2-bit versus 32-bit weights).
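
A minimal PyTorch sketch of the idea follows. The class name TernaryQuant, the threshold heuristic (a fraction t of the largest absolute latent weight), and the scalar hyperparameter t = 0.05 are illustrative choices for this notebook, not the release code; the gradient rule is a simplified rendering of how TTQ trains both the ternary values and the assignment.

```python
import torch


class TernaryQuant(torch.autograd.Function):
    """Quantize latent full-precision weights to {-W_n, 0, +W_p}."""

    @staticmethod
    def forward(ctx, w_latent, w_p, w_n, t=0.05):
        # Layer-wise threshold: a fraction t of the largest |weight|.
        delta = t * w_latent.abs().max()
        pos = (w_latent > delta).float()    # entries assigned +W_p
        neg = (w_latent < -delta).float()   # entries assigned -W_n
        ctx.save_for_backward(pos, neg, w_p, w_n)
        return w_p * pos - w_n * neg        # remaining entries become 0

    @staticmethod
    def backward(ctx, grad_out):
        pos, neg, w_p, w_n = ctx.saved_tensors
        # Gradients for the two learned ternary values (one pair per layer).
        grad_wp = (grad_out * pos).sum()
        grad_wn = -(grad_out * neg).sum()
        # Straight-through gradient for the latent weights, scaled per
        # region, so the ternary assignment itself can also be learned.
        zero = 1.0 - pos - neg
        grad_latent = grad_out * (w_p * pos + w_n * neg + zero)
        return grad_latent, grad_wp, grad_wn, None


# Usage sketch: gradients flow to the latent weights and to both scales.
w_latent = torch.randn(64, 3, 3, 3, requires_grad=True)   # conv kernel
w_p = torch.tensor(1.0, requires_grad=True)
w_n = torch.tensor(1.0, requires_grad=True)
w_q = TernaryQuant.apply(w_latent, w_p, w_n)
w_q.sum().backward()
```

After training, only the 2-bit assignments and the two per-layer scales need to be stored, which is where the roughly 16× size reduction over 32-bit weights comes from.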