Entry Date: January 22, 2019

In-Memory Computation for Low Power Machine Learning Applications

Principal Investigator: Anantha Chandrakasan


Convolutional Neural Networks (CNNs) have emerged as the best-performing approach in a wide variety of machine learning (ML) applications, ranging from image classification to speech recognition. However, they require large amounts of computation and storage. When implemented on a conventional von Neumann computing architecture, each computation requires substantial data movement between the memory and the processing elements. This leads to high power consumption and long computation times, making CNNs unsuitable for many energy-constrained applications, such as smartphones and wearable devices. To address these challenges, we propose embedding computation capability inside the memory. By doing so, we can significantly reduce data transfers to/from the memory and also access multiple memory addresses in parallel to increase processing speed. The basic convolution operation in a CNN layer can be simplified to a dot-product between the layer inputs (X) and the filter weights (w), which generates the outputs (Y) for that layer.
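As a rough illustration, the following NumPy sketch shows how each convolution output reduces to a dot-product between a filter and one input window; the function name, shapes, and values are illustrative assumptions, not the actual implementation.

    import numpy as np

    def conv_as_dot_products(X, w):
        # Slide the filter w over input X; each output element is the
        # dot-product Y = sum_i(w_i * X_i) over one input window.
        kh, kw = w.shape
        oh, ow = X.shape[0] - kh + 1, X.shape[1] - kw + 1
        Y = np.empty((oh, ow))
        for r in range(oh):
            for c in range(ow):
                Y[r, c] = np.sum(w * X[r:r + kh, c:c + kw])
        return Y

    X = np.random.rand(6, 6)                   # illustrative layer inputs
    w = np.random.choice([-1.0, 1.0], (3, 3))  # illustrative binary weights
    print(conv_as_dot_products(X, w).shape)    # -> (4, 4)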

In this work, CNNs are trained to use binary filter weights (w = +/-1), which are stored as a digital ‘0’ or ‘1’ in the bit-cells of the memory array. The digital inputs (X) are converted to analog voltages and sent to the array, where the dot-products are performed in the analog domain. Finally, the analog dot-product voltages are converted back into digital outputs (Y) for further processing.
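The sketch below is a behavioral Python model of that signal chain under assumed 6-bit converter resolutions (the function names and bit-widths are illustrative assumptions, not measured chip parameters): uniform quantization stands in for the digital-to-analog and analog-to-digital conversions around an ideal analog multiply-accumulate.

    import numpy as np

    def quantize(v, n_bits, v_min, v_max):
        # Uniform quantization over [v_min, v_max], standing in for a
        # DAC or ADC of finite resolution.
        step = (v_max - v_min) / (2 ** n_bits - 1)
        return v_min + np.round((np.clip(v, v_min, v_max) - v_min) / step) * step

    def in_memory_dot_product(x_digital, w_binary, dac_bits=6, adc_bits=6):
        # DAC: digital inputs X become analog voltages driven into the array.
        x_analog = quantize(x_digital, dac_bits, 0.0, 1.0)
        # Bit-cells storing w = +/-1 perform the multiply-accumulate in analog.
        y_analog = np.dot(w_binary, x_analog)
        # ADC: the analog dot-product voltage is read out as a digital Y.
        bound = float(len(x_digital))  # |Y| <= N when inputs lie in [0, 1]
        return quantize(y_analog, adc_bits, -bound, bound)

    x = np.random.rand(64)                 # illustrative digital inputs
    w = np.random.choice([-1.0, 1.0], 64)  # stored binary filter weights
    print(in_memory_dot_product(x, w))

A behavioral model of this kind is useful for estimating how converter resolution degrades classification accuracy relative to an ideal digital dot-product.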

To demonstrate functionality on a real CNN architecture, we used the Modified National Institute of Standards and Technology (MNIST) handwritten-digit recognition dataset with the LeNet-5 CNN. We demonstrated a classification accuracy of 98.35%, which is within 1% of what can be achieved with an ideal digital implementation, and achieved more than a 16x improvement in energy efficiency for processing the dot-products compared with fully digital implementations. Thus, our approach has the potential to enable low-power, ubiquitous ML applications for smart devices in the Internet-of-Everything.