Deep neural networks (DNNs) form the foundation of modern AI and have enabled applications such as object recognition (e.g., automatic photo tagging in Facebook), speech recognition (e.g., Apple Siri), autonomous vehicles, and even strategy planning (e.g., Google DeepMind AlphaGo). While DNNs deliver state-of-the-art accuracy on these applications, they require significant computational resources due to the size of the networks (e.g., hundreds of megabytes of filter weight storage and 30k-600k operations per input pixel). Our goal is to efficiently process these large, high-dimensional networks on small, embedded hardware.
Data movement is the dominant source of energy consumption in DNNs because of their high-dimensional data. In this project, we developed a framework to generate energy-efficient dataflows that minimize data movement. Our architecture consists of a spatial array of processing engines (PEs), each with local storage; inter-PE communication enables regions of PEs to share data. We developed an energy-efficient dataflow, called row stationary, that minimizes data accesses to large, expensive memories (DRAM and the global buffer) by maximizing data reuse from small, low-cost memories (local storage in each PE and inter-PE communication). It exploits all forms of data reuse available in deep convolutional neural networks, including convolutional, filter, and image reuse, to deliver 1.4x to 2.5x lower energy consumption than other dataflows.
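To make the idea of a row-oriented dataflow concrete, the following is a minimal software sketch, not the hardware implementation. It assumes that each PE computes a 1D sliding-window product of one filter row with one input row, and that the partial-sum rows from the PEs covering the different filter rows are accumulated into one output row. The text above does not spell out this mapping, so treat it as an assumption; the function names `pe_row_conv` and `row_stationary_conv2d` are hypothetical.

```python
def pe_row_conv(filter_row, ifmap_row):
    """Hypothetical single PE: slide one filter row across one input row.

    The filter row stays in the PE's local storage and is reused for
    every window position (convolutional reuse within the row).
    """
    width = len(filter_row)
    n_out = len(ifmap_row) - width + 1
    return [
        sum(filter_row[i] * ifmap_row[x + i] for i in range(width))
        for x in range(n_out)
    ]


def row_stationary_conv2d(filt, ifmap):
    """2D convolution (CNN-style, no kernel flip) built from 1D row primitives.

    Assumption: output row y is the sum of the partial-sum rows produced by
    pairing filter row r with input row y + r.
    """
    n_filter_rows = len(filt)
    out_h = len(ifmap) - n_filter_rows + 1
    out_w = len(ifmap[0]) - len(filt[0]) + 1
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):               # one output row at a time
        for r in range(n_filter_rows):   # one PE per filter row
            psum_row = pe_row_conv(filt[r], ifmap[y + r])
            # Accumulate partial sums across the PEs for this output row.
            out[y] = [a + b for a, b in zip(out[y], psum_row)]
    return out


filt = [[1, 2], [3, 4]]
ifmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = row_stationary_conv2d(filt, ifmap)
print(result)  # [[37, 47], [67, 77]]
```

In this sketch each filter row is reused for every output row and each input row is reused by every filter row that overlaps it, which is the kind of reuse a dataflow would aim to serve from local storage rather than from DRAM.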
We developed a spatial array hardware accelerator, named Eyeriss, to support our row stationary dataflow. In addition to reducing data movement, we exploit data statistics in two ways to further reduce energy consumption: (1) we reduce the accelerator energy by 2x using data gating to skip reads and multiplications for zero values; (2) we reduce DRAM bandwidth by up to 2x using run-length compression. The Eyeriss chip was designed in 65 nm CMOS and has been integrated into a system that demonstrates real-time 1000-class image classification at below one-third of a watt, which is over 10x more energy efficient than existing mobile GPUs. In addition, Eyeriss can be reconfigured to support varying filter shapes across different layers within a deep convolutional neural network (DCNN) and across different DCNNs, while still delivering high throughput at high energy efficiency.
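The two data-statistics optimizations mentioned above, zero gating and run-length compression, can be illustrated with a small software sketch. This is a generic illustration under stated assumptions, not the chip's actual logic: the gating here checks one operand stream for zeros, and the run-length format (pairs of zero-run length and nonzero value, with no cap on run length) is hypothetical, since the text does not specify the on-chip encoding. All function names are made up for the example.

```python
def gated_mac(activations, weights):
    """Multiply-accumulate that skips work when the activation is zero.

    In hardware, the skipped iterations would correspond to gated reads
    and multiplier operations; here we only count them.
    """
    acc = 0
    skipped = 0
    for a, w in zip(activations, weights):
        if a == 0:
            skipped += 1  # no weight read, no multiplication
            continue
        acc += a * w
    return acc, skipped


def rle_encode(values):
    """Encode a sequence as (zero_run_length, nonzero_value) pairs.

    Returns the pairs plus the number of trailing zeros, so the
    original sequence can be reconstructed exactly.
    """
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run


def rle_decode(pairs, trailing_zeros):
    """Inverse of rle_encode."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * trailing_zeros)
    return out


acts = [0, 3, 0, 0, 5, 0]
weights = [2, 4, 6, 8, 1, 7]

acc, skipped = gated_mac(acts, weights)
pairs, tail = rle_encode(acts)
print(acc, skipped)  # 17 4
print(pairs, tail)   # [(1, 3), (2, 5)] 1
```

Both techniques benefit from the same property of the data: the more zeros a sequence contains, the more multiplications are skipped and the fewer values need to be transferred to and from DRAM.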