Vivienne Sze

Associate Professor of Electrical Engineering and Computer Science

Reinventing the neural net chip for local analytics

For the future of deep neural networks, Vivienne Sze seeks to disconnect from the cloud by bringing data to the edge on battery-powered devices.

By: Eric Brown

R&D for deep neural networks has largely focused on designing the most powerful deep learning processors and algorithms possible. Now researchers are looking to bring some of that brainpower to edge devices ranging from smartphones to smart cars to internet of things gateways. MIT RLE’s Energy-Efficient Multimedia Systems Group (EEMS), for example, is working on edge analytics projects including an energy-efficient “Eyeriss” deep neural net processor that can analyze video data on battery-powered devices. A related “Navion” chip will enable mini-drones to do on-the-fly navigation and analytics for search and rescue.

“We’re developing systems that enable efficient processing of visual data, so we can disconnect the processing from the cloud and bring it to the edge or the palm of your hand,” says Professor Vivienne Sze, who leads the EEMS group. “We want to send only relevant information to the cloud or maybe no information at all.”

The move to edge analytics is primarily driven by the need to reduce latency, which is crucial in self-driving cars, robotics, video-focused IoT gateways, and even consumer smartphone applications. Analyzing video locally also lets a device keep working during internet outages and slowdowns.

Privacy concerns are another reason to process locally. “We store huge amounts of visual data on our phones, which we’ll want to search and analyze privately,” says Sze. “Local video analytics could help travelers explore a new city, or enable the visually impaired to navigate. You don’t want to constantly stream whatever you’re looking at into the cloud.”

Sze was one of the key developers of HEVC/H.265 video compression while at the Texas Instruments R&D Center from 2010 to 2013. By adding parallelism features, the developers were able to achieve twice the compression of H.264 while increasing computational complexity by only 50 percent, compared to the 4x increase of H.264 over MPEG-2.

H.265 came along at the perfect time, as video has grown to represent over 70 percent of all internet traffic. Sze realized, however, that compression wasn’t enough: The next challenge was to understand the content of the video.

“The idea of compression is that you’re sending the video to someone else, but for a lot of applications you don’t need to do that,” she says. “Understanding the content is like the ultimate form of compression. For many applications, all you need to do is extract meaningful information so you can take an action or make a decision. For example, for bandwidth reasons, it may be better not to send out the whole video feed if you’re monitoring traffic or how long people are standing in line. All you need to do is count.”

Eyeriss: Making the neural net energy aware

Sze’s Eyeriss design -- a collaboration with MIT CSAIL’s Joel Emer -- uses a completely new architecture for neural network processing that prioritizes energy efficiency. “The challenge with today’s deep learning algorithms is that they require heavy computation,” says Sze. “One of our goals is to understand the video under the same energy and cost budget you would spend on video compression.”

The initial prototype of Eyeriss is designed as a coprocessor for use on a smartphone. On a phone, as well as in other applications such as self-driving cars, Eyeriss will need to prove that it can perform analytics more efficiently than a coprocessor that’s already there: the graphics processing unit (GPU). Chipmakers such as Nvidia have been taking the GPU to new heights, not only to render visual information rapidly but also to analyze it. Sze, however, avoided the GPU as a foundation for Eyeriss. “GPUs have a lot of parallelism and run very quickly, but there’s a lot of overhead for graphics rendering that you don’t need for neural nets,” she says. “GPUs have multiple processing elements, but they don’t communicate with each other the way we can on Eyeriss, which lets us minimize the transactions on and off chip.”

Like H.265, the Eyeriss design was driven by the need to reduce complexity. “We are reducing the complexity to save energy, to achieve higher throughput and frame rates, and to deliver that in a low-cost, battery-powered device,” says Sze.

The challenge is that “neural nets are orders of magnitude more computationally complex than video compression and with orders of magnitude more data,” says Sze. “With video compression, most of the data is the video itself, but with neural nets you’re also dealing with the weights of the neural network and the intermediate feature maps, which can be quite large.”

The best way to reduce complexity and energy use is to reduce data movement. “In video processing, most of the energy goes toward moving data,” says Sze. “It costs more to fetch memory and move it to a multiplier than to do the multiplication itself.”
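The point about fetching versus multiplying can be made concrete with a toy energy model. The sketch below is only an illustration: the relative costs in RELATIVE_COST and the estimate_energy helper are assumptions invented for this example, not measurements from Eyeriss or any other chip.

```python
# Illustrative relative energy costs per operation (assumed values,
# not measurements). One multiply-accumulate (MAC) is the unit.
RELATIVE_COST = {"mac": 1.0, "local": 1.0, "buffer": 6.0, "dram": 200.0}


def estimate_energy(num_macs, accesses):
    """Return (compute_energy, movement_energy) in arbitrary units.

    `accesses` maps a memory level name to the number of accesses
    made at that level.
    """
    compute = num_macs * RELATIVE_COST["mac"]
    movement = sum(RELATIVE_COST[level] * count
                   for level, count in accesses.items())
    return compute, movement


# Case 1: every MAC fetches both of its operands from off-chip memory.
naive = estimate_energy(1000, {"dram": 2000})

# Case 2: operands are fetched off chip rarely, then served from a
# small local memory for each of the 1000 MACs.
reuse = estimate_energy(1000, {"dram": 20, "local": 2000})

print("naive (compute, movement):", naive)
print("reuse (compute, movement):", reuse)
```

Under these assumed costs the arithmetic is identical in both cases, and only the movement term changes.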

Compression reduces data movement because there’s less data to move -- but it’s not enough. Eyeriss also uses a “spatial architecture” design that significantly reduces data movement to and from other components and within the chip itself. It’s sort of the micro counterpart to the macro goal of processing locally to avoid moving data to the cloud.

“Eyeriss utilizes small memories very near each processing element so it can read low-cost memories locally and then share the data with other processors,” says Sze. “Our customized hardware understands the relationships of data within the neural network, and how we can manage and minimize movement. The worst type of movement is moving data on and off chip, so once it’s on chip we want to reuse it as much as possible. We want each processing element to reuse the data for as many computations as possible, and if it has to share, we want it to share with a neighboring element rather than sending it off-chip.”
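The reuse idea Sze describes can be sketched by counting memory fetches for a simple sliding-window filter. This is a toy model, not the Eyeriss dataflow: the function names and the notion of one "fetch" per off-chip read are assumptions made for illustration.

```python
def conv1d_no_reuse(inputs, weights):
    """Fetch every operand from 'off-chip' memory for every multiply."""
    fetches = 0
    outputs = []
    for start in range(len(inputs) - len(weights) + 1):
        acc = 0
        for k in range(len(weights)):
            acc += inputs[start + k] * weights[k]
            fetches += 2  # one input and one weight per multiply
        outputs.append(acc)
    return outputs, fetches


def conv1d_with_reuse(inputs, weights):
    """Fetch each value once, then reuse it from a small local buffer."""
    local_weights = list(weights)   # weights stay resident locally
    fetches = len(local_weights)
    window = []                     # sliding window of recent inputs
    outputs = []
    for value in inputs:
        window.append(value)        # each input is fetched exactly once
        fetches += 1
        if len(window) > len(local_weights):
            window.pop(0)
        if len(window) == len(local_weights):
            outputs.append(sum(x * w for x, w in zip(window, local_weights)))
    return outputs, fetches


signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
out_a, fetches_a = conv1d_no_reuse(signal, kernel)
out_b, fetches_b = conv1d_with_reuse(signal, kernel)
print(out_a == out_b, fetches_a, fetches_b)
```

Both versions produce the same outputs. The second simply keeps the weights and the most recent inputs close to the arithmetic, so each value is fetched from the expensive memory only once.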

Iterative cross-layer design

Eyeriss’ spatial architecture requires a tight interplay between hardware and algorithms. To accomplish this, Sze adopts a novel “cross-layer” design principle in which algorithms are designed in tandem with the hardware. “Typically in processor design, people develop an algorithm and then throw it over to a hardware person to figure out how to map the algorithms to the hardware,” she says. “It’s very disjointed. With our iterative approach, we optimize both together. First, we change the hardware, and then we change the algorithm to make it more efficient. By going back and forth, we can better analyze the hardware bottlenecks.”

Cross-layer design is also the key to optimizing the way data flows through the system to maximize reuse and minimize energy. “If we can figure out where the energy is going, we can change the algorithm to reduce the parts that consume the most,” says Sze. “If you push one thing down, another thing comes up, so you want to reduce the complexity of the areas that matter. Recently, we were able to develop an energy-aware pruning mechanism that utilizes energy to drive the design of deep learning algorithms. In the past, people have reduced the number of operations or counted the weights, but those are indirect proxies for energy. You really need to understand where the energy is going, model it, and then feed it back into the design of the algorithm.”
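A minimal sketch of why weight counts can mislead follows. It is not the group's pruning method: the layer names, the energy_per_weight figures, and both helper functions are hypothetical, chosen only to show an energy estimate, rather than a weight count, deciding which layer to prune.

```python
def layer_energy(layer):
    """Estimated energy: nonzero weights times an assumed per-weight cost."""
    nonzero = sum(1 for w in layer["weights"] if w != 0.0)
    return nonzero * layer["energy_per_weight"]


def prune_highest_energy_layer(layers, fraction):
    """Zero the smallest-magnitude weights in the costliest layer.

    Returns the name of the layer that was pruned.
    """
    name = max(layers, key=lambda n: layer_energy(layers[n]))
    weights = layers[name]["weights"]
    nonzero_idx = sorted(
        (i for i, w in enumerate(weights) if w != 0.0),
        key=lambda i: abs(weights[i]),
    )
    for i in nonzero_idx[: int(len(nonzero_idx) * fraction)]:
        weights[i] = 0.0
    return name


# Hypothetical two-layer network: "fc" has more weights, but each
# "conv1" weight is assumed to cost more energy because it is used
# many more times per input.
layers = {
    "conv1": {"weights": [0.9, -0.1, 0.4, 0.05], "energy_per_weight": 10.0},
    "fc": {"weights": [0.2, -0.3, 0.01, 0.7, 0.6, -0.02], "energy_per_weight": 2.0},
}
pruned = prune_highest_energy_layer(layers, fraction=0.5)
print(pruned, layers[pruned]["weights"])
```

A pruner guided by weight count would start with the larger "fc" layer. With these assumed costs, the energy estimate points to "conv1" instead.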

Navion: Video analytics for tiny drones

Sze has also developed a “Navion” processor, in collaboration with MIT AeroAstro’s Sertac Karaman, that is aimed at mini-drone navigation. “Navion is focused on visual inertial odometry to let the drone understand its surroundings, so it’s a little different than Eyeriss,” says Sze. “However, the principle of using cross-layer design to reduce energy and data movement is similar to Eyeriss.”

The target application for Navion is search and rescue. The drones need to be small enough to fly through buildings, and perhaps through the tight spaces of a collapsed building. These size and weight restrictions currently prohibit autonomous operation, so mini-drones are typically remote controlled, and video is streamed wirelessly to avoid storage.

In many search and rescue scenarios, however, the craft is out of communication range or the communication system may be down. With Navion, the drone could navigate autonomously, and restrict storage to stills or short video segments.

“We compress the data that needs to be stored and move it as little as possible,” says Sze. “Every time we touch data, we do as much computation as possible, so ideally we can throw it out and don’t have to store it. We achieve this by breaking the data into pieces and reordering operations.”
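The general pattern of touching each piece of data once, doing all the work on it, and then discarding it can be sketched as follows. This is an illustration of the principle only, not Navion's pipeline: the tiles and summarize_stream helpers, the flat list of pixel values, and the threshold are all assumptions.

```python
def tiles(frame, tile_size):
    """Yield the frame one small piece at a time."""
    for start in range(0, len(frame), tile_size):
        yield frame[start:start + tile_size]


def summarize_stream(frame, tile_size, threshold):
    """Compute several results in a single pass over each tile.

    Only three running values are kept; each tile can be dropped
    as soon as the loop moves on to the next one.
    """
    total = 0
    brightest = None
    above = 0
    for tile in tiles(frame, tile_size):
        for pixel in tile:
            # Do all the work needed on this pixel while it is in hand.
            total += pixel
            if brightest is None or pixel > brightest:
                brightest = pixel
            if pixel > threshold:
                above += 1
    return {"sum": total, "max": brightest, "above": above}


frame = [10, 200, 30, 250, 40, 90, 180]
summary = summarize_stream(frame, tile_size=3, threshold=100)
print(summary)
```

Running three separate passes over the stored frame would give the same results. The single pass reorders the operations so that no tile has to be kept or read again.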

Iterative cross-layer design has proven essential to achieving such fine tuning. “We can quickly move from hardware selection to algorithm selection to parameter selection, such as determining how many frames to look across so the drone can decide where it is,” says Sze. “We have to always be thinking about hardware design issues such as what can be parallelized, what can be stored or compressed, and what level of computation precision we need. If I switch the algorithm this way maybe there are other knobs I can exploit from a hardware point of view.”

Sze and Karaman are currently moving their design from an FPGA to an ASIC, which will help them to further customize memory and logic. “We expect another order of magnitude reduction in energy consumption,” says Sze. While Navion is a few years off, Eyeriss will soon be ready for commercialization. “We’re open to all opportunities,” she adds.