Visual object detection and recognition are needed for a wide range of applications, including robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. For many of these applications, local embedded processing is preferred due to privacy or latency concerns. In this talk, we will describe how joint algorithm and hardware design can be used to reduce the energy consumption of object detection and recognition while delivering real-time and robust performance. We will discuss several energy-efficient techniques that exploit sparsity and reduce data movement and storage costs, and show how they can be applied to popular forms of object detection and recognition, including those that use deep convolutional neural nets (CNNs). We will present results from recently fabricated ASICs (including our deep CNN accelerator named “Eyeriss”, which is 10x more energy efficient than a mobile GPU) that demonstrate these techniques in real-time computer vision systems.