We would like to reduce the energy consumption substantially such that all cameras can be made “smart” and output meaningful information with little need for human intervention. As an intermediate benchmark, we would like to make understanding pixels as energy-efficient as compressing pixels, so that computer vision can be as ubiquitous as video compression, which is present in most cameras today. This is challenging, as computer vision often requires that the data be transformed into a much higher dimensional space than video compression, which results in more computation and data movement. For example, object detection, used in applications such as Advanced Driver Assistant Systems (ADAS), autonomous control in Unmanned Aerial Vehicles (UAV), mobile robot vision, video surveillance, and portable devices, requires an additional dimension of image scaling to detect objects of different sizes, which increases the number of pixels that must be processed.
In this project, we use joint algorithm and hardware design to reduce computational complexity by exploiting data statistics. We developed object detection algorithms that enforce sparsity in both the feature extraction from the image as well as the weights in the classifier. In addition, when detecting deformable objects using multiple classifiers to identify both the root and the different parts of an object, we only perform the parts classification on the high scoring roots. We then design hardware that exploits these various forms of sparsity for energy-efficiency, which reduces the energy consumption by 5x, enabling object detection to be as energy-efficient as video compression at < 1nJ/pixel. This is an important step towards achieving continuous mobile vision, which benefits applications such as wearable vision for the blind.