Entry Date:
December 21, 2016

Advancing Visual Recognition with Feature Visualizations

Principal Investigator:
Antonio Torralba

Project Start Date:
September 2015

Project End Date:
August 2018


The goal of this work is to develop a set of tools to visualize the information extracted by computer vision systems so that it is easier for researchers and users to understand their behavior. With the success of new computational architectures for visual processing, such as deep neural networks with many processing layers (e.g., convolutional neural networks), and with access to large databases containing millions of annotated images (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly and becoming integrated into many commercial products. But these advances come at a price: systems are becoming more complex, and it is harder for researchers and users to diagnose and understand the representations built by these systems. This work will therefore develop new techniques for visualizing what the algorithms are doing in order to elucidate their behavior.

The work will focus on developing algorithms for generic feature inversion. Most features perform complex non-linear operations over the image, and it is not always possible to obtain analytic expressions that invert those computations. The goal of this proposal is to introduce new techniques that allow inverting descriptors without placing constraints on the descriptors themselves. The second challenge consists in understanding the properties of the inversion in order to allow comparisons among different descriptors. If the inversion contains approximations, comparisons among descriptors might not be valid; it will therefore be important to understand the convergence properties of the inversion algorithms. Another issue arises from the compressive nature of most descriptors. In general, some part of the input image information is lost when encoded by an image descriptor, so the inversion must be a one-to-many function. Understanding the space of equivalent images under a particular descriptor will provide insight into the likely errors made by a recognition system that uses it. This proposal will carry out a variety of experiments with the feature visualizations, such as examining invariances in both engineered features and features learned by deep networks, visualizing learned models and decision boundaries, and diagnosing false alarms and missed detections.
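The one-to-many nature of inversion can be sketched with a small toy example. The descriptor below is a hypothetical stand-in (a random compressive linear projection; real descriptors are non-linear, and the dimensions here are chosen purely for illustration): inverting it by gradient descent on the feature-space reconstruction error from two different initializations yields two distinct images with essentially identical descriptors, i.e., two members of the descriptor's equivalence class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy compressive descriptor: a random linear projection that maps a
# 64-dimensional "image" to a 16-dimensional code, discarding information.
# (Hypothetical stand-in for a real, non-linear feature extractor.)
D_IN, D_OUT = 64, 16
W = rng.standard_normal((D_OUT, D_IN)) / np.sqrt(D_IN)

def descriptor(x):
    return W @ x

def invert(target, x0, lr=0.5, steps=2000):
    """Gradient descent on ||descriptor(x) - target||^2 starting from x0."""
    x = x0.copy()
    for _ in range(steps):
        grad = W.T @ (W @ x - target)  # gradient of the squared feature error
        x -= lr * grad
    return x

# Ground-truth "image" and its descriptor.
x_true = rng.standard_normal(D_IN)
t = descriptor(x_true)

# Invert from two different random initializations.
xa = invert(t, rng.standard_normal(D_IN))
xb = invert(t, rng.standard_normal(D_IN))

err_a = np.linalg.norm(descriptor(xa) - t)  # feature-space error of inverse a
err_b = np.linalg.norm(descriptor(xb) - t)  # feature-space error of inverse b
gap = np.linalg.norm(xa - xb)               # distance between the two inverses
print(err_a, err_b, gap)
```

Both reconstructions match the target descriptor to numerical precision, yet they differ substantially in the input space: the components lying in the null space of the projection are unconstrained by the descriptor and are inherited from each initialization. Characterizing this set of equivalent inputs is exactly the kind of analysis the proposal describes.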