Entry Date:
January 25, 2017

Structured Nearest Neighbor Search in High Dimensions

Principal Investigator Piotr Indyk

Project Start Date September 2015

Project End Date
 August 2018


A fundamental problem in the analysis of large datasets consists of finding one or more data items that are as similar as possible to an input query. This situation occurs, for example, when a user wants to identify a product captured in a photo. The corresponding computational problem, called Nearest Neighbor (NN) Search, has attracted a large body of research, with several algorithms having significant impact. Yet the state of the art in NN suffers from important theoretical and practical limitations. In particular, it does not provide a natural way to exploit data *structure* that is present in many applications. For example, although the identity of a depicted object does not change when one varies the lighting or the position of the object, the current NN algorithms will treat the resulting images as completely different from each other and thus will mis-identify the object. To overcome this difficulty, in this project the PIs will develop new efficient algorithms that incorporate problem structure into NN search. The PIs expect that such methods will produce substantially better results for many massive data analysis tasks.

To ensure that the work is grounded in an important application, the PIs will focus on computer vision, an area where Internet-scale datasets are having a substantial impact. NN search is vital for computer vision, and in fact many senior computer vision researchers view improved NN techniques as their top algorithmic priority. Image and video have significant structure, often spatial in nature, which algorithmic techniques such as graph cuts have been able to exploit with considerable success. The proposed work will formulate new variants of NN search that make use of additional structure, and will design efficient algorithms to solve these problems over large datasets. In particular, the PIs will investigate three structured NN problem formulations. Simultaneous nearest-neighbor queries involves multiple queries where the answers should be compatible with each other. Nearest-neighbor under transformations considers distances that are invariant to a variety of image transformations. Nearest-neighbors for subspaces involves searching a set of linear or affine subspaces for the one that comes closest to a query point. Broader impacts of the project include graduate training in both algorithms and image processing.