Entry Date:
May 30, 2012

Tunable Fast Similarity Search for High-Dimensional Data

Principal Investigator: Piotr Indyk


Locality-Sensitive Hashing (LSH) is an efficient algorithm for finding pairs of similar (or highly correlated) objects in a database without enumerating all pairs of objects. Example applications include searching for near-duplicate documents, similar images, and highly correlated stocks. Although the algorithm is already very fast, its efficiency could be improved further by adapting it to specific data sets. The goal of this project is to develop tools and techniques for performing such tuning.
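
To make the idea concrete, here is a minimal sketch of one well-known LSH scheme, MinHash with banding, applied to near-duplicate documents. It is an illustration only and is not taken from the project: the function names, the word-shingle representation, and the default `bands` and `rows` values are all assumptions chosen for the example. The two parameters `bands` and `rows` show the kind of knob that tuning would adjust: for two sets with Jaccard similarity s, the chance of becoming a candidate pair under this scheme is 1 - (1 - s^rows)^bands.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (hypothetical helper)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def stable_hash(seed, item):
    """Seeded 64-bit hash that is stable across runs (unlike built-in hash())."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes):
    """One minimum per seeded hash function; similar sets share many minima."""
    return [min(stable_hash(seed, x) for x in items) for seed in range(num_hashes)]


def candidate_pairs(docs, bands=16, rows=2, k=3):
    """Return pairs of document ids whose signatures agree on at least one band.

    Only documents that land in the same bucket are paired, so most
    dissimilar pairs are never examined.
    """
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text, k), bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def collision_probability(s, bands, rows):
    """Probability that two sets with Jaccard similarity s become candidates."""
    return 1.0 - (1.0 - s ** rows) ** bands
```

Increasing `rows` makes each band stricter and so reduces false candidates, while increasing `bands` gives similar pairs more chances to collide. Choosing these values for a particular data set and similarity threshold is one simple example of the tuning the project description refers to. Candidate pairs would normally be verified afterwards with an exact similarity computation.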