Workshop in Biostatistics

M112 Alway Building, Medical Center

(next to the Dean's courtyard)

DATE: March 2, 2017
TIME: 1:30 - 2:50 pm
TITLE: High dimensional Nearest Neighbor Search via Locality Sensitive Hashing

Moses Charikar
Professor of Computer Science, Stanford



Locality-Sensitive Hashing (LSH) is a popular class of methods for the nearest neighbor search problem, defined thus: given a set of points in a metric space, the goal is to preprocess the data set so that nearest neighbor queries can be answered quickly: given a previously unseen query point, we want to find one or several points in our dataset that are closest to the query point. 

The key building block for LSH is a function that maps points to buckets in a hash table; close points are more likely to be mapped to the same bucket. In this talk, I'll survey the use of LSH for nearest neighbor search, describe several constructions of locality sensitive hash functions, and outline the evolution of these ideas over the past couple of decades. Time permitting, I will describe some ongoing work on fast kernel density estimation via LSH.

Suggested readings:

Locality-Sensitive Hashing (LSH): a Primer

Jingdong Wang, Heng Tao Shen, Jingkuan Song, and Jianqiu Ji. Hashing for Similarity Search: A Survey.