Optimal Data-Dependent Hashing for Approximate Near Neighbors
Alexandr Andoni, Ilya Razenshteyn
We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an -point data set in a -dimensional space our data structure achieves query time and space , where for the Euclidean space and approximation c>1. For the Hamming space, we obtain an exponent of . Our result completes the direction set forth in [AINR14] who gave a proof-of-concept that data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [IM98,AI06] for all approximation factors c>1. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.