We propose a general framework for end-to-end learning of data structures. Our framework adapts to the underlying data distribution and provides fine-grained control over query and space complexity. Crucially, the data structure is learned from scratch, and does not require careful initialization or seeding with candidate data structures or algorithms. We first apply this framework to the problem of nearest neighbor search. In several settings, we are able to reverse-engineer the learned data structures and query algorithms. For 1D nearest neighbor search, the model discovers optimal distribution-independent and distribution-dependent algorithms, such as binary search and variants of interpolation search. In higher dimensions, the model learns solutions that resemble k-d trees in some regimes, while in others the solutions have elements of locality-sensitive hashing. The model can also learn useful representations of high-dimensional data and exploit them to design effective data structures. Finally, we adapt our framework to the problem of estimating frequencies over a data stream, and believe it could be a powerful discovery tool for new problems.
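For readers unfamiliar with the two 1D baselines named above, the following is a minimal sketch of nearest neighbor search over a sorted array using classical binary search and classical interpolation search. It is a hand-written illustration, not the learned model or the authors' code. The function names (`nn_binary`, `nn_interpolation`, `_closest`) and the probe-counting convention are assumptions made for this example, and the input is assumed to be a non-empty list of numbers sorted in ascending order.

```python
def _closest(xs, i, q):
    """Given the lower-bound index i (first index with xs[i] >= q),
    return whichever of i-1 and i is closer to q (ties go to i-1)."""
    candidates = [j for j in (i - 1, i) if 0 <= j < len(xs)]
    return min(candidates, key=lambda j: abs(xs[j] - q))


def nn_binary(xs, q):
    """Nearest neighbor by binary search (distribution-independent).
    Returns (index, probes), where probes counts loop iterations."""
    lo, hi = 0, len(xs)          # the lower bound lies in [lo, hi]
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if xs[mid] < q:
            lo = mid + 1
        else:
            hi = mid
    return _closest(xs, lo, q), probes


def nn_interpolation(xs, q):
    """Nearest neighbor by interpolation search (distribution-dependent:
    it guesses positions assuming values are roughly evenly spaced).
    Returns (index, probes), where probes counts loop iterations."""
    lo, hi = 0, len(xs) - 1
    probes = 0
    while lo <= hi:
        if xs[hi] == xs[lo]:
            pos = lo             # flat range: avoid division by zero
        else:
            # Linearly interpolate q's expected position, clamped to [lo, hi].
            frac = (q - xs[lo]) / (xs[hi] - xs[lo])
            frac = min(1.0, max(0.0, frac))
            pos = lo + int(frac * (hi - lo))
        probes += 1
        if xs[pos] < q:
            lo = pos + 1         # everything up to pos is smaller than q
        else:
            hi = pos - 1         # everything from pos on is >= q
    return _closest(xs, lo, q), probes


if __name__ == "__main__":
    xs = list(range(0, 1000, 10))    # evenly spaced keys
    print(nn_binary(xs, 437))
    print(nn_interpolation(xs, 437))
```

On evenly spaced keys, interpolation search typically needs fewer probes than binary search, while on skewed data its position guesses can be poor and binary search's logarithmic guarantee is the safer choice. This is the kind of trade-off between distribution-dependent and distribution-independent algorithms that the abstract refers to.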