Kernel density estimation (KDE) is a challenging task in machine learning. The problem is defined as follows: given a kernel function $f(x,y)$ and a set of points $\{x_1, x_2, \cdots, x_n \} \subset \mathbb{R}^d$, we would like to compute $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ for any query point $y \in \mathbb{R}^d$. Recently, there has been a growing trend of using data structures for efficient KDE. However, the proposed KDE data structures focus on static settings, and their robustness under dynamically changing data distributions has not been addressed. In this work, we focus on the dynamic maintenance of KDE data structures with robustness to adversarial queries. In particular, we provide a theoretical framework for KDE data structures. In our framework, the KDE data structures require only subquadratic space. Moreover, they support dynamic updates of the dataset in sublinear time. Furthermore, they can answer adaptive queries from a potential adversary in sublinear time.
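For concreteness, the quantity $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ can be computed by direct summation, which takes time linear in $n$ per query. The sketch below illustrates only this naive baseline and is not the data structure proposed in this work; the Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(points, y, bandwidth=1.0):
    """Assumed example kernel: f(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    diff = points - y  # broadcasts y against every row of points
    sq_dist = np.sum(diff * diff, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def naive_kde(points, y, kernel=gaussian_kernel):
    """Return (1/n) * sum_i f(x_i, y) by evaluating the kernel at all n points."""
    points = np.asarray(points, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(kernel(points, y)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 3))  # n = 1000 points in R^3
    query = np.zeros(3)
    print(naive_kde(data, query))
```

Each call touches all $n$ points, so the per-query cost grows linearly with the dataset. This is the cost that sublinear-time KDE data structures aim to avoid.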