Let $R \cup B$ be a set of $n$ points in $\mathbb{R}^2$, and let $k \in \{1, \dots, n\}$. Our goal is to compute a line that "best" separates the "red" points $R$ from the "blue" points $B$ with at most $k$ outliers. We present an efficient semi-online dynamic data structure that maintains whether such a separator exists. Furthermore, we present efficient exact and approximation algorithms that compute a linear separator that is guaranteed to misclassify at most $k$ points and minimizes the distance to the farthest outlier. Our exact algorithm runs in $O(nk + n \log n)$ time, and our $(1+\varepsilon)$-approximation algorithm runs in $O(\varepsilon^{-1/2}(n + k^2) \log n)$ time. Based on our $(1+\varepsilon)$-approximation algorithm, we then also obtain a semi-online data structure that maintains such a separator efficiently.
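The following Python sketch makes the objective concrete by evaluating a single candidate line. It is not the exact or approximation algorithm summarized above. It only scores a given line. Several choices here are assumptions and do not come from the source. The line is written as $ax + by + c = 0$. Points lying exactly on the line count as correctly classified. Either side of the line may serve as the red side. The function names `outlier_cost` and `best_cost` are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def outlier_cost(a: float, b: float, c: float,
                 red: Sequence[Point], blue: Sequence[Point]) -> Tuple[int, float]:
    """Return (number of outliers, distance to the farthest outlier) for the
    line a*x + b*y + c = 0.

    Assumption: red points belong on the side a*x + b*y + c >= 0 and blue
    points on the side <= 0; points on the line are correctly classified.
    """
    norm = math.hypot(a, b)
    if norm == 0:
        raise ValueError("a and b must not both be zero")
    count, farthest = 0, 0.0
    for x, y in red:
        s = a * x + b * y + c
        if s < 0:  # red point on the blue side: an outlier
            count += 1
            farthest = max(farthest, -s / norm)
    for x, y in blue:
        s = a * x + b * y + c
        if s > 0:  # blue point on the red side: an outlier
            count += 1
            farthest = max(farthest, s / norm)
    return count, farthest


def best_cost(a: float, b: float, c: float,
              red: Sequence[Point], blue: Sequence[Point], k: int) -> Optional[float]:
    """Smallest farthest-outlier distance over both orientations of the line,
    considering only orientations with at most k outliers; None if neither qualifies."""
    best: Optional[float] = None
    for sign in (1, -1):  # try red on either side of the line
        count, far = outlier_cost(sign * a, sign * b, sign * c, red, blue)
        if count <= k and (best is None or far < best):
            best = far
    return best
```

For example, with `red = [(0, 1), (1, 2), (2, -1)]`, `blue = [(0, -1), (1, -2)]`, and the line $y = 0$ (that is, `a, b, c = 0, 1, 0`), the red point $(2, -1)$ is the only outlier, at distance $1$. Therefore `best_cost(0, 1, 0, red, blue, k=1)` returns `1.0`, and with `k=0` it returns `None`.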