Let $R \cup B$ be a set of $n$ points in $\mathbb{R}^2$, and let $k \in \{1, \dots, n\}$. Our goal is to compute a line that "best" separates the "red" points $R$ from the "blue" points $B$ with at most $k$ outliers. We present an efficient semi-online dynamic data structure that maintains whether such a separator exists. Furthermore, we present efficient exact and approximation algorithms that compute a linear separator that is guaranteed to misclassify at most $k$ points and that minimizes the distance to the farthest outlier. Our exact algorithm runs in $O(nk + n \log n)$ time, and our $(1+\varepsilon)$-approximation algorithm runs in $O(\varepsilon^{-1/2}((n + k^2) \log n))$ time. Based on our $(1+\varepsilon)$-approximation algorithm, we also obtain a semi-online data structure that maintains such a separator efficiently.
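To make the objective concrete, the following Python sketch evaluates a candidate line and checks by brute force whether a separator with at most $k$ outliers exists. It is only an illustrative baseline, not the algorithms or data structures announced above: the function names `evaluate` and `separator_exists` are hypothetical, the brute force takes $O(n^3)$ time, and it assumes distinct input points and the convention that a point lying exactly on the line counts as correctly classified.

```python
from itertools import combinations
from math import hypot


def evaluate(line, red, blue):
    """Return (number of outliers, distance to the farthest outlier).

    line = (a, b, c) represents a*x + b*y = c. Red points are expected in the
    closed half-plane a*x + b*y <= c, blue points in a*x + b*y >= c
    (assumed convention: points on the line are not outliers).
    """
    a, b, c = line
    norm = hypot(a, b)
    # Signed offsets: a positive value means strictly on the wrong side.
    wrong = [a * x + b * y - c for x, y in red]
    wrong += [c - (a * x + b * y) for x, y in blue]
    outliers = [w / norm for w in wrong if w > 0]
    return len(outliers), max(outliers, default=0.0)


def separator_exists(red, blue, k):
    """Brute force: try both orientations of every line through two points."""
    points = list(red) + list(blue)
    if len(points) < 2:
        return True  # zero or one point is trivially separable
    for (x1, y1), (x2, y2) in combinations(points, 2):
        a, b = y2 - y1, x1 - x2  # normal vector of the line through the pair
        if a == 0 and b == 0:
            continue  # skip coincident points
        c = a * x1 + b * y1
        for s in (1, -1):  # s = -1 swaps the red and blue sides
            if evaluate((s * a, s * b, s * c), red, blue)[0] <= k:
                return True
    return False
```

Restricting the search to lines through two input points is enough for the existence question under the stated convention: any line can be translated and then rotated until it touches two points without moving a point to the strictly wrong side. It is not enough for minimizing the distance to the farthest outlier, whose optimum need not pass through input points.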