We study extensions of the classic \emph{Line Cover} problem, which asks whether a set of $n$ points in the plane can be covered using $k$ lines. Line Cover is known to be NP-hard, and we focus on two natural generalizations. The first is \textbf{Line Clustering}, where the goal is to find $k$ lines minimizing the sum of squared distances from the input points to their nearest line. The second is \textbf{Hyperplane Cover}, which asks whether $n$ points in $\mathbb{R}^d$ can be covered by $k$ hyperplanes. We also study the more general \textbf{Projective Clustering} problem, which unifies both settings and has applications in machine learning, data analysis, and computational geometry. In this problem, one seeks $k$ affine subspaces of dimension $r$ that minimize the sum of squared distances from the given points in $\mathbb{R}^d$ to the nearest subspace. Our results reveal notable differences in the parameterized complexity of these problems. While Line Cover is fixed-parameter tractable when parameterized by $k$, we show that Line Clustering is W[1]-hard with respect to $k$ and does not admit an algorithm with running time $n^{o(k)}$ unless the Exponential Time Hypothesis fails. Hyperplane Cover has been known to be NP-hard, even for $d=2$, since the work of Megiddo and Tamir in the 1980s; we show that it remains NP-hard even when $k=2$. Finally, we present an algorithm for Projective Clustering running in $n^{O(dk(r+1))}$ time. This bound matches our lower bound for Line Clustering and generalizes the classic algorithm for $k$-Means Clustering ($r=0$) by Inaba, Katoh, and Imai [SoCG 1994].
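For concreteness, the Projective Clustering objective described above can be written out as follows. This is only a sketch under assumed notation: the symbols $P$, $p_i$, $A_j$, $c_j$, and $B_j$ do not appear in the text above and are introduced here purely for illustration. Given points $P=\{p_1,\dots,p_n\}\subset\mathbb{R}^d$, one seeks $r$-dimensional affine subspaces $A_1,\dots,A_k$ attaining
\[
  \min_{A_1,\dots,A_k}\ \sum_{i=1}^{n}\ \min_{1\le j\le k}\ \mathrm{dist}(p_i,A_j)^2,
  \qquad
  \mathrm{dist}(p,A_j)^2=\bigl\|\,(I-B_jB_j^{\top})(p-c_j)\,\bigr\|_2^2,
\]
where each subspace is assumed to be parameterized as $A_j=\{\,c_j+B_jy : y\in\mathbb{R}^r\,\}$ with $c_j\in\mathbb{R}^d$ and $B_j\in\mathbb{R}^{d\times r}$ having orthonormal columns. In this notation, $r=0$ reduces each $A_j$ to the single point $c_j$ and recovers $k$-Means Clustering, while $d=2$, $r=1$ gives Line Clustering. Asking whether the optimum equals $0$ corresponds to Line Cover when $d=2$, $r=1$, and to Hyperplane Cover when $r=d-1$.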