This paper studies the computational complexity of clustering problems defined directly on a continuous probability density. Rather than working with finite samples, we assume the density is given explicitly as a polynomial and ask whether it exhibits certain cluster structures. We examine four natural questions. First, are there several high-density points that are pairwise far apart? Second, are there two high-density points whose midpoint has low density, so that a valley separates them? Third, does the superlevel set, the region where the density exceeds a given threshold, have at least a given number of connected components? Fourth, does that same region contain a hole, that is, a loop that cannot be contracted to a point within the region? We prove that the first two problems, separated points and valley detection, are complete for the existential theory of the reals (∃R), a complexity class that contains NP and is believed to be strictly larger. In contrast, the topological problems of counting connected components and detecting holes are ∃R-hard, but their exact complexity remains open: placing them inside ∃R would require a major advance in real algebraic geometry. These results give the first rigorous classification of exact continuous clustering within the real polynomial hierarchy. They also show that even these basic clustering criteria are not NP-complete unless NP = ∃R, a collapse that is widely believed not to occur.
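The first two problems ask whether real points with certain polynomial properties exist, which is the shape of an existential sentence over the reals. The sketch below only *checks* a proposed rational witness for each problem using exact arithmetic; it does not decide either problem, because a valid witness may need irrational coordinates, one reason these problems are not known to lie in NP. The polynomial representation (a dict from exponent tuples to coefficients), the function names, and the threshold parameters `tau`, `delta`, `tau_high`, `tau_low` are hypothetical choices for illustration, not definitions taken from the paper.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical representation: a polynomial in n variables is a dict mapping
# exponent tuples to coefficients, e.g. {(2, 0): 1, (0, 2): 1} is x^2 + y^2.


def evaluate(p, x):
    """Evaluate polynomial p exactly at the rational point x."""
    total = Fraction(0)
    for exps, coeff in p.items():
        term = Fraction(coeff)
        for xi, e in zip(x, exps):
            term *= Fraction(xi) ** e
        total += term
    return total


def sq_dist(x, y):
    """Squared Euclidean distance, kept rational (no square roots)."""
    return sum((Fraction(a) - Fraction(b)) ** 2 for a, b in zip(x, y))


def check_separated(p, points, tau, delta):
    """Witness check for problem 1 (assumes delta >= 0): every point has
    density >= tau and every pair is at distance >= delta, compared via squares."""
    if any(evaluate(p, x) < tau for x in points):
        return False
    return all(sq_dist(x, y) >= Fraction(delta) ** 2
               for x, y in combinations(points, 2))


def check_valley(p, x, y, tau_high, tau_low):
    """Witness check for problem 2: x and y have density >= tau_high and
    their midpoint has density <= tau_low."""
    mid = [(Fraction(a) + Fraction(b)) / 2 for a, b in zip(x, y)]
    return (evaluate(p, x) >= tau_high
            and evaluate(p, y) >= tau_high
            and evaluate(p, mid) <= tau_low)


# Toy two-bump polynomial p(x, y) = 2x^2 - x^4 - y^2 (not normalized as a density):
# it equals 1 at (1, 0) and (-1, 0) and 0 at their midpoint (0, 0).
p = {(2, 0): 2, (4, 0): -1, (0, 2): -1}
print(check_valley(p, (1, 0), (-1, 0), tau_high=1, tau_low=0))
print(check_separated(p, [(1, 0), (-1, 0)], tau=1, delta=2))
```

Comparing squared distances keeps every test a polynomial inequality with rational data, mirroring how such conditions are written as sentences in the existential theory of the reals.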