Delayed Assignments in Online Non-Centroid Clustering with Stochastic Arrivals

Clustering is a fundamental problem that aims to partition a set of elements, such as agents or data points, into clusters such that elements in the same cluster are closer to each other than to those in other clusters. In this paper, we present a new framework for studying online non-centroid clustering with delays, where elements, which arrive one at a time as points in a finite metric space, must be assigned to clusters, but assignments need not be immediate. Specifically, upon arrival, each point's location is revealed, and an online algorithm has to irrevocably assign it to an existing cluster or create a new one containing, at this moment, only this point. However, instead of following the more common assumption of immediate decisions upon arrival, we allow decisions to be postponed at a delay cost. This poses a critical challenge: the goal is to minimize both the total distance costs between points in each cluster and the overall delay costs incurred by postponing assignments. In the classic worst-case arrival model, where points arrive in an arbitrary order, no algorithm has a competitive ratio better than sublogarithmic in the number of points. To overcome this strong impossibility, we focus on a stochastic arrival model, where points' locations are drawn independently across time from an unknown and fixed probability distribution over the finite metric space. We offer hope beyond worst-case adversaries: we devise an algorithm that is constant-competitive in the sense that, as the number of points grows, the ratio between the expected overall costs of the output clustering and an optimal offline clustering is bounded by a constant.
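
To make the objective concrete, the following minimal sketch evaluates the overall cost of a given clustering as the sum of distance costs and delay costs. It rests on assumptions not stated in the abstract: the distance cost of a cluster is taken to be the sum of pairwise distances between its points, and the delay cost of a point is taken to be linear in its waiting time. The function `clustering_cost` and all of its parameter names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def clustering_cost(locations, dist, clusters, arrival, assigned):
    """Overall cost of a clustering: distance costs plus delay costs.

    Assumptions (illustrative only):
      - the distance cost of a cluster is the sum of pairwise distances
        between its points;
      - the delay cost of a point is its waiting time, assigned - arrival.
    """
    # Sum of pairwise distances between points that share a cluster.
    distance_cost = sum(
        dist[locations[i]][locations[j]]
        for cluster in clusters
        for i, j in combinations(cluster, 2)
    )
    # Linear delay cost: time each point waited before being assigned.
    delay_cost = sum(assigned[i] - arrival[i] for i in range(len(locations)))
    return distance_cost + delay_cost


# A toy finite metric space on three locations (triangle inequality holds).
dist = {
    "a": {"a": 0, "b": 1, "c": 3},
    "b": {"a": 1, "b": 0, "c": 2},
    "c": {"a": 3, "b": 2, "c": 0},
}

# Three points arrive at times 0, 1, 2 at locations a, a, c.
locations = ["a", "a", "c"]
arrival = [0, 1, 2]

# Option 1: delay point 0 by one step and pair it with point 1 (same location).
cost_wait = clustering_cost(locations, dist, [[0, 1], [2]], arrival, [1, 1, 2])

# Option 2: delay point 0 by two steps and pair it with the distant point 2.
cost_far = clustering_cost(locations, dist, [[0, 2], [1]], arrival, [2, 1, 2])
```

Under these assumptions, the first option pays only one unit of delay, while the second pays both a longer delay and a positive distance cost, which illustrates the trade-off between waiting for a close point and committing early.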
