Delayed Assignments in Online Non-Centroid Clustering with Stochastic Arrivals

Clustering is a fundamental problem that aims to partition a set of elements, such as agents or data points, into clusters so that elements in the same cluster are closer to each other than to those in other clusters. In this paper, we present a new framework for studying online non-centroid clustering with delays. Elements arrive one at a time as points in a finite metric space and must be assigned to clusters, but assignments need not be immediate. Specifically, upon arrival, each point's location is revealed, and an online algorithm must irrevocably either assign it to an existing cluster or create a new cluster that, at that moment, contains only this point. Rather than following the more common assumption that decisions are made immediately upon arrival, we allow decisions to be postponed at a delay cost. This poses a critical challenge, since waiting can lead to better-informed assignments but accrues delay: the goal is to minimize both the total distance cost between points within each cluster and the overall delay cost incurred by postponing assignments. In the classic worst-case arrival model, where points arrive in an arbitrary order, no algorithm achieves a competitive ratio better than sublogarithmic in the number of points. To overcome this strong impossibility, we focus on a stochastic arrival model, in which points' locations are drawn independently across time from an unknown, fixed probability distribution over the finite metric space. We offer hope beyond worst-case adversaries: we devise an algorithm that is constant-competitive, in the sense that, as the number of points grows, the ratio between the expected overall cost of the output clustering and that of an optimal offline clustering is bounded by a constant.
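To make the objective concrete, here is a minimal Python sketch that scores a finished clustering with delays and simulates the stochastic arrival model. It rests on assumptions the abstract does not spell out. A cluster's distance cost is taken to be the sum of pairwise distances among its points. Delay cost is taken to be linear in waiting time, with a hypothetical `rate` parameter. The overall cost is taken to be the sum of the two. The names `Arrival`, `distance_cost`, `delay_cost`, `total_cost`, and `sample_arrivals` are illustrative only and are not the paper's notation. The paper's exact cost functions may differ.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Arrival:
    time: int         # step at which the point arrived
    location: int     # point of the finite metric space (hypothetical integer labels)
    assigned_at: int  # step at which the online algorithm irrevocably assigned it


def distance_cost(clusters: Sequence[Sequence[Arrival]],
                  d: Callable[[int, int], float]) -> float:
    # Assumed non-centroid cost: sum of d over every unordered pair within a cluster.
    return sum(d(a.location, b.location)
               for cluster in clusters
               for a, b in itertools.combinations(cluster, 2))


def delay_cost(clusters: Sequence[Sequence[Arrival]], rate: float = 1.0) -> float:
    # Assumed linear delay: rate * (assignment step - arrival step) for each point.
    return sum(rate * (p.assigned_at - p.time) for cluster in clusters for p in cluster)


def total_cost(clusters: Sequence[Sequence[Arrival]],
               d: Callable[[int, int], float], rate: float = 1.0) -> float:
    # Overall cost of a clustering (assumed): distance part plus delay part.
    return distance_cost(clusters, d) + delay_cost(clusters, rate)


def sample_arrivals(n: int, locations: Sequence[int],
                    weights: Sequence[float], seed: int = 0) -> List[int]:
    # Stochastic arrival model: n i.i.d. draws from a fixed distribution over the
    # finite metric space. An online algorithm would not be told `weights`.
    rng = random.Random(seed)
    return rng.choices(locations, weights=weights, k=n)


if __name__ == "__main__":
    line_metric = lambda x, y: abs(x - y)  # a simple finite metric on integer labels
    clusters = [[Arrival(0, 0, 2), Arrival(1, 1, 2)], [Arrival(2, 5, 2)]]
    print(total_cost(clusters, line_metric))  # 1 (distance) + 3.0 (delay) = 4.0
    print(sample_arrivals(5, [0, 1, 5], [0.5, 0.3, 0.2]))
```

Under the stochastic model, the competitive ratio compares the expected `total_cost` of the online algorithm's clustering with the expected cost of an optimal offline clustering over the random arrival sequence. Computing the offline optimum is outside the scope of this sketch.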
