In this paper, we consider outlier embeddings into HSTs and ultrametrics. In particular, for a metric space $(X,d)$, let $k$ be the size of the smallest subset of $X$ (the ``outlier set'') whose removal allows the remaining points to be probabilistically embedded into the space of HSTs with expected distortion at most $c$. Our primary result is an efficient algorithm that takes $(X,d)$ and a target distortion $c$ as input and samples from a probabilistic embedding with at most $O(\frac{k}{\epsilon}\log^2 k)$ outliers and distortion at most $(32+\epsilon)c$, for any $\epsilon>0$. This leads to better instance-specific approximations for certain instances of the buy-at-bulk and dial-a-ride problems, whose current best approximation algorithms go through HST embeddings. To this end, we build on the concept of compositions of nested embeddings introduced by [Chawla and Sheridan 2024]. A nested embedding is a composition of two embeddings of a metric space $(X,d)$: a low-distortion embedding of a subset $S$ of nodes, and a higher-distortion embedding of the entire metric. The composition is a single embedding that preserves the low distortion over $S$ and does not increase the distortion over the remaining points by much. We extend this concept from the setting of deterministic embeddings to the setting of probabilistic embeddings. We show how to find good nested compositions of embeddings into HSTs, and combine this with an approximation algorithm of [Munagala et al. 2023] to obtain our results.