We study comparison sorting in the evolving data model [AKMU11], where the true total order changes while the sorting algorithm is processing the input. More precisely, each comparison operation of the algorithm is followed by a sequence of evolution steps, where an evolution step perturbs the rank of a random item by a "small" random value. The goal is to maintain an ordering that remains close to the true order over time. Previous works have analyzed adaptations of classic sorting algorithms, assuming that an evolution step changes the rank of an item by exactly one, and that a fixed constant number $b$ of evolution steps takes place between two comparisons. In fact, the only previous result achieving optimal $O(n)$ total deviation from the true order, where $n$ is the number of items, applies only to the case $b=1$ [BDEGJ18]. We analyze a very simple sorting algorithm suggested in [M14], which samples a random pair of adjacent items in each step and swaps them if they are out of order. We show that the algorithm achieves and maintains, w.h.p., optimal total deviation, $O(n)$, and optimal maximum deviation, $O(\log n)$, under very general model settings. Namely, the perturbation introduced by each evolution step follows a distribution with a bounded moment generating function, and, over a linear number of steps, the average number of evolution steps between two sorting steps is bounded by an arbitrary constant. Our proof consists of a novel potential function argument that inserts "gaps" in the list of items, and a general framework that separates the analysis of sorting from that of the evolution steps and applies to a variety of settings that previous approaches cannot handle. Our results settle conjectures by [AKMU11] and [M14], and provide theoretical support for the empirical evidence that simple quadratic algorithms are optimal and robust for sorting evolving data [BDEGJ18].
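The setting described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the analysis or code of the paper: the function names (`sorting_step`, `evolution_step`, `deviations`, `simulate`) are hypothetical, the evolution step is the simplest one mentioned above (a rank changes by one, modelled here as swapping two neighbours in the true order), and exactly `b` evolution steps follow each comparison. Total deviation is taken as the sum, over all items, of the distance between an item's position in the maintained list and its true rank; maximum deviation is the largest such distance.

```python
import random


def sorting_step(order, true_rank, rng):
    """Sample a random adjacent pair of the maintained list; swap it if out of order."""
    i = rng.randrange(len(order) - 1)
    if true_rank[order[i]] > true_rank[order[i + 1]]:
        order[i], order[i + 1] = order[i + 1], order[i]


def evolution_step(truth, true_rank, rng):
    """Assumed perturbation: two neighbours in the true order exchange ranks."""
    r = rng.randrange(len(truth) - 1)
    a, b = truth[r], truth[r + 1]
    truth[r], truth[r + 1] = b, a
    true_rank[a], true_rank[b] = r + 1, r


def deviations(order, true_rank):
    """Return (total deviation, maximum deviation) of the maintained list."""
    devs = [abs(pos - true_rank[item]) for pos, item in enumerate(order)]
    return sum(devs), max(devs)


def simulate(n, steps, b=1, seed=0):
    """Run `steps` comparisons, each followed by `b` evolution steps."""
    rng = random.Random(seed)
    truth = list(range(n))          # truth[r] is the item with true rank r
    rng.shuffle(truth)
    true_rank = [0] * n             # true_rank[item] is that item's true rank
    for r, item in enumerate(truth):
        true_rank[item] = r
    order = list(range(n))          # the list maintained by the algorithm
    rng.shuffle(order)
    for _ in range(steps):
        sorting_step(order, true_rank, rng)
        for _ in range(b):
            evolution_step(truth, true_rank, rng)
    return deviations(order, true_rank)


if __name__ == "__main__":
    total, worst = simulate(n=200, steps=200_000, b=1)
    print("total deviation:", total, "maximum deviation:", worst)
```

Running `simulate` for several values of `n` and comparing the total deviation against `n`, and the maximum deviation against `log n`, gives an informal feel for the bounds stated above. Such an experiment is only illustrative and does not replace the proof.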