In the online sorting problem, we have an array $A$ of $n$ cells and receive a stream of $n$ items $x_1,\dots,x_n\in [0,1]$. When an item arrives, we must immediately and irrevocably place it into an empty cell. The goal is to minimize the sum of absolute differences between items in adjacent cells, which is called the \emph{cost} of the algorithm. Aamand, Abrahamsen, Beretta, and Kleist (SODA 2023) showed that when the stream $x_1,\dots,x_n$ is generated adversarially, the optimal cost bound for any deterministic algorithm is $\Theta(\sqrt{n})$. In this paper, we study the stochastic version of online sorting, where the input items $x_1,\dots,x_n$ are sampled uniformly at random. Despite the intuition that the stochastic version should admit much better cost bounds, the previous best algorithm for stochastic online sorting, by Abrahamsen, Bercea, Beretta, Klausen, and Kozma (ESA 2024), only achieves $\tilde{O}(n^{1/4})$ cost, which seems far from optimal. We show that stochastic online sorting indeed allows for much more efficient algorithms by presenting an algorithm that achieves expected cost $\log n\cdot 2^{O(\log^* n)}$. We also prove a cost lower bound of $\Omega(\log n)$, thus showing that our algorithm is nearly optimal.
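To make the problem setup concrete, the following minimal Python sketch computes the cost of a filled array and runs a naive placement rule on a stream. The rule shown (place $x$ in the empty cell nearest to index $\lfloor xn \rfloor$) is a hypothetical baseline chosen for illustration only; it is not the algorithm of this paper, and the function names `online_sort_cost` and `place_nearest_empty` are assumptions introduced here.

```python
import random


def online_sort_cost(cells):
    """Cost of a filled array: sum of |A[i] - A[i+1]| over adjacent cells."""
    return sum(abs(a - b) for a, b in zip(cells, cells[1:]))


def place_nearest_empty(stream):
    """Hypothetical baseline, NOT the paper's algorithm.

    Each arriving item x is placed immediately and irrevocably into the
    empty cell closest to index floor(x * n).
    """
    n = len(stream)
    cells = [None] * n
    for x in stream:
        target = min(int(x * n), n - 1)  # clamp x == 1.0 into the last cell
        # Search outward from the target index for the nearest empty cell.
        for d in range(n):
            for j in (target - d, target + d):
                if 0 <= j < n and cells[j] is None:
                    cells[j] = x
                    break
            else:
                continue  # no empty cell at distance d, widen the search
            break  # item placed, move on to the next arrival
    return cells


if __name__ == "__main__":
    random.seed(0)
    n = 1000
    stream = [random.random() for _ in range(n)]  # stochastic input model
    placed = place_nearest_empty(stream)
    print("cost of naive baseline:", online_sort_cost(placed))
    print("cost of offline sorted order:", online_sort_cost(sorted(stream)))
```

The offline sorted order has cost equal to the maximum item minus the minimum item, so it is at most 1; the gap between that and the cost of an online placement rule is what the bounds above quantify.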