Simpler Optimal Sorting from a Directed Acyclic Graph

In 1976, Fredman proposed the following algorithmic problem. We are given a ground set $X$, a partial order $P$ over $X$, and a comparison oracle $O_L$ that specifies a linear order $L$ over $X$ extending $P$. A query to $O_L$ takes distinct $x, x' \in X$ as input and outputs whether $x <_L x'$ or $x' <_L x$. If $e(P)$ denotes the number of linear extensions of $P$, then $\log e(P)$ is a worst-case lower bound on the number of queries needed to output $X$ in sorted order. Fredman did not specify the form in which the partial order is given. Haeupler, Hladík, Iacono, Rozhon, Tarjan, and Tětek ('24) propose taking as input a directed acyclic graph $G$ with $m$ edges and $n = |X|$ vertices. Denote by $P_G$ the partial order induced by $G$. Performance is measured in running time and in the number of queries used; their algorithm outputs $X$ in sorted order using $\Theta(m + n + \log e(P_G))$ time and $\Theta(\log e(P_G))$ queries, which is worst-case optimal in both measures. Their algorithm combines topological sorting with heapsort. Their analysis relies on sophisticated counting arguments using entropy, recursively defined sets defined over the run of their algorithm, and vertices in the graph that they identify as bottlenecks for sorting. In this paper, we do away with the sophistication. We show that when the input is a directed acyclic graph, the problem admits a simple solution using $\Theta(m + n + \log e(P_G))$ time and $\Theta(\log e(P_G))$ queries. In particular, our proofs are much simpler: we avoid advanced charging arguments and data structures, and instead rely on two brief observations.
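
To make the quantity $\log e(P_G)$ concrete, the following is a minimal brute-force sketch that counts the linear extensions of a small directed acyclic graph and reports the resulting lower bound on queries. It is not the algorithm of this paper or of Haeupler et al.; it runs in time exponential in $n$ and is only meant for tiny examples. The function names `count_linear_extensions` and `query_lower_bound` are hypothetical, and the logarithm is assumed to be base 2, since each query has two possible outcomes.

```python
from functools import lru_cache
from math import ceil, log2


def count_linear_extensions(n, edges):
    """Count e(P_G) for a DAG on vertices 0..n-1.

    An edge (u, v) means u must precede v. The input is assumed to be
    acyclic. Brute-force DP over subsets: illustration only.
    """
    # preds[v] is a bitmask of the direct predecessors of v.
    preds = [0] * n
    for u, v in edges:
        preds[v] |= 1 << u

    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def extensions(placed):
        # placed is a bitmask of the vertices already put in the order.
        if placed == full:
            return 1
        total = 0
        for v in range(n):
            is_unplaced = ((placed >> v) & 1) == 0
            preds_done = (preds[v] & ~placed) == 0
            if is_unplaced and preds_done:
                total += extensions(placed | (1 << v))
        return total

    return extensions(0)


def query_lower_bound(n, edges):
    """Worst-case lower bound on queries: ceil(log2 e(P_G))."""
    return ceil(log2(count_linear_extensions(n, edges)))


# Diamond: 0 precedes 1 and 2, which both precede 3.
diamond = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(count_linear_extensions(4, diamond))  # 2
print(query_lower_bound(4, diamond))        # 1
```

In the diamond example only the relative order of vertices 1 and 2 is unknown, so one query suffices. With no edges at all, $e(P_G) = n!$ and the bound reduces to the familiar $\log n!$ lower bound for comparison sorting.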


