Tight Bounds for Sorting Under Partial Information

Sorting has a natural generalization where the input consists of: (1) a ground set $X$ of size $n$, (2) a partial oracle $O_P$ specifying some fixed partial order $P$ on $X$, and (3) a linear oracle $O_L$ specifying a linear order $L$ that extends $P$. The goal is to recover the linear order $L$ on $X$ using the fewest linear oracle queries. We measure algorithmic complexity through three metrics: oracle queries to $O_L$, oracle queries to $O_P$, and the time spent. Writing $e(P)$ for the number of linear extensions of $P$, any algorithm requires $\log_2 e(P)$ linear oracle queries in the worst case to recover the linear order on $X$. Kahn and Saks presented the first algorithm that uses $\Theta(\log e(P))$ linear oracle queries (using $O(n^2)$ partial oracle queries and exponential time). The state of the art for the general problem is due to Cardinal, Fiorini, Joret, Jungers and Munro, who at STOC'10 separated the partial and linear oracle queries into a preprocessing phase and a query phase. They preprocess $P$ using $O(n^2)$ partial oracle queries and $O(n^{2.5})$ time. Then, given $O_L$, they uncover the linear order on $X$ using $\Theta(\log e(P))$ linear oracle queries and $O(n + \log e(P))$ time, which is worst-case optimal in the number of linear oracle queries but not in the time spent. For $c \geq 1$, our algorithm preprocesses $O_P$ using $O(n^{1 + \frac{1}{c}})$ queries and time. Given $O_L$, we uncover $L$ using $\Theta(c \log e(P))$ queries and time. We show a matching lower bound: there exist positive constants $(\alpha, \beta)$ such that, for any constant $c \geq 1$, any algorithm that uses at most $\alpha \cdot n^{1 + \frac{1}{c}}$ preprocessing must use at least $\beta \cdot c \log e(P)$ linear oracle queries in the worst case. Thus, we solve the problem of sorting under partial information through an algorithm that is asymptotically tight across all three metrics.
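To make the quantity $\log_2 e(P)$ concrete, the following is a minimal sketch that counts the linear extensions of a small partial order by brute force and reports the resulting information-theoretic lower bound on linear oracle queries. It is an illustration only and is not the algorithm of the paper: the function names `count_linear_extensions` and `query_lower_bound` are hypothetical, the partial order is assumed to be given explicitly as an acyclic list of pairs $(a, b)$ meaning $a < b$, and the running time is exponential in $n$, so it is only usable for tiny ground sets.

```python
from functools import lru_cache
from math import ceil, log2


def count_linear_extensions(n, relations):
    """Count e(P) for a partial order P on {0, ..., n-1}.

    `relations` is an iterable of pairs (a, b) meaning a < b in P.
    Assumption: the relations are acyclic (they describe a valid partial order).
    Brute-force dynamic programming over subsets; exponential in n.
    """
    # preds[b] is a bitmask of the elements that must appear before b.
    preds = [0] * n
    for a, b in relations:
        preds[b] |= 1 << a

    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def extensions(placed):
        # `placed` is the bitmask of elements already put into the order.
        if placed == full:
            return 1
        total = 0
        for x in range(n):
            not_placed = not placed & (1 << x)
            preds_done = preds[x] & ~placed == 0
            if not_placed and preds_done:
                total += extensions(placed | (1 << x))
        return total

    return extensions(0)


def query_lower_bound(n, relations):
    """Smallest integer that is at least log2(e(P))."""
    return ceil(log2(count_linear_extensions(n, relations)))


if __name__ == "__main__":
    # Two disjoint chains 0 < 1 and 2 < 3 on four elements.
    two_chains = [(0, 1), (2, 3)]
    print(count_linear_extensions(4, two_chains))
    print(query_lower_bound(4, two_chains))
```

With no relations at all, $e(P) = n!$ and the bound reduces to the classical $\log_2 n!$ lower bound for comparison sorting; with a total order, $e(P) = 1$ and no linear oracle queries are needed.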
