Sorting is the task of ordering $n$ elements using pairwise comparisons. It is well known that $m=\Theta(n\log n)$ comparisons are both necessary and sufficient when the outcomes of the comparisons are observed without noise. In this paper, we study the sorting problem when each comparison is incorrect with some fixed yet unknown probability $p$. Unlike the common approach in the literature, which aims to minimize the number of pairwise comparisons $m$ needed to achieve a given error probability, we consider randomized algorithms with an expected number of queries $\textsf{E}[M]$ and aim to characterize the maximal sorting rate $\frac{n\log n}{\textsf{E}[M]}$ at which the ordering of the elements can be estimated with asymptotically vanishing error probability. This maximal rate is referred to as the noisy sorting capacity. In this work, we derive upper and lower bounds on the noisy sorting capacity. The two lower bounds -- one for fixed-length algorithms and one for variable-length algorithms -- are established by combining the insertion sort algorithm with the well-known Burnashev--Zigangirov algorithm for channel coding with feedback. Compared with existing methods, the proposed algorithms are universal in the sense that they do not require knowledge of $p$, while maintaining a strictly positive sorting rate. Moreover, we derive a general upper bound on the noisy sorting capacity, along with an upper bound on the maximal rate that can be achieved by sorting algorithms that are based on insertion sort.
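To make the setting concrete, the following is a minimal simulation sketch of insertion sort driven by noisy comparisons, together with the empirical rate $\frac{n\log n}{M}$. It is not the algorithm proposed in the paper: in place of the Burnashev--Zigangirov algorithm it uses a plain binary search whose comparisons are repeated a fixed number of times and decided by majority vote, and the repetition count `reps` must be chosen by hand, so the sketch is not universal in $p$. The function names, the parameter `reps`, and the choice of base-2 logarithm are all assumptions made for illustration.

```python
import math
import random


def noisy_less(a, b, p, rng):
    """Return the truth value of a < b, flipped with probability p."""
    truth = a < b
    return truth if rng.random() >= p else not truth


def majority_less(a, b, p, reps, rng, counter):
    """Repeat the noisy comparison `reps` times (reps odd) and take a majority vote.

    `counter` is a one-element list used to accumulate the number of queries.
    """
    votes = sum(noisy_less(a, b, p, rng) for _ in range(reps))
    counter[0] += reps
    return 2 * votes > reps


def noisy_insertion_sort(items, p, reps, rng):
    """Insertion sort with binary-search insertion over majority-voted comparisons.

    Returns (estimated ordering, total number of noisy queries M).
    Assumption: repetition coding stands in for the Burnashev--Zigangirov step.
    """
    counter = [0]
    out = []
    for x in items:
        lo, hi = 0, len(out)
        while lo < hi:
            mid = (lo + hi) // 2
            if majority_less(x, out[mid], p, reps, rng, counter):
                hi = mid
            else:
                lo = mid + 1
        out.insert(lo, x)
    return out, counter[0]


def empirical_rate(n, queries):
    """Empirical sorting rate n*log2(n)/M (log base 2 is an assumption)."""
    return n * math.log2(n) / queries


if __name__ == "__main__":
    rng = random.Random(0)
    items = list(range(200))
    rng.shuffle(items)
    estimate, m = noisy_insertion_sort(items, p=0.1, reps=15, rng=rng)
    print("exactly sorted:", estimate == sorted(items))
    print("queries M:", m, "rate:", empirical_rate(len(items), m))
```

With a fixed `reps`, the probability that at least one majority vote fails grows with $n$, so a repetition scheme of this kind would need `reps` to grow with $n$ to keep the error probability vanishing, which drives the rate toward zero. This is one way to see why maintaining a strictly positive rate calls for a more careful search procedure.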