Convergence of the QuickVal Residual

QuickSelect (also known as Find), introduced by Hoare (1961), is a randomized algorithm for selecting a specified order statistic from an input sequence of $n$ objects, or rather from their identifying labels, usually known as keys. The keys can be numeric or symbol strings, or indeed any labels drawn from a given linearly ordered set. We discuss various ways of measuring the cost of comparing two keys, and we measure the efficiency of the algorithm by the total cost of such comparisons. We define and discuss a closely related algorithm known as QuickVal, together with a natural probabilistic model for its input; QuickVal searches (almost surely unsuccessfully) for a specified population quantile $\alpha \in [0, 1]$ in an input sample of size $n$. Call the total cost of comparisons for this algorithm $S_n$. We discuss a natural way to define the random variables $S_1, S_2, \ldots$ on a common probability space. For a general class of cost functions, Fill and Nakama (2013) proved under mild assumptions that the scaled cost $S_n / n$ of QuickVal converges in $L^p$ and almost surely to a limit random variable $S$. For a general cost function, we consider what we term the QuickVal residual: \[ R_n := \frac{S_n}{n} - S. \] The residual is of natural interest, especially in light of previous analogous work on the sorting algorithm QuickSort. In the case $\alpha = 0$ of QuickMin with unit cost per key-comparison, we are able to calculate, à la Bindjeme and Fill (2012) for QuickSort, the exact (and asymptotic) $L^2$-norm of the residual. We take this result as motivation for the scaling factor $\sqrt{n}$ for the QuickVal residual for general population quantiles and for general cost. We then prove in general (under mild conditions on the cost function) that $\sqrt{n}\,R_n$ converges in law to a scale-mixture of centered Gaussians, and we also prove convergence of moments.
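To make the quantity $S_n$ concrete, the following is a minimal sketch of a QuickVal-style search under unit cost per key comparison. It rests on several assumptions that the abstract does not state: the pivot is taken to be the first key of the current subsequence, the keys are distinct, and a pivot equal to $\alpha$ is treated as lying above $\alpha$. The function name `quickval_comparisons` is hypothetical and chosen only for illustration.

```python
import random


def quickval_comparisons(keys, alpha):
    """Count key comparisons in a QuickVal-style search for the value alpha.

    Assumptions (not taken from the source): the pivot is the first key of
    the current subsequence, keys are distinct, and each comparison costs 1.
    """
    comparisons = 0
    current = list(keys)
    while current:
        pivot, rest = current[0], current[1:]
        comparisons += len(rest)  # every remaining key is compared to the pivot
        if alpha < pivot or alpha == pivot:
            # alpha lies at or below the pivot: keep only the smaller keys
            # (the tie case alpha == pivot is a convention of this sketch)
            current = [k for k in rest if k < pivot]
        else:
            # alpha lies above the pivot: keep only the larger keys
            current = [k for k in rest if k > pivot]
    return comparisons


if __name__ == "__main__":
    random.seed(0)
    n = 10_000
    sample = [random.random() for _ in range(n)]  # i.i.d. uniform(0, 1) keys
    s_n = quickval_comparisons(sample, alpha=0.0)  # alpha = 0 is the QuickMin case
    print(s_n / n)  # one realization of the scaled cost S_n / n
```

With `alpha=0.0` every pivot lies above $\alpha$, so the search always keeps the smaller keys and behaves like QuickMin. Calling the function on successive prefixes of a single key sequence is one way to place $S_1, S_2, \ldots$ on a common probability space, though that coupling is an assumption of this sketch rather than something the abstract specifies.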
