Sorting is a foundational primitive of computer science, and optimizations in sorting subroutines can cascade into significant performance gains for high-throughput systems. In this paper, we analyze the inefficiencies of a non-comparison sorting algorithm, Base-n Radix Sort (BNRS), specifically the `zero padding' problem in skewed datasets. We develop an execution model called Stable Partitioning - Least Significant Digit Radix Sort (SP-LSD), an iterative pruning model based on the least significant digit and designed to address this inefficiency. From this model we derive the Radix Crossover Framework (RCF), an analytic three-point decision framework. The framework assumes non-negative integer keys, a precondition that enables the derivation of three critical boundaries. First, the Asymptotic Crossover ($k<n^{\log_2 n}$), where $k$ is the maximum value and $n$ is the input size, defines when BNRS and SP-LSD can theoretically outperform comparison sorting algorithms. Second, the Round-feasibility Crossover ($k>n^2$) defines when the overhead of SP-LSD is amortized. Third, we derive the Pruning Crossover, parameterized by the ratio of random-access sorting cost to sequential partitioning cost. The model shows that SP-LSD yields a net gain over standard BNRS on both skewed and uniform distributions. The experimental results are consistent with the crossover boundaries, providing a deterministic roadmap for adaptive algorithm selection.
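The two closed-form boundaries stated in the abstract can be turned into a small selection routine. The sketch below is only an illustration under stated assumptions: the function names `digit_rounds` and `rcf_region` and the region labels are hypothetical, the mapping from region to algorithm is one reading of the abstract rather than the authors' procedure, and the Pruning Crossover is left out because the abstract gives no closed form for it. The Asymptotic Crossover is evaluated in logarithmic form, since $k<n^{\log_2 n}$ is equivalent to $\log_2 k<(\log_2 n)^2$.

```python
import math


def digit_rounds(k: int, n: int) -> int:
    """Number of base-n digits (LSD rounds) needed to represent the maximum value k."""
    if k < 0 or n < 2:
        raise ValueError("expects a non-negative k and a base n >= 2")
    rounds = 1
    while k >= n:
        k //= n
        rounds += 1
    return rounds


def rcf_region(k: int, n: int) -> str:
    """Classify (k, n) against the two closed-form boundaries.

    The returned labels are hypothetical names used only in this sketch.
    """
    if k < 0 or n < 2:
        raise ValueError("expects a non-negative k and an input size n >= 2")
    # Asymptotic Crossover: k < n^(log2 n)  <=>  log2(k) < (log2 n)^2.
    # Outside that region, radix-style sorting is not expected to win.
    if k > 0 and math.log2(k) >= math.log2(n) ** 2:
        return "comparison"
    # Round-feasibility Crossover: k > n^2, i.e. the maximum value needs
    # at least three base-n digits.
    if k > n * n:
        return "sp-lsd-candidate"  # still subject to the Pruning Crossover
    return "bnrs"


if __name__ == "__main__":
    n = 1024
    for k in (1000, 10**9, 2**120):
        print(k, digit_rounds(k, n), rcf_region(k, n))
```

With $n=1024$, the asymptotic boundary sits at $2^{100}$ and the round-feasibility boundary at $2^{20}$, so the three sample values of $k$ fall into three different regions. A full selector would also need the measured ratio of random-access sorting cost to sequential partitioning cost before committing to SP-LSD.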