We present EvoSort, a general-purpose, adaptive parallel sorting framework accessible from Python. EvoSort uses a Genetic Algorithm (GA) to automatically discover and refine critical parameters, such as insertion sort thresholds and the choice of sorting algorithm (e.g., whether to use LSD radix sort). By continuously adapting to both the input data and the system architecture, EvoSort serves as a drop-in replacement for the standard sorting routines in Python libraries such as NumPy and Pandas. Experiments on up to 10 billion elements, spanning nine data distributions and two hardware platforms, show that EvoSort consistently outperforms competing methods, with speedups of up to 225x. These results demonstrate that GA-based auto-tuning is an effective approach to large-scale data processing.
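As a rough illustration of the tuning loop described above, the sketch below evolves two parameters of a toy sorter with a generic genetic algorithm. These are an insertion sort cutoff for a hybrid merge sort and a flag that selects LSD radix sort instead. This is not EvoSort's implementation. The genome encoding (threshold in [1, 64] plus a boolean `use_radix`), truncation selection, uniform crossover, the mutation rates, and every helper name (`hybrid_merge_sort`, `lsd_radix_sort`, `sort_with`, `evolve`) are hypothetical choices made for illustration. The radix path assumes non-negative integers, and fitness is wall-clock time on a sample, so the chosen configuration may differ between runs.

```python
import heapq
import random
import time


def insertion_sort(a, lo, hi):
    """Sort a[lo:hi] in place using insertion sort."""
    for i in range(lo + 1, hi):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def hybrid_merge_sort(data, threshold):
    """Merge sort that falls back to insertion sort on runs of <= threshold items.

    threshold must be >= 1, otherwise size-1 runs would recurse forever.
    """
    a = list(data)

    def rec(lo, hi):
        if hi - lo <= threshold:
            insertion_sort(a, lo, hi)
            return
        mid = (lo + hi) // 2
        rec(lo, mid)
        rec(mid, hi)
        a[lo:hi] = list(heapq.merge(a[lo:mid], a[mid:hi]))

    rec(0, len(a))
    return a


def lsd_radix_sort(data):
    """LSD radix sort with 8-bit digits; assumes non-negative integers."""
    a = list(data)
    if not a:
        return a
    max_val, shift = max(a), 0
    while (max_val >> shift) > 0:
        buckets = [[] for _ in range(256)]
        for x in a:
            buckets[(x >> shift) & 0xFF].append(x)
        a = [x for b in buckets for x in b]
        shift += 8
    return a


def sort_with(genome, data):
    """Hypothetical genome = (insertion_threshold, use_radix)."""
    threshold, use_radix = genome
    return lsd_radix_sort(data) if use_radix else hybrid_merge_sort(data, threshold)


def evolve(sample, pop_size=8, generations=5, seed=0):
    """Generic GA: truncation selection, uniform crossover, bounded mutation."""
    rng = random.Random(seed)

    def fitness(g):  # lower is better: seconds spent sorting the sample
        t0 = time.perf_counter()
        sort_with(g, sample)
        return time.perf_counter() - t0

    def crossover(p, q):  # pick each gene from either parent
        return (p[0] if rng.random() < 0.5 else q[0],
                p[1] if rng.random() < 0.5 else q[1])

    def mutate(g):
        threshold, use_radix = g
        if rng.random() < 0.3:  # nudge the cutoff, clamped to [1, 64]
            threshold = min(64, max(1, threshold + rng.randint(-8, 8)))
        if rng.random() < 0.1:  # occasionally flip the algorithm choice
            use_radix = not use_radix
        return (threshold, use_radix)

    pop = [(rng.randint(1, 64), rng.random() < 0.5) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness)[: pop_size // 2]  # keep the fastest half
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return min(pop, key=fitness)


if __name__ == "__main__":
    data_rng = random.Random(42)
    sample = [data_rng.randrange(1_000_000) for _ in range(2000)]
    print("best (threshold, use_radix):", evolve(sample))
```

A real tuner would also tune on samples that are representative of the target distribution and hardware, and would average repeated timings to reduce noise. This sketch times each candidate once.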