We propose novel "clustering and conquer" procedures for the parallel large-scale ranking and selection (R&S) problem. The procedures use correlation information to cluster alternatives and thereby break the sample-efficiency bottleneck. In parallel computing environments, correlation-based clustering achieves an $\mathcal{O}(p)$ sample complexity reduction rate, which is the optimal reduction rate theoretically attainable. The framework is versatile: it integrates prevalent R&S methods under both the fixed-budget and fixed-precision paradigms, and it yields improvements without requiring highly accurate correlation estimation or precise clustering. In large-scale AI applications such as neural architecture search, a screening-free version of our procedure surprisingly surpasses fully sequential benchmarks in sample efficiency. This suggests that leveraging structural information such as correlation is a viable way to bypass screening via pairwise comparison, a step previously deemed essential for high sample efficiency but problematic for parallelization. Additionally, we propose a parallel few-shot clustering algorithm tailored for large-scale problems.
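To make the "clustering and conquer" idea concrete, here is a minimal Python sketch, not the procedure from the paper. It assumes a simple greedy threshold rule on the sample correlation matrix in place of the paper's few-shot clustering algorithm, and it picks winners by plain sample means rather than by any particular fixed-budget or fixed-precision R&S method. The function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
import numpy as np


def cluster_by_correlation(samples, threshold=0.5):
    """Greedy clustering of alternatives (columns) by sample correlation.

    samples: array of shape (n_replications, p_alternatives).
    An alternative joins the first cluster whose center it correlates
    with at or above `threshold`; otherwise it starts a new cluster.
    This rule is an illustrative assumption, not the paper's algorithm.
    """
    corr = np.corrcoef(samples, rowvar=False)
    p = corr.shape[0]
    labels = np.full(p, -1, dtype=int)
    centers = []
    for i in range(p):
        for k, c in enumerate(centers):
            if corr[i, c] >= threshold:
                labels[i] = k
                break
        else:  # no existing center is correlated enough
            centers.append(i)
            labels[i] = len(centers) - 1
    return labels


def cluster_and_conquer(samples, labels):
    """Pick the best sample mean within each cluster, then across clusters."""
    means = samples.mean(axis=0)
    winners = []
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        winners.append(int(members[np.argmax(means[members])]))
    best = max(winners, key=lambda i: means[i])
    return winners, best


# Toy data: columns 0-1 share one pattern, columns 2-3 share another.
a = np.arange(8.0)
b = np.array([1.0, -1.0] * 4)
demo_samples = np.column_stack([a, a + 1.0, b, 2.0 * b + 10.0])

demo_labels = cluster_by_correlation(demo_samples)
demo_winners, demo_best = cluster_and_conquer(demo_samples, demo_labels)
print(demo_labels.tolist(), demo_winners, demo_best)  # [0, 0, 1, 1] [1, 3] 3
```

In a parallel setting, each cluster could be assigned to its own worker for the within-cluster step, leaving only the cluster winners to be compared at the end; the sketch runs both steps serially for brevity.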