We propose novel "clustering and conquer" procedures for the parallel large-scale ranking and selection (R&S) problem. These procedures use correlation information to cluster alternatives, which breaks the sample-efficiency bottleneck. In parallel computing environments, correlation-based clustering can achieve an $\mathcal{O}(p)$ reduction rate in sample complexity, which is the optimal reduction rate theoretically attainable. The proposed framework is versatile: it integrates seamlessly with various prevalent R&S methods under both the fixed-budget and fixed-precision paradigms, and it delivers improvements without requiring highly accurate correlation estimates or precise clustering. In large-scale AI applications such as neural architecture search, a screening-free version of our procedure surprisingly surpasses fully sequential benchmarks in sample efficiency. This suggests that leveraging valuable structural information, such as correlation, is a viable way to bypass the traditional need for screening via pairwise comparisons. Such screening was previously deemed essential for high sample efficiency but is problematic to parallelize. Additionally, we propose a parallel few-shot clustering algorithm tailored to large-scale problems.