This work aims to improve the sample efficiency of parallel large-scale ranking and selection (R&S) problems by leveraging correlation information. We modify the commonly used "divide and conquer" framework in parallel computing by adding a correlation-based clustering step, transforming it into "clustering and conquer". Analytical results under a symmetric benchmark scenario show that this seemingly simple modification yields an $\mathcal{O}(p)$ reduction in sample complexity for a widely used class of sample-optimal R&S procedures. Our approach enjoys two key advantages: 1) it does not require highly accurate correlation estimation or precise clustering, and 2) it allows for seamless integration with various existing R&S procedures, while achieving optimal sample complexity. Theoretically, we develop a novel gradient analysis framework to analyze sample efficiency and guide the design of large-scale R&S procedures. We also introduce a new parallel clustering algorithm tailored for large-scale scenarios. Finally, in large-scale AI applications such as neural architecture search, our methods demonstrate superior performance.
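To make the "clustering and conquer" idea concrete, the following is a minimal sketch and not the authors' procedure. It assumes a toy simulator with a block correlation structure, a simple greedy correlation-based clustering rule, and selection by sample mean in place of a sample-optimal R&S procedure. All function names and parameters (`make_simulator`, `cluster_by_correlation`, `clustering_and_conquer`, `n0`, `n1`, `rho`) are hypothetical, and the loop over clusters stands in for work that would be distributed across parallel processors.

```python
import numpy as np


def make_simulator(means, labels_true, rho):
    """Toy simulator (assumption): a shared per-group factor plus independent
    noise gives unit variance and within-group correlation rho."""
    k = len(means)
    n_groups = int(labels_true.max()) + 1

    def simulate(n, rng):
        factor = rng.standard_normal((n, n_groups))
        noise = rng.standard_normal((n, k))
        return means + np.sqrt(rho) * factor[:, labels_true] + np.sqrt(1 - rho) * noise

    return simulate


def cluster_by_correlation(samples, n_clusters):
    """Greedy clustering on the estimated correlation matrix.
    samples has shape (n_obs, k); returns one cluster label per alternative."""
    corr = np.corrcoef(samples, rowvar=False)
    centers = [0]
    while len(centers) < n_clusters:
        # Next center: the alternative least correlated with all current centers.
        closeness = corr[:, centers].max(axis=1)
        closeness[centers] = np.inf
        centers.append(int(np.argmin(closeness)))
    # Assign every alternative to its most correlated center.
    return np.argmax(corr[:, centers], axis=1)


def clustering_and_conquer(simulate, n_clusters, n0, n1, rng):
    """Pilot sampling -> clustering -> local selection -> final comparison."""
    pilot = simulate(n0, rng)
    labels = cluster_by_correlation(pilot, n_clusters)

    local_best = []
    for c in range(n_clusters):  # each iteration would run on its own processor
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue
        # Correlated alternatives are sampled together within a cluster.
        obs = simulate(n1, rng)[:, members]
        local_best.append(int(members[np.argmax(obs.mean(axis=0))]))

    # Final stage: compare the local winners against each other.
    final = simulate(n1, rng)[:, local_best]
    return local_best[int(np.argmax(final.mean(axis=0)))], labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels_true = np.repeat(np.arange(4), 10)  # 4 groups of 10 alternatives
    means = np.linspace(0.0, 1.0, 40)
    means[-1] = 3.0                            # alternative 39 is clearly best
    sim = make_simulator(means, labels_true, rho=0.8)
    best, labels = clustering_and_conquer(sim, n_clusters=4, n0=200, n1=500, rng=rng)
    print("selected alternative:", best)
    print("cluster labels:", labels)
```

The sketch only illustrates the control flow. It does not reproduce the sample-complexity guarantees or the parallel clustering algorithm claimed in the abstract, and the toy simulator draws all alternatives at once for simplicity rather than sampling only the members of each cluster.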