ALMAB-DC: Active Learning, Multi-Armed Bandits, and Distributed Computing for Sequential Experimental Design and Black-Box Optimization

Sequential experimental design under expensive, gradient-free objectives is a central challenge in computational statistics: evaluation budgets are tightly constrained, and information must be extracted efficiently from each observation. We propose \textbf{ALMAB-DC}, a GP-based sequential design framework that combines active learning, multi-armed bandits (MAB), and distributed asynchronous computing for expensive black-box experimentation. A Gaussian process surrogate with uncertainty-aware acquisition identifies informative query points; a UCB or Thompson-sampling bandit controller allocates evaluations across parallel workers; and an asynchronous scheduler handles heterogeneous runtimes. We present cumulative regret bounds for the bandit components and characterize parallel scalability via Amdahl's Law. We validate ALMAB-DC on five benchmarks: two statistical experimental-design tasks and three ML/engineering tasks. In dose--response optimization, ALMAB-DC achieves lower simple regret than Equal Spacing, Random, and D-optimal designs; in adaptive spatial field estimation, it matches the Greedy Max-Variance benchmark while outperforming Latin Hypercube Sampling. At $K = 4$, the distributed setting reaches target performance in one-quarter of the sequential wall-clock rounds. On the three ML/engineering tasks (CIFAR-10 HPO, CFD drag minimization, MuJoCo RL), ALMAB-DC achieves 93.4\% CIFAR-10 accuracy (outperforming BOHB by 1.7\,pp and Optuna by 1.1\,pp), reduces airfoil drag to $C_D = 0.059$ (36.9\% below Grid Search), and improves RL return by 50\% over Grid Search. All advantages over non-ALMAB baselines are statistically significant under Bonferroni-corrected Mann--Whitney $U$ tests. Distributed execution achieves a $7.5\times$ speedup at $K = 16$ agents, consistent with Amdahl's Law.
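As a rough illustration of the Amdahl's Law statement, the quoted speedup can be inverted to see what parallel fraction it would imply. The sketch below assumes the textbook form $S(K) = 1 / ((1 - p) + p/K)$, where $p$ is the parallelizable fraction of the work. The function names and the implied value of $p$ are illustrative assumptions derived from the two numbers in the abstract ($7.5\times$ at $K = 16$); they are not quantities reported by the authors.

```python
def amdahl_speedup(p: float, k: int) -> float:
    """Amdahl's Law: speedup with k workers when a fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / k)


def implied_parallel_fraction(speedup: float, k: int) -> float:
    """Invert Amdahl's Law: the p that would yield `speedup` at k workers.

    Follows from 1/S = 1 - p * (1 - 1/k).
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / k)


# Figures quoted in the abstract: 7.5x speedup at K = 16.
# The resulting p is a back-of-envelope inference, not a reported value.
p = implied_parallel_fraction(7.5, 16)
ceiling = 1.0 / (1.0 - p)  # limit of S(K) as K grows without bound

print(f"implied parallel fraction p = {p:.3f}")
print(f"asymptotic speedup ceiling  = {ceiling:.1f}x")
```

Under this idealized model, the quoted figures correspond to $p \approx 0.924$, that is, roughly 7.6\% of the work behaving as serial overhead, and an asymptotic ceiling of about $13.2\times$ no matter how many workers are added. The model ignores communication and scheduling costs, so it should be read as a consistency check rather than a prediction.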
