Many deployed systems expose black-box objectives whose minimizing configuration shifts with an externally observed context. When contexts revisit a small set of latent regimes, an optimizer that discards history pays the adaptation cost repeatedly; when each step must remain inexpensive, full Gaussian-process (GP) refits at high observation counts are difficult to sustain. We cast online tuning as context-conditioned regret minimization and present RASP-Tuner, which instantiates a decomposition motivated by first principles: (i) identify a regime proxy by retrieving similar past contexts; (ii) predict short-horizon loss with a mixture-of-experts surrogate whose input concatenates parameters, context, and a retrieved soft prompt; (iii) adapt chiefly in a low-dimensional prompt subspace, invoking full surrogate updates only when scalarized error or disagreement spikes. A RealErrorComposer maps heterogeneous streaming metrics to [0,1] via EMA-stabilized logistic scores, supplying a single differentiable training target. We evaluate on nine synthetic non-stationary benchmarks, an adversarial-context sanity check, and three tabular real-world streams (see the section on real-world experiments). RASP-Tuner matches or improves on the cumulative regret of our GP-UCB and CMA-ES implementations on seven of the nine synthetic tasks under paired tests at horizon T=100, while recording 8 to 12 times lower wall-clock time per step than sliding-window GP-UCB on identical hardware. Idealized analysis in a cluster-separated, strongly convex regime model (RA-GD) supplies sufficient conditions for bounded dynamic regret; the deployed pipeline violates several of these premises, and we articulate which gaps remain open.
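To make the scalarization step concrete, the following is a minimal sketch of one way heterogeneous metrics could be mapped to [0,1] with EMA-stabilized logistic scores. It is not the RealErrorComposer itself: the class name `EmaLogisticScorer`, the function `compose`, the decay value, the weighted-mean aggregation, and the metric names are all assumptions made for illustration, and the plain-Python arithmetic would need to be ported to an autodiff framework to serve as a differentiable target.

```python
import math


class EmaLogisticScorer:
    """Hypothetical per-metric scorer: EMA standardization, then a logistic squash."""

    def __init__(self, decay=0.9, eps=1e-8):
        self.decay = decay  # assumed EMA decay; not taken from the source
        self.eps = eps      # keeps the denominator positive when variance is zero
        self.mean = None
        self.var = 0.0

    def update(self, x):
        """Fold x into the running statistics and return its score in [0, 1]."""
        if self.mean is None:
            self.mean = x
        else:
            delta = x - self.mean
            # Exponentially weighted mean and variance updates.
            self.mean += (1.0 - self.decay) * delta
            self.var = self.decay * (self.var + (1.0 - self.decay) * delta * delta)
        z = (x - self.mean) / math.sqrt(self.var + self.eps)
        z = max(-30.0, min(30.0, z))  # guard against overflow in exp
        return 1.0 / (1.0 + math.exp(-z))


def compose(scorers, metrics, weights):
    """Weighted mean of per-metric scores (an assumed aggregation rule)."""
    total = sum(weights[name] for name in metrics)
    scored = sum(weights[name] * scorers[name].update(value)
                 for name, value in metrics.items())
    return scored / total


if __name__ == "__main__":
    scorers = {"latency_ms": EmaLogisticScorer(), "error_rate": EmaLogisticScorer()}
    weights = {"latency_ms": 2.0, "error_rate": 1.0}
    stream = [
        {"latency_ms": 120.0, "error_rate": 0.02},
        {"latency_ms": 125.0, "error_rate": 0.02},
        {"latency_ms": 400.0, "error_rate": 0.30},  # simulated regime shift
    ]
    for step, metrics in enumerate(stream):
        print(step, round(compose(scorers, metrics, weights), 3))
```

Standardizing each metric against its own running statistics before the logistic squash keeps metrics with very different units (milliseconds versus rates) on a comparable scale, so a single scalar can plausibly act as the trigger for the "error spike" condition in step (iii).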