RASP-Tuner: Retrieval-Augmented Soft Prompts for Context-Aware Black-Box Optimization in Non-Stationary Environments

Many deployed systems expose black-box objectives whose minimizing configuration shifts with an externally observed context. When contexts revisit a small set of latent regimes, an optimizer that discards history pays the adaptation cost again at every revisit; when each step must remain inexpensive, full Gaussian-process (GP) refits at high observation counts are difficult to sustain. We cast online tuning as context-conditioned regret minimization and present RASP-Tuner, which instantiates a decomposition motivated by first principles: (i) identify a regime proxy by retrieving similar past contexts; (ii) predict short-horizon loss with a mixture-of-experts surrogate whose input concatenates parameters, context, and a retrieved soft prompt; (iii) adapt chiefly in a low-dimensional prompt subspace, invoking full surrogate updates only when scalarized error or disagreement spikes. A RealErrorComposer maps heterogeneous streaming metrics to [0,1] via EMA-stabilized logistic scores, supplying a single differentiable training target. We evaluate on nine synthetic non-stationary benchmarks, an adversarial-context sanity check, and three tabular real-world streams (see the section on real-world experiments). Under paired tests at horizon T=100, RASP-Tuner improves or matches cumulative regret relative to our GP-UCB and CMA-ES implementations on seven of the nine synthetic tasks, while recording 8 to 12 times lower wall-clock time per step than sliding-window GP-UCB on identical hardware. Idealized analysis in a cluster-separated, strongly convex regime model (RA-GD) supplies sufficient conditions for bounded dynamic regret; the deployed pipeline violates several of these premises, and we articulate which gaps remain open.
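
To make the scalarization step concrete, the following is a minimal sketch of how heterogeneous streaming metrics could be mapped to a single score in [0,1] with EMA-stabilized logistic scores. It is an illustration under stated assumptions, not the RealErrorComposer itself: the class name EmaLogisticComposer, the per-metric weights, the decay value, the clamp on the standardized value, and the convention that larger metric values are worse are all hypothetical choices that the abstract does not specify.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EmaLogisticComposer:
    """Hypothetical sketch: squash streaming metrics into one score in [0, 1].

    Assumes every metric is oriented so that larger values are worse.
    """
    weights: dict                 # metric name -> positive weight (assumed)
    decay: float = 0.9            # EMA decay factor (assumed)
    eps: float = 1e-8             # keeps the standard deviation positive
    mean: dict = field(default_factory=dict)
    var: dict = field(default_factory=dict)

    def update(self, metrics: dict) -> float:
        """Score one observation, then fold it into the running statistics."""
        total, weight_sum = 0.0, 0.0
        for name, value in metrics.items():
            w = self.weights.get(name, 0.0)
            if w <= 0.0:
                continue  # metrics without a positive weight are ignored
            if name not in self.mean:
                # First sighting: initialize the statistics at the observation.
                self.mean[name] = value
                self.var[name] = 0.0
            # Standardize against the EMA statistics seen so far.
            z = (value - self.mean[name]) / math.sqrt(self.var[name] + self.eps)
            z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
            score = 1.0 / (1.0 + math.exp(-z))  # logistic squash into (0, 1)
            total += w * score
            weight_sum += w
            # Exponentially weighted mean and variance update.
            delta = value - self.mean[name]
            self.mean[name] += (1.0 - self.decay) * delta
            self.var[name] = self.decay * (
                self.var[name] + (1.0 - self.decay) * delta * delta
            )
        if weight_sum == 0.0:
            raise ValueError("no metric with a positive weight was supplied")
        return total / weight_sum  # weighted average stays in [0, 1]


composer = EmaLogisticComposer(weights={"latency_ms": 1.0, "error_rate": 2.0})
first = composer.update({"latency_ms": 100.0, "error_rate": 0.01})
second = composer.update({"latency_ms": 200.0, "error_rate": 0.05})
third = composer.update({"latency_ms": 50.0, "error_rate": 0.0})
```

In this sketch the first observation scores at the midpoint because it defines the baseline, a jump in both metrics pushes the score toward 1, and a drop below the running mean pulls it under 0.5. The sketch uses plain Python floats; obtaining the differentiable target mentioned in the abstract would require expressing the same arithmetic in an autodiff framework.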
