Dynamic Algorithm Configuration (DAC) addresses the challenge of dynamically setting an algorithm's hyperparameters for a diverse set of instances rather than focusing solely on individual tasks. Agents trained with Deep Reinforcement Learning (RL) offer a pathway to tackle such settings. However, the limited generalization performance of these agents has significantly hindered their application to DAC. Our hypothesis is that a potential bias in the training instances limits generalization capabilities. We take a step towards mitigating this by selecting a representative subset of training instances to overcome overrepresentation and then retraining the agent on this subset to improve its generalization performance. When constructing the meta-features for the subset selection, we particularly account for the dynamic nature of the RL agent by computing time-series features on trajectories of actions and rewards generated by the agent's interaction with the environment. Through empirical evaluations on the Sigmoid and CMA-ES benchmarks from DACBench, the standard benchmark library for DAC, we discuss the potential of our selection technique compared to training on the entire instance set. Our results highlight the efficacy of instance selection in refining DAC policies for diverse instance spaces.
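To make the described pipeline concrete, below is a minimal sketch (not the paper's actual implementation) of the two steps the abstract outlines: computing time-series meta-features on an agent's action and reward trajectories, then selecting a representative instance subset by clustering in that meta-feature space. The specific feature set, the k-means-based selection, and all names (`ts_features`, `select_representative`, the toy random trajectories standing in for rollouts of a trained agent) are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def ts_features(series):
    """Simple time-series features for one trajectory channel:
    mean, std, lag-1 autocorrelation, and linear-trend slope."""
    s = np.asarray(series, dtype=float)
    mean, std = s.mean(), s.std()
    ac1 = np.corrcoef(s[:-1], s[1:])[0, 1] if len(s) > 1 else 0.0
    ac1 = float(np.nan_to_num(ac1))  # constant series yield nan correlation
    slope = np.polyfit(np.arange(len(s)), s, 1)[0] if len(s) > 1 else 0.0
    return np.array([mean, std, ac1, slope])

def instance_meta_features(actions, rewards):
    """Concatenate features of the action and reward trajectories
    produced by the agent's interaction with one instance."""
    return np.concatenate([ts_features(actions), ts_features(rewards)])

def select_representative(meta_features, k, seed=0):
    """Cluster instances in meta-feature space and keep, per cluster,
    the instance closest to the centroid, countering overrepresented
    regions of the instance space."""
    X = np.asarray(meta_features)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    subset = []
    for c in range(k):
        members = np.flatnonzero(km.labels_ == c)
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        subset.append(int(members[np.argmin(dists)]))
    return sorted(subset)

# Toy usage: random trajectories stand in for real agent rollouts.
rng = np.random.default_rng(0)
feats = [instance_meta_features(rng.normal(size=50), rng.normal(size=50))
         for _ in range(20)]
print(select_representative(feats, k=5))  # indices of the retained instances
```

In this sketch, the agent would then be retrained on the instances returned by `select_representative`; the hand-crafted four-feature summary is only a stand-in for whichever time-series feature extractor the method actually uses.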