To address the need to model uncertainty in sensitive machine learning applications, distributionally robust optimization (DRO) seeks good performance uniformly across a variety of tasks. The recent multi-distribution learning (MDL) framework pursues this objective through dynamic interaction with the environment, in which the learner has sampling access to each target distribution. Drawing inspiration from pure-exploration multi-armed bandits, we provide distribution-dependent guarantees in the MDL regime that scale with suboptimality gaps and yield a better dependence on the sample size than existing distribution-independent analyses. We investigate two non-adaptive strategies, uniform and non-uniform exploration, and present non-asymptotic regret bounds using novel tools from empirical process theory. Furthermore, we devise an adaptive optimistic algorithm, LCB-DR, that exhibits an improved dependence on the gaps, mirroring the contrast between uniform and optimistic allocation in the multi-armed bandit literature.
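As a rough illustration of the uniform (non-adaptive) exploration strategy mentioned above, the sketch below splits a fixed sampling budget evenly across the K distributions and returns the hypothesis with the smallest worst-case empirical risk, i.e. an empirical version of the minimax objective over hypotheses h and distributions k. It assumes a finite hypothesis class and models sampling access as one Python callable per distribution; the function name `uniform_exploration`, the samplers and the squared loss are hypothetical choices for illustration. It is not the authors' procedure and does not implement non-uniform exploration or LCB-DR.

```python
import random
from typing import Callable, Sequence


def uniform_exploration(
    samplers: Sequence[Callable[[random.Random], float]],
    hypotheses: Sequence[float],
    loss: Callable[[float, float], float],
    budget: int,
    seed: int = 0,
) -> float:
    """Hypothetical sketch: uniform allocation, then empirical minimax selection."""
    rng = random.Random(seed)
    k = len(samplers)
    per_dist = budget // k
    if per_dist == 0:
        raise ValueError("budget must allow at least one sample per distribution")

    # Non-adaptive step: draw the same number of samples from every distribution.
    data = [[sampler(rng) for _ in range(per_dist)] for sampler in samplers]

    def worst_case_risk(h: float) -> float:
        # Largest empirical risk of h over the K distributions.
        return max(sum(loss(h, z) for z in zs) / len(zs) for zs in data)

    # Return the hypothesis minimizing the worst-case empirical risk.
    return min(hypotheses, key=worst_case_risk)


# Usage: two unit-variance Gaussians centred at 0 and 2, squared loss.
samplers = [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(2.0, 1.0)]
best = uniform_exploration(
    samplers, [0.0, 0.5, 1.0, 1.5, 2.0], lambda h, z: (h - z) ** 2, budget=2000
)
```

With squared loss the population minimax point of this toy problem is the midpoint h = 1, and with 1000 samples per distribution the empirical selection should recover it up to sampling noise. An adaptive scheme would instead steer samples toward the distributions whose risks are hardest to separate, which is the kind of gap-dependent allocation the abstract contrasts with uniform exploration.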