Scalarization is a general, parallelizable technique that can be deployed in any multiobjective setting to reduce multiple objectives into one, yet some have dismissed this versatile approach because linear scalarizations cannot explore concave regions of the Pareto frontier. To that end, we aim to find simple non-linear scalarizations that provably explore a diverse set of $k$ objectives on the Pareto frontier, as measured by the dominated hypervolume. We show that hypervolume scalarizations with uniformly random weights achieve an optimal sublinear hypervolume regret bound of $O(T^{-1/k})$, with matching lower bounds that preclude any algorithm from doing better asymptotically. For the setting of multiobjective stochastic linear bandits, we utilize properties of hypervolume scalarizations to derive a novel non-Euclidean analysis that yields regret bounds of $\tilde{O}( d T^{-1/2} + T^{-1/k})$, removing unnecessary $\text{poly}(k)$ dependencies. We support our theory with strong empirical results: non-linear scalarizations outperform both their linear counterparts and other standard multiobjective algorithms in a variety of natural settings.
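To make the idea of hypervolume scalarization with uniformly random weights concrete, the following is a minimal sketch, not the authors' implementation. It assumes the scalarization takes the form $s_\lambda(y) = \min_i \max(0, y_i/\lambda_i)^k$ with $\lambda$ drawn uniformly from the positive part of the unit sphere, a form common in the hypervolume-scalarization literature that the abstract itself does not state. The function names, the reference point `ref`, and the normalizing constant `c_k` are illustrative assumptions.

```python
import math
import numpy as np


def sample_positive_sphere(num_weights, k, rng):
    """Draw weight vectors uniformly from the positive orthant of the unit sphere."""
    g = np.abs(rng.standard_normal((num_weights, k)))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def hypervolume_scalarization(y, weights, ref):
    """Assumed form: s_w(y) = min_i max(0, (y_i - ref_i) / w_i) ** k."""
    k = y.shape[-1]
    ratios = np.maximum(0.0, (y - ref) / weights)
    return ratios.min(axis=-1) ** k


def estimate_hypervolume(points, ref, num_weights=20000, seed=0):
    """Monte Carlo estimate of the hypervolume dominated by `points` above `ref`."""
    rng = np.random.default_rng(seed)
    k = points.shape[1]
    weights = sample_positive_sphere(num_weights, k, rng)
    # scores[j, i] is the scalarized value of point i under weight vector j.
    scores = hypervolume_scalarization(points[None, :, :], weights[:, None, :], ref)
    # c_k is the volume of the unit k-ball restricted to the positive orthant.
    c_k = math.pi ** (k / 2) / (2 ** k * math.gamma(k / 2 + 1))
    # For each weight keep the best point, then average over the random weights.
    return c_k * scores.max(axis=1).mean()


if __name__ == "__main__":
    # Two points on a 2-objective frontier, reference point at the origin.
    pts = np.array([[2.0, 1.0], [1.0, 2.0]])
    print(estimate_hypervolume(pts, np.zeros(2)))
```

Under these assumptions, maximizing one randomly weighted scalarization per step is what lets each step be solved independently, and averaging the per-weight maxima recovers the dominated hypervolume up to sampling error.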