Scalarization is a general, parallelizable technique that can be deployed in any multiobjective setting to reduce multiple objectives into one, yet some have dismissed this versatile approach because linear scalarizations cannot explore concave regions of the Pareto frontier. To that end, we aim to find simple non-linear scalarizations that provably explore a diverse set of $k$ objectives on the Pareto frontier, as measured by the dominated hypervolume. We show that hypervolume scalarizations with uniformly random weights achieve an optimal sublinear hypervolume regret bound of $O(T^{-1/k})$, with matching lower bounds that preclude any algorithm from doing better asymptotically. For the setting of multiobjective stochastic linear bandits, we utilize properties of hypervolume scalarizations to derive a novel non-Euclidean analysis that yields regret bounds of $\tilde{O}( d T^{-1/2} + T^{-1/k})$, removing unnecessary $\text{poly}(k)$ dependencies. We support our theory with strong empirical results in which non-linear scalarizations outperform both their linear counterparts and other standard multiobjective algorithms in a variety of natural settings.
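To make the role of random weights concrete, the following is a minimal sketch of how a hypervolume scalarization can be averaged over uniformly random weights to estimate the dominated hypervolume. The abstract does not state the functional form, so the code assumes the commonly used definition $s_\lambda(y) = \min_i \max(0, (y_i - z_i)/\lambda_i)^k$ with $\lambda$ drawn uniformly from the positive orthant of the unit sphere, a reference point $z$, and the normalizing constant $c_k = \pi^{k/2} / (2^k \Gamma(k/2 + 1))$. The function names, the sample count, and the seed are illustrative choices, not taken from the paper.

```python
import math
import random


def sample_positive_sphere(k, rng):
    """Draw a weight vector uniformly from the positive orthant of the unit sphere."""
    while True:
        # Absolute values of i.i.d. Gaussians, normalized, are uniform on the orthant.
        g = [abs(rng.gauss(0.0, 1.0)) for _ in range(k)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0 and min(g) > 0.0:
            return [x / norm for x in g]


def hypervolume_scalarization(y, weights, reference):
    """Assumed form: s_w(y) = min_i max(0, (y_i - z_i) / w_i) ** k."""
    k = len(y)
    ratio = min(
        max(0.0, (yi - zi) / wi) for yi, zi, wi in zip(y, reference, weights)
    )
    return ratio ** k


def estimate_hypervolume(points, reference, num_samples=20000, seed=0):
    """Monte Carlo estimate of the hypervolume dominated by `points` above `reference`."""
    rng = random.Random(seed)
    k = len(reference)
    # Volume of the unit k-ball restricted to the positive orthant.
    c_k = math.pi ** (k / 2) / (2 ** k * math.gamma(k / 2 + 1))
    total = 0.0
    for _ in range(num_samples):
        w = sample_positive_sphere(k, rng)
        # Each random weight picks out the best point under that scalarization.
        total += max(hypervolume_scalarization(y, w, reference) for y in points)
    return c_k * total / num_samples


if __name__ == "__main__":
    frontier = [(1.0, 2.0), (2.0, 1.0)]
    print(estimate_hypervolume(frontier, (0.0, 0.0)))
```

Under these assumptions, each sampled weight defines an independent single-objective problem, which is why the approach parallelizes naturally. For the two-point frontier in the usage example, the estimate should approach the exact dominated area of 3 as the number of samples grows.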