Distributionally robust optimisation (DRO) minimises the worst-case expected loss over an ambiguity set that can capture distributional shifts in out-of-sample environments. While Huber (linear-vacuous) contamination is a classical minimal-assumption model for an $\varepsilon$-fraction of arbitrary perturbations, including it in an ambiguity set can make the worst-case risk infinite and the DRO objective vacuous unless one imposes strong boundedness or support assumptions. We address these challenges by introducing bulk-calibrated credal ambiguity sets: we learn a high-mass bulk set from data while considering contamination inside the bulk and bounding the remaining tail contribution separately. This leads to a closed-form, finite $\mathrm{mean}+\sup$ robust objective and tractable linear or second-order cone programs for common losses and bulk geometries. Through this framework, we highlight and exploit the equivalence between the imprecise probability (IP) notion of upper expectation and the worst-case risk, demonstrating how IP credal sets translate into DRO objectives with interpretable tolerance levels. Experiments on heavy-tailed inventory control, geographically shifted house-price regression, and demographically shifted text classification show competitive robustness-accuracy trade-offs and efficient optimisation times, using Bayesian, frequentist, or empirical reference distributions.
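The $\mathrm{mean}+\sup$ objective described above can be sketched numerically. The snippet below assumes a simple decomposition of the worst-case risk into a $(1-\varepsilon)$-weighted bulk mean, an $\varepsilon$-weighted supremum of the loss over the bulk, and a separately bounded tail term; all function and parameter names here are illustrative assumptions, not the paper's actual notation or formulation.

```python
import numpy as np

def bulk_calibrated_worst_case(losses, bulk_sup, eps, tail_mass, tail_cap):
    """Hedged sketch of a 'mean + sup' robust objective.

    losses    : losses under the reference distribution on bulk samples
    bulk_sup  : supremum of the loss over the learned bulk set
    eps       : Huber contamination fraction placed inside the bulk
    tail_mass : probability mass left outside the bulk
    tail_cap  : bound on the loss contribution from the tail

    Every term is finite as long as bulk_sup and tail_cap are finite,
    which is the point of confining contamination to the bulk.
    """
    return (1.0 - eps) * np.mean(losses) + eps * bulk_sup + tail_mass * tail_cap

# Toy heavy-tailed losses standing in for an inventory-control problem
rng = np.random.default_rng(0)
losses = rng.exponential(scale=1.0, size=1000)
risk = bulk_calibrated_worst_case(losses, bulk_sup=5.0, eps=0.1,
                                  tail_mass=0.01, tail_cap=10.0)
```

Note how the robust risk stays finite even though an unconstrained Huber adversary could push it to infinity: the adversarial mass is capped by `bulk_sup` inside the bulk and by `tail_mass * tail_cap` outside it.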