The performance of decision policies and prediction models often deteriorates when applied to environments different from the ones seen during training. To ensure reliable operation, we analyze the stability of a system under distribution shift, which is defined as the smallest change in the underlying environment that causes the system's performance to deteriorate beyond a permissible threshold. In contrast to standard tail risk measures and distributionally robust losses that require the specification of a plausible magnitude of distribution shift, the stability measure is defined in terms of a more intuitive quantity: the level of acceptable performance degradation. We develop a minimax optimal estimator of stability and analyze its convergence rate, which exhibits a fundamental phase shift behavior. Our characterization of the minimax convergence rate shows that evaluating stability against large performance degradation incurs a statistical cost. Empirically, we demonstrate the practical utility of our stability framework by using it to compare system designs on problems where robustness to distribution shift is critical.