In this work, we explore a framework for contextual decision-making to study how the relevance and quantity of past data affect the performance of a data-driven policy. We analyze a contextual Newsvendor problem in which a decision-maker needs to trade off an underage cost against an overage cost in the face of uncertain demand. We consider a setting in which past demands observed under ``close by'' contexts come from close-by distributions, and we analyze the performance of data-driven algorithms through a notion of context-dependent worst-case expected regret. We analyze the broad class of Weighted Empirical Risk Minimization (WERM) policies, which weigh past data according to their similarity in the contextual space. This class includes classical policies such as ERM, k-Nearest Neighbors, and kernel-based policies. Our main methodological contribution is to characterize exactly the worst-case regret of any WERM policy on any given configuration of contexts. To the best of our knowledge, this provides the first understanding of tight performance guarantees in any contextual decision-making problem, with past literature focusing on upper bounds via concentration inequalities. We instead take an optimization approach and isolate a structure in the Newsvendor loss function that allows us to reduce the infinite-dimensional optimization problem over worst-case distributions to a simple line search. This in turn allows us to unveil fundamental insights that were obscured by previous general-purpose bounds. We characterize the actual guaranteed performance as a function of the contexts and derive granular insights into the learning curve of algorithms.
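As an illustration of the WERM class described above, the following is a minimal sketch, not the authors' implementation. It relies on the standard fact that minimizing a weighted empirical Newsvendor loss yields a weighted quantile of past demands at the critical ratio underage / (underage + overage). The function names (`werm_newsvendor`, `erm_weights`, `knn_weights`, `gaussian_kernel_weights`), the restriction to scalar contexts, and the Gaussian kernel are assumptions made purely for the example.

```python
import numpy as np


def werm_newsvendor(demands, weights, underage_cost, overage_cost):
    """Order quantity minimizing the weighted empirical Newsvendor loss.

    The minimizer is the smallest past demand whose normalized cumulative
    weight reaches the critical ratio b / (b + h).
    """
    demands = np.asarray(demands, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    critical_ratio = underage_cost / (underage_cost + overage_cost)

    order = np.argsort(demands)
    sorted_demands = demands[order]
    cumulative = np.cumsum(weights[order]) / total

    # First index whose cumulative weight is >= the critical ratio;
    # the clamp guards against floating-point round-off near 1.
    idx = int(np.searchsorted(cumulative, critical_ratio, side="left"))
    idx = min(idx, len(sorted_demands) - 1)
    return float(sorted_demands[idx])


def erm_weights(contexts, new_context):
    """ERM: every past sample counts equally, regardless of context."""
    return np.ones(len(contexts))


def knn_weights(contexts, new_context, k):
    """k-Nearest Neighbors: weight 1 on the k closest contexts, 0 elsewhere."""
    distances = np.abs(np.asarray(contexts, dtype=float) - new_context)
    weights = np.zeros(len(distances))
    weights[np.argsort(distances)[:k]] = 1.0
    return weights


def gaussian_kernel_weights(contexts, new_context, bandwidth):
    """Kernel policy (Gaussian kernel chosen only for illustration)."""
    distances = np.abs(np.asarray(contexts, dtype=float) - new_context)
    return np.exp(-0.5 * (distances / bandwidth) ** 2)


if __name__ == "__main__":
    contexts = [0.0, 0.1, 0.5, 0.9]   # hypothetical past contexts
    demands = [10.0, 20.0, 30.0, 40.0]  # hypothetical past demands
    x_new = 0.05

    for name, w in [
        ("ERM", erm_weights(contexts, x_new)),
        ("kNN (k=2)", knn_weights(contexts, x_new, k=2)),
        ("Gaussian kernel", gaussian_kernel_weights(contexts, x_new, 0.2)),
    ]:
        q = werm_newsvendor(demands, w, underage_cost=3.0, overage_cost=1.0)
        print(f"{name}: order quantity = {q}")
```

The three weighting schemes differ only in how much they trust demands observed under distant contexts, which is the trade-off between relevance and quantity of data that the abstract studies. The sketch computes the policy's decision only; it does not compute the worst-case regret characterized in the paper.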