In many data-driven decision-making problems, performance guarantees depend heavily on the correctness of model assumptions, which often fail in practice. We address this issue in the context of a feature-based newsvendor problem, where demand is influenced by observed features such as demographics and seasonality. To mitigate the impact of model misspecification, we propose a model-free and distribution-free framework inspired by conformal prediction. Our approach consists of two phases: a training phase, which can use any prediction method, and a calibration phase, which conformalizes the model bias. To enhance predictive performance, we explore the inherent trade-off between data quality and quantity: more selective training data improves quality but reduces quantity. Importantly, we provide statistical guarantees for the conformalized critical quantile that hold regardless of whether the underlying model is correct. Moreover, we quantify the confidence interval of the critical quantile, whose width decreases as data quality and quantity improve. We validate our framework using both simulated data and a real-world dataset from the Capital Bikeshare program in Washington, D.C. Across these experiments, our proposed method consistently outperforms benchmark algorithms, reducing newsvendor loss by up to 40% on the simulated data and 25% on the real-world dataset.
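The two-phase idea described above (train any predictor, then calibrate its errors on held-out data) can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses a generic split-conformal residual quantile as a stand-in, and it omits the data quality versus quantity selection step entirely. The function names (`newsvendor_loss`, `fit_linear`, `conformal_offset`, `demo`), the linear predictor, the cost values, and the synthetic demand model are all assumptions made for illustration.

```python
import math
import numpy as np


def newsvendor_loss(order, demand, underage, overage):
    """Average newsvendor cost: underage per unit short, overage per unit left over."""
    order = np.asarray(order, dtype=float)
    demand = np.asarray(demand, dtype=float)
    short = np.maximum(demand - order, 0.0)
    excess = np.maximum(order - demand, 0.0)
    return float(np.mean(underage * short + overage * excess))


def fit_linear(X, y):
    """Training phase (illustrative): least-squares fit; any predictor could be used."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def conformal_offset(residuals, tau):
    """Calibration phase (illustrative): the ceil((n + 1) * tau)-th smallest residual.

    Under exchangeability, a new residual falls at or below this value with
    probability at least tau, whether or not the fitted model is correct.
    """
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    k = math.ceil((n + 1) * tau)
    return math.inf if k > n else float(r[k - 1])


def demo(seed=0, underage=3.0, overage=1.0):
    """Synthetic check: the linear model is misspecified, yet coverage stays near tau."""
    rng = np.random.default_rng(seed)
    n = 2000
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    # Assumed demand model: nonlinear in the second feature, with skewed noise.
    demand = 50 + 10 * X[:, 0] + 8 * X[:, 1] ** 2 + rng.exponential(5.0, size=n)

    train, calib, test = slice(0, 1000), slice(1000, 1500), slice(1500, n)
    tau = underage / (underage + overage)  # critical ratio, 0.75 here

    predict = fit_linear(X[train], demand[train])
    offset = conformal_offset(demand[calib] - predict(X[calib]), tau)

    order = predict(X[test]) + offset
    coverage = float(np.mean(demand[test] <= order))
    loss = newsvendor_loss(order, demand[test], underage, overage)
    return tau, coverage, loss


if __name__ == "__main__":
    tau, coverage, loss = demo()
    print(f"target quantile {tau:.2f}, empirical coverage {coverage:.3f}, loss {loss:.3f}")
```

In this sketch the order quantity is the point prediction shifted by a single calibrated offset, so the coverage statement is marginal over features rather than conditional on them. The framework summarized above additionally selects training data to trade quality against quantity, which this example does not attempt.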