Contextual learning uses collected data to learn a decision policy that maps an individual's characteristics to an action. In operations management, such data may come from various sources, and a central question is when data collection can stop while still guaranteeing that the learned policy is sufficiently accurate. We study this question under two precision criteria: a context-wise criterion and an aggregate policy-value criterion. We develop unified stopping rules for contextual learning with unknown sampling variances in both unstructured and structured linear settings. Our approach is based on generalized likelihood ratio (GLR) statistics for pairwise action comparisons. To calibrate the corresponding sequential boundaries, we derive new time-uniform deviation inequalities that directly control the self-normalized GLR evidence. This avoids the conservatism caused by decoupling mean and variance uncertainty. Under the Gaussian sampling model, we establish finite-sample precision guarantees for both criteria. Numerical experiments on synthetic instances and two case studies show that the proposed stopping rules achieve the target precision with substantially fewer samples than benchmark methods. The proposed framework offers a practical way to determine when enough information has been collected in personalized decision problems. It applies across multiple data-collection environments, including historical datasets, simulation models, and real systems, and lets practitioners reduce unnecessary sampling while maintaining a desired level of decision quality.
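The abstract does not give the form of the pairwise GLR statistic. The sketch below uses a standard Gaussian GLR for comparing two actions whose variances are unknown and possibly unequal, profiling out each variance by maximum likelihood. It is an illustration under those assumptions, not the paper's construction. In particular, `threshold` is a caller-supplied placeholder and not the time-uniform boundary the paper derives. The names `pairwise_glr` and `should_stop` are hypothetical.

```python
import numpy as np


def _profile_loss(n, mean, var_mle, m):
    # Log-likelihood lost by forcing the mean of one Gaussian sample to m,
    # with its unknown variance re-estimated by maximum likelihood:
    # (n / 2) * log(1 + (mean - m)^2 / var_mle).
    return 0.5 * n * np.log1p((mean - m) ** 2 / var_mle)


def pairwise_glr(x_a, x_b, grid_size=2001):
    """GLR evidence that action a has a larger mean than action b under a
    Gaussian model with unknown, possibly unequal variances.

    Returns 0.0 when the empirical means do not favor a.
    """
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    mean_a, mean_b = x_a.mean(), x_b.mean()
    var_a, var_b = x_a.var(), x_b.var()  # MLE variances (ddof=0)
    if var_a <= 0 or var_b <= 0:
        raise ValueError("each action needs at least two distinct observations")
    if mean_a <= mean_b:
        return 0.0
    # The closest alternative (mu_a <= mu_b) lies on mu_a = mu_b = m with
    # m in [mean_b, mean_a]. The objective need not be convex in m, so a
    # dense grid is used instead of a local solver; the grid minimum can
    # slightly over-estimate the true infimum, so raise grid_size if needed.
    m = np.linspace(mean_b, mean_a, grid_size)
    loss = _profile_loss(len(x_a), mean_a, var_a, m) + _profile_loss(
        len(x_b), mean_b, var_b, m
    )
    return float(loss.min())


def should_stop(x_a, x_b, threshold):
    """Stop once the GLR evidence for the empirically better action exceeds
    `threshold`. The threshold is a hypothetical caller-supplied value,
    not the time-uniform boundary derived in the paper."""
    z = max(pairwise_glr(x_a, x_b), pairwise_glr(x_b, x_a))
    return z > threshold, z


# Usage on synthetic data: two actions with unequal noise levels.
rng = np.random.default_rng(0)
x_a = rng.normal(loc=1.0, scale=1.0, size=200)
x_b = rng.normal(loc=0.0, scale=2.0, size=200)
stop, z = should_stop(x_a, x_b, threshold=np.log(1 / 0.05))
print(stop, round(z, 2))
```

Because each variance is re-estimated under the constrained mean, the statistic is self-normalized: a noisier action contributes less evidence for the same gap in means. In a sequential procedure, `should_stop` would be re-evaluated after each new observation, and the placeholder threshold would be replaced by a properly calibrated time-uniform boundary.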