We consider a simulation optimization problem for context-dependent decision-making, which aims to determine the top-m designs for every context. Under a Bayesian framework, we formulate the optimal dynamic sampling decision as a stochastic dynamic programming problem and develop a sequential sampling policy to efficiently learn the performance of each design under each context. We derive the asymptotically optimal sampling ratios that attain the optimal large deviations rate of the worst-case probability of false selection. We prove that the proposed sampling policy is consistent and that its asymptotic sampling ratios are asymptotically optimal. Numerical experiments demonstrate that the proposed method improves the efficiency of selecting the top-m context-dependent designs.
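To make the selection target concrete, the following is a minimal sketch of how the worst-case probability of false selection could be estimated by Monte Carlo. It is not the sampling policy proposed here: it uses a plain equal-allocation baseline, and it assumes independent Gaussian simulation noise and that larger mean performance is better. The names `top_m`, `worst_case_pfs`, `true_means`, `n_per_pair`, and `noise_sd` are hypothetical and chosen only for illustration.

```python
import random


def top_m(means, m):
    """Return the indices of the m largest entries of `means` as a set.

    Assumption: larger mean performance is better.
    """
    ranked = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    return set(ranked[:m])


def worst_case_pfs(true_means, m, n_per_pair, n_reps, noise_sd=1.0, seed=0):
    """Estimate the worst-case (over contexts) probability of false selection.

    true_means[c][i] is the true mean of design i under context c.
    Every design-context pair receives n_per_pair samples (equal allocation,
    a baseline only). A false selection occurs in a context when the
    estimated top-m set differs from the true top-m set.
    Assumption: observations are independent Gaussian with std noise_sd.
    """
    rng = random.Random(seed)
    errors = [0] * len(true_means)
    for _ in range(n_reps):
        for c, mus in enumerate(true_means):
            # Sample mean of n_per_pair noisy observations for each design.
            sample_means = [
                sum(rng.gauss(mu, noise_sd) for _ in range(n_per_pair)) / n_per_pair
                for mu in mus
            ]
            if top_m(sample_means, m) != top_m(mus, m):
                errors[c] += 1
    # Worst case: the context with the highest estimated error frequency.
    return max(e / n_reps for e in errors)


if __name__ == "__main__":
    # Two contexts, four designs each; values are illustrative only.
    means = [[0.0, 0.5, 1.0, 1.5], [1.0, 0.2, 0.8, 0.1]]
    print(worst_case_pfs(means, m=2, n_per_pair=10, n_reps=500))
```

A sequential policy such as the one studied here would replace the equal allocation with sampling ratios that adapt across design-context pairs, with the aim of driving this worst-case quantity down faster for the same budget.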