Machine Learning (ML) models are increasingly used to support or substitute for human decision making. In applications where skilled experts are a scarce resource, it is crucial to reduce their burden and automate decisions whenever an ML model performs at least as well. However, models are often pre-trained and fixed, while tasks arrive sequentially and their distribution may shift. In that case, the relative performance of the decision makers can change, and the deferral algorithm must remain adaptive. We propose a contextual bandit model of this online decision-making problem. Our framework accommodates budget constraints and several types of partial feedback. Beyond the theoretical guarantees of our algorithm, we propose efficient extensions that achieve strong performance on real-world datasets.
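The setup can be illustrated with a minimal, hypothetical sketch (not the paper's algorithm): an epsilon-greedy bandit that, for each arriving task, either trusts a fixed pre-trained model or defers to a budget-limited expert, observing only the reward of the chosen decision maker (partial feedback). For simplicity this sketch drops the context features and simulates a distribution shift that degrades the model halfway through; all names, probabilities, and the epsilon-greedy policy here are illustrative assumptions.

```python
import random

def run_deferral_bandit(T=2000, budget=500, eps=0.1, seed=0):
    """Epsilon-greedy deferral between a fixed model (arm 0) and a
    budget-limited expert (arm 1), under bandit (partial) feedback.
    Accuracies and the midpoint shift are simulated, not from the paper."""
    rng = random.Random(seed)
    counts = [0, 0]        # pulls per arm
    values = [0.0, 0.0]    # running mean reward per arm
    expert_used = 0
    total_reward = 0
    for t in range(T):
        # Simulated distribution shift: the fixed model degrades halfway.
        p_model = 0.9 if t < T // 2 else 0.5
        p_expert = 0.8
        budget_left = expert_used < budget
        if budget_left and (counts[1] == 0 or rng.random() < eps):
            arm = rng.randrange(2)                       # explore
        elif budget_left:
            arm = 0 if values[0] >= values[1] else 1     # exploit
        else:
            arm = 0  # expert budget exhausted: must use the model
        p = p_model if arm == 0 else p_expert
        r = 1 if rng.random() < p else 0  # feedback for the chosen arm only
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]   # incremental mean
        if arm == 1:
            expert_used += 1
        total_reward += r
    return total_reward, expert_used

reward, used = run_deferral_bandit()
```

After the shift, the estimated value of the model arm drifts down and the policy defers more often, until the expert budget runs out; this captures, in a toy form, why the deferral rule must stay adaptive under non-stationarity.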