Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery

Bandits / slot machines · Statistics · MoDELS · Bandits · Learning

3 May 2024

Patrick Saux

From arXiv. Doctoral thesis. Some PDF readers (e.g. Firefox) have trouble rendering the theorem and definition environments; when reading online, please prefer another viewer, e.g. Chrome.

This thesis studies some of the mathematical challenges that arise in the analysis of statistical sequential decision-making algorithms for postoperative patient follow-up. Stochastic bandits (multi-armed, contextual) model how an agent in an uncertain environment learns a sequence of actions (a policy) in order to maximise observed rewards. To learn optimal policies, bandit algorithms have to balance the exploitation of current knowledge against the exploration of uncertain actions. Such algorithms have been widely studied and deployed in industrial applications with large datasets, low-risk decisions and clear modelling assumptions, such as click-through rate maximisation in online advertising. By contrast, digital health recommendations call for a whole new paradigm of small samples, risk-averse agents and complex, nonparametric modelling. To this end, we developed new safe, anytime-valid concentration bounds (Bregman, empirical Chernoff), introduced a new framework for risk-aware contextual bandits (with elicitable risk measures) and analysed a novel class of nonparametric bandit algorithms under weak assumptions (Dirichlet sampling). In addition to the theoretical guarantees, these results are supported by in-depth empirical evidence. Finally, as a first step towards personalised postoperative follow-up recommendations, we developed, together with medical doctors and surgeons, an interpretable machine learning model to predict the long-term weight trajectories of patients after bariatric surgery.
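
As a rough illustration of the exploration/exploitation trade-off mentioned in the abstract, here is a minimal sketch of UCB1, a classical multi-armed bandit algorithm, on Bernoulli arms. It is not one of the algorithms developed in the thesis; the function name `ucb1`, the arm means, the horizon and the seed are all assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return the number of pulls per arm.

    arm_means are hypothetical success probabilities, unknown to the agent.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms      # how many times each arm was played
    totals = [0.0] * n_arms   # summed rewards per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            # empirical mean (exploitation) plus confidence bonus (exploration)
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


# Illustrative run: three arms, the last one being the best.
pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(pulls)
```

The confidence bonus shrinks as an arm is played more often, so rarely tried arms keep being explored while most pulls go to the arm with the best empirical mean. The settings highlighted in the abstract (small samples, risk-averse agents, nonparametric rewards) are precisely those where such a simple mean-based index is not expected to be adequate.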
