When a predictive model is in production, it must be monitored in real time to ensure that its performance does not suffer due to drift or abrupt changes in the data. Ideally, such changes are detected long before monitoring of outcome data reveals that the performance of the model itself has dropped. In this paper we consider the problem of monitoring a predictive model, currently in production at the Mayo Clinic in Rochester, MN, that identifies the need for palliative care. We introduce a framework, called \textit{Bayes Watch}, for detecting change-points in high-dimensional longitudinal data with mixed variable types and missing values and for determining in which variables the change-point occurred. Bayes Watch fits an array of Gaussian Graphical Mixture Models to groupings of homogeneous data in time, called regimes, which are modeled as the observed states of a Markov process with unknown transition probabilities. In doing so, Bayes Watch defines a posterior distribution on a vector of regime assignments, which gives meaningful expressions for the probability of every possible change-point. Bayes Watch also allows for an effective and efficient fault detection system that assesses which features in the data were most responsible for a given change-point.