Data drift, a change in a model's input data, is one of the key factors behind the degradation of machine learning model performance over time. Monitoring drift helps detect these issues and prevent their harmful consequences. Meaningful drift interpretation is a fundamental step towards effective re-training of the model. In this study we propose an end-to-end framework for reliable, model-agnostic change-point detection and interpretation in large task-oriented dialog systems, proven effective in multiple customer deployments. We evaluate our approach and demonstrate its benefits with a novel variant of an intent classification training dataset, simulating customer requests to a dialog system. We make the data publicly available.