As AI is adopted more widely in healthcare, maintaining the accuracy and reliability of AI models in clinical practice has become crucial. We introduce methods for monitoring the performance of radiology AI classification models in deployment, addressing the difficulty of obtaining real-time ground truth for performance monitoring. We propose two metrics, predictive divergence and temporal stability, to trigger preemptive alerts when AI performance changes. Predictive divergence, measured using Kullback-Leibler and Jensen-Shannon divergences, estimates model accuracy by comparing the monitored model's predictions with those of two supplementary models. Temporal stability compares current predictions against historical moving averages to identify potential model decay or data drift. The approach was retrospectively validated on chest X-ray data from a single-center imaging clinic, where it proved effective in maintaining AI model reliability. By providing continuous, real-time insight into model performance, our system supports the safe and effective use of AI in clinical decision-making, paving the way for more robust AI integration in healthcare.
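As a rough illustration of the two metrics, the following minimal Python sketch computes a Jensen-Shannon-based predictive divergence and a moving-average stability check. It is not the authors' implementation: the function and class names, the averaging over supplementary models, the smoothing constant `eps`, the window size, and the alert threshold are all assumptions made for this example.

```python
import math
from collections import deque


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    eps is an assumed smoothing constant that avoids division by zero.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def predictive_divergence(primary, supplementary):
    """Mean JS divergence between the monitored model's predicted class
    distribution and each supplementary model's distribution.

    Averaging across supplementary models is an assumption of this sketch.
    """
    return sum(js_divergence(primary, s) for s in supplementary) / len(supplementary)


class TemporalStabilityMonitor:
    """Flags scores that deviate from a moving average of recent scores.

    The window size and threshold are hypothetical defaults.
    """

    def __init__(self, window=50, threshold=0.2):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, score):
        """Record a score; return True if it drifts from the moving average."""
        alert = False
        if self.history:
            baseline = sum(self.history) / len(self.history)
            alert = abs(score - baseline) > self.threshold
        self.history.append(score)
        return alert


if __name__ == "__main__":
    primary = [0.8, 0.2]                  # monitored model: P(abnormal), P(normal)
    others = [[0.7, 0.3], [0.75, 0.25]]   # two supplementary models
    print(predictive_divergence(primary, others))
```

In this sketch neither quantity needs ground-truth labels, which is the property the monitoring approach relies on: a rising divergence or a stability alert is only a signal that a case or time window may deserve review.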