This work addresses the challenge of evaluating adaptive artificial intelligence (AI) models for medical devices, where iterative updates to both the models and the evaluation datasets complicate performance assessment. We introduce a novel approach built on three complementary measurements: learning (model improvement on current data), potential (dataset-driven performance shifts), and retention (knowledge preservation across modification steps). Together, these measurements disentangle performance changes caused by model adaptations from those caused by a dynamic environment. Case studies using simulated population shifts demonstrate the approach's utility: gradual transitions enable stable learning and retention, while rapid shifts reveal trade-offs between plasticity and stability. The measurements provide practical insights for regulatory science, enabling rigorous assessment of the safety and effectiveness of adaptive AI systems over sequential modifications.
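To make the three measurements concrete, the following is a minimal sketch of one plausible reading of the parenthetical definitions above. It is not taken from the paper's formal definitions. It assumes a hypothetical table `perf[i][j]` holding a scalar performance score (for example AUC) of model version `i` on evaluation dataset version `j`, and treats each measurement as a simple difference at modification step `t`. The function name `adaptation_measurements` and the example numbers are illustrative assumptions.

```python
from typing import Dict, Sequence


def adaptation_measurements(perf: Sequence[Sequence[float]], t: int) -> Dict[str, float]:
    """Compute illustrative learning, potential, and retention values at step t.

    perf[i][j] is the (assumed) performance of model version i evaluated on
    dataset version j. These difference-based formulas are an assumption made
    for illustration, not the paper's stated definitions.
    """
    if t < 1 or t >= len(perf):
        raise ValueError("t must index a modification step with a predecessor")
    return {
        # Learning: gain of the updated model over the previous model on current data.
        "learning": perf[t][t] - perf[t - 1][t],
        # Potential: shift in the previous model's score caused only by the dataset change.
        "potential": perf[t - 1][t] - perf[t - 1][t - 1],
        # Retention: change in score on the previous dataset after the model update.
        "retention": perf[t][t - 1] - perf[t - 1][t - 1],
    }


# Hypothetical scores: rows are model versions 0 and 1, columns are dataset versions 0 and 1.
example_perf = [
    [0.80, 0.70],
    [0.78, 0.85],
]
example = adaptation_measurements(example_perf, t=1)
print({name: round(value, 2) for name, value in example.items()})
```

Under these assumed definitions, learning plus potential equals the total observed change `perf[t][t] - perf[t-1][t-1]`, which is one way to separate the model's contribution from the dataset's contribution; a negative retention value would indicate forgetting on the earlier data.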