A prediction model is most useful if it generalizes beyond its development data, as assessed through external validation, but the extent to which it should generalize remains unclear. In practice, prediction models are often externally validated using data from very different settings, including populations from other health systems or countries, with predictably poor results. Such results may not fairly reflect the performance of a model that was designed for a specific target population or setting, and may stretch the generalizability that can reasonably be expected of it. To address this, we suggest externally validating a model using new data from the target population, so that validation performance has clear implications for model reliability, whereas generalizability to broader settings should be carefully investigated during model development rather than explored post hoc. Based on this perspective, we propose a roadmap that facilitates the development and application of reliable, fair, and trustworthy artificial intelligence prediction models.