The Cancer Registry of Norway (CRN) collects information on cancer patients by receiving cancer messages from medical entities in Norway (e.g., medical labs and hospitals). These messages are validated by an automated cancer registry system, GURI. Its correct operation is crucial because it lays the foundation for cancer research and provides critical cancer-related statistics to its stakeholders. Constructing a cyber-cyber digital twin (CCDT) for GURI can facilitate various experiments and advanced analyses of GURI's operational state without requiring intensive interaction with the real system. However, GURI constantly evolves due to novel medical diagnostics and treatments, technological advances, etc. Accordingly, the CCDT should evolve as well to stay synchronized with GURI. A key challenge in achieving such synchronization is that evolving the CCDT requires abundant data labelled by the new GURI. To tackle this challenge, we propose EvoCLINICAL, which treats the CCDT developed for the previous version of GURI as a pretrained model and fine-tunes it with a dataset labelled by querying the new GURI version. EvoCLINICAL employs a genetic algorithm to select an optimal subset of cancer messages from a candidate dataset and queries GURI with it. We evaluate EvoCLINICAL on three evolution processes. The precision, recall, and F1 score are all greater than 91%, demonstrating the effectiveness of EvoCLINICAL. Furthermore, we replace the active learning part of EvoCLINICAL with random selection to study the contribution of transfer learning to the overall performance of EvoCLINICAL. Results show that employing active learning in EvoCLINICAL consistently improves its performance.
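To make the selection step concrete, the following is a minimal sketch of how a genetic algorithm might pick a fixed-size subset of candidate messages to send to GURI. It is not the implementation used in EvoCLINICAL: the function name `select_subset_ga`, its parameters, the elitist selection scheme, and in particular the fitness function (a sum of per-message uncertainty scores from the pretrained CCDT) are all assumptions made for illustration, and the uncertainty values are synthetic.

```python
import random
from typing import Callable, List, Sequence


def select_subset_ga(
    n_candidates: int,
    subset_size: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 30,
    generations: int = 50,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> List[int]:
    """Hypothetical GA: evolve index subsets of fixed size, maximizing `fitness`."""
    rng = random.Random(seed)

    def random_individual() -> List[int]:
        # An individual is a sorted list of distinct candidate indices.
        return sorted(rng.sample(range(n_candidates), subset_size))

    def crossover(a: Sequence[int], b: Sequence[int]) -> List[int]:
        # Draw the child from the union of both parents, keeping the size fixed.
        pool = sorted(set(a) | set(b))
        return sorted(rng.sample(pool, subset_size))

    def mutate(ind: Sequence[int]) -> List[int]:
        # With some probability, swap one selected index for an unselected one.
        child = list(ind)
        if rng.random() < mutation_rate:
            outside = [i for i in range(n_candidates) if i not in child]
            if outside:
                child[rng.randrange(subset_size)] = rng.choice(outside)
        return sorted(child)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = [
            mutate(crossover(*rng.sample(elite, 2)))
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=fitness)


# Synthetic stand-in for the pretrained twin's uncertainty on 200 candidates.
score_rng = random.Random(1)
uncertainty = [score_rng.random() for _ in range(200)]


def total_uncertainty(indices: Sequence[int]) -> float:
    # Assumed fitness: prefer messages the old twin is least sure about.
    return sum(uncertainty[i] for i in indices)


best = select_subset_ga(200, 20, total_uncertainty)
print(best, round(total_uncertainty(best), 3))
```

In a pipeline like the one described, the messages indexed by `best` would then be sent to the new GURI version for labelling, and the resulting labelled subset would be used to fine-tune the pretrained CCDT. Swapping `select_subset_ga` for a plain `random.sample` would correspond to the random-selection baseline.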