Data scarcity hampers the development and implementation of innovative healthcare solutions. In geriatrics, fall-related injuries are a major cause of hospitalization, functional decline, and mortality in older adults. Optimizing post-operative discharge planning can mitigate these outcomes, but limited data hinders predictive model development. Here, we explored generative machine learning approaches to augment data from the SURGE-Ahead project (Supporting SURgery with Geriatric Co-Management and AI), an initiative addressing geriatric perioperative care. Data from the German geriatric trauma register (AltersTraumaZentrum; ATZ) were incorporated using two strategies: (i) combining SURGE-Ahead and ATZ register data with imputation (ComImp) and (ii) generating synthetic data with adversarial random forests (ARF), either from SURGE-Ahead alone or from the combined SURGE-Ahead and ATZ register datasets. Predictive models, including multinomial logistic regression, random forest, and a prior-fitted transformer (TabPFN), were trained and evaluated using standard performance metrics: accuracy, area under the receiver operating characteristic curve (ROC AUC), Brier score, and logistic loss. Random forest and TabPFN performed well (accuracy around 0.84 and AUC around 0.94) and were largely unaffected by augmentation. Logistic regression benefited from augmented data, with accuracy improving from 0.70 to 0.81 and AUC from 0.85 to 0.92. These results highlight generative data augmentation as a viable approach to enhance simpler predictive models in geriatric care and emphasize the importance of method selection when addressing data scarcity in heterogeneous clinical populations.
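The four evaluation metrics named in the abstract can be illustrated for a multiclass outcome with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the three-class toy labels and probabilities are invented and are not SURGE-Ahead or ATZ data, the Brier score uses the sum-over-classes convention, and the AUC is a macro-averaged one-vs-rest AUC. The abstract does not state which multiclass variants the study used.

```python
import math

def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class equals the label."""
    hits = 0
    for y, p in zip(y_true, probs):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += int(predicted == y)
    return hits / len(y_true)

def brier_score(y_true, probs):
    """Multiclass Brier score: mean squared distance to the one-hot label.

    Assumed convention: squared errors are summed over classes, so the
    score ranges from 0 (perfect) to 2 (confidently wrong).
    """
    total = 0.0
    for y, p in zip(y_true, probs):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2
                     for k, pk in enumerate(p))
    return total / len(y_true)

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for y, p in zip(y_true, probs):
        clipped = min(max(p[y], eps), 1.0)  # avoid log(0)
        total -= math.log(clipped)
    return total / len(y_true)

def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(y_true, probs):
    """Macro-averaged one-vs-rest ROC AUC (assumed multiclass variant)."""
    n_classes = len(probs[0])
    aucs = []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]
        scores = [p[k] for p in probs]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes

# Hypothetical toy data: three outcome classes, six patients.
y_true = [0, 1, 2, 0, 1, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.5, 0.3, 0.2],  # true class is 1, so this sample is misclassified
    [0.2, 0.2, 0.6],
]

results = {
    "accuracy": accuracy(y_true, probs),
    "roc_auc": macro_ovr_auc(y_true, probs),
    "brier": brier_score(y_true, probs),
    "log_loss": log_loss(y_true, probs),
}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and AUC reward correct ranking and classification, whereas the Brier score and logistic loss also penalize poorly calibrated probabilities, which is one reason to report all four together when comparing models trained with and without augmentation.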