The Effect of Epidemiological Cohort Creation on the Machine Learning Prediction of Homelessness and Police Interaction Outcomes Using Administrative Health Care Data

July 20, 2023

Faezehsadat Shahidi, M. Ethan MacDonald, Dallas Seitz, Geoffrey Messier

From arXiv; to be published in Frontiers in Digital Health, Health Informatics

Background: Mental illness can lead to adverse outcomes such as homelessness and police interaction, and understanding the events leading up to these outcomes is important. Predictive models may help identify individuals at risk of such outcomes. Cohorts built with a fixed observation window can yield lower performance from logistic regression (LR) or machine learning (ML) models than cohorts built with adaptive and parcellated windows. Method: An administrative healthcare dataset was used, comprising 240,219 individuals in Calgary, Alberta, Canada, who were diagnosed with an addiction or mental health (AMH) condition between April 1, 2013, and March 31, 2018. The cohort was followed for 2 years to identify factors associated with homelessness and police interactions. To understand the benefit of flexible windows to predictive models, an alternative cohort was created. LR and ML models, including random forests (RF) and extreme gradient boosting (XGBoost), were then compared in the two cohorts. Results: Among 237,602 individuals, 0.8% (1,800) experienced first homelessness, and among 237,141 individuals, 0.32% (759) reported an initial police interaction. Male sex (adjusted odds ratios [AORs]: H=1.51, P=2.52), substance disorder (AORs: H=3.70, P=2.83), psychiatrist visits (AORs: H=1.44, P=1.49), and drug abuse (AORs: H=2.67, P=1.83) were associated with initial homelessness (H) and police interaction (P). XGBoost showed superior performance using the flexible method (sensitivity = 91%, AUC = 90% for initial homelessness; sensitivity = 90%, AUC = 89% for initial police interaction). Conclusion: This study identified key features associated with initial homelessness and police interaction and demonstrated that flexible windows can improve predictive modeling.
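The abstract reports model quality as sensitivity and AUC on outcomes that are rare (about 0.8% and 0.32% of each cohort). As a rough illustration of what those two metrics measure, the sketch below computes them in plain Python on a tiny made-up example. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions for illustration only; they are not the study's data, models, or evaluation code.

```python
def sensitivity(y_true, y_pred):
    """Fraction of true positive cases that the classifier flags (TP / (TP + FN))."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no positive cases in y_true")
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Outcome prevalence implied by the counts quoted in the abstract.
homeless_prevalence = 1800 / 237602
police_prevalence = 759 / 237141

# Hypothetical toy labels and model scores (not study data).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"prevalence: {homeless_prevalence:.2%} homelessness, {police_prevalence:.2%} police")
print(f"sensitivity = {sensitivity(y_true, y_pred):.3f}")
print(f"AUC = {roc_auc(y_true, scores):.3f}")
```

With outcomes this rare, overall accuracy is uninformative (a model that never predicts the outcome would still be over 99% accurate), which is one reason sensitivity and AUC are the more useful figures to compare across cohorts.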
