This study investigates how social demographics, built-environment characteristics, and environmental hazard exposure jointly shape community-level cancer prevalence. Using data from five U.S. Metropolitan Statistical Areas (Chicago, Dallas, Houston, Los Angeles, and New York), we train an XGBoost machine learning model to predict cancer prevalence and to evaluate the importance of different features. The model demonstrates reliable performance, and the results indicate that age, minority status, and population density are among the most influential factors in cancer prevalence. We further explore urban development and design strategies that could mitigate cancer prevalence, focusing on green space, developed areas, and total emissions. A series of experimental evaluations based on causal inference shows that increasing green space and reducing developed areas and total emissions could alleviate cancer prevalence. The findings contribute to a better understanding of the interplay between urban features and community health, and show the value of interpretable machine learning models for integrated urban design that promotes public health. They also provide actionable insights for urban planning and design, emphasizing the need for a multifaceted, integrated approach to addressing urban health disparities.