A fairness assessment of mobility-based COVID-19 case prediction models

In light of the COVID-19 outbreak, analyzing and measuring human mobility has become increasingly important. A wide range of studies have explored spatiotemporal trends, examined associations with other variables, evaluated non-pharmacologic interventions (NPIs), and predicted or simulated COVID-19 spread using mobility data. Despite the benefits of publicly available mobility data, a key question remains unanswered: do models that use mobility data perform equitably across demographic groups? We hypothesize that bias in the mobility data used to train predictive models might lead to unfairly less accurate predictions for certain demographic groups. To test this hypothesis, we applied two mobility-based COVID-19 infection prediction models at the county level in the United States using SafeGraph data, and correlated model performance with sociodemographic traits. The findings reveal a systematic bias in model performance with respect to certain demographic characteristics. Specifically, the models tend to favor large, highly educated, wealthy, young, urban, and non-Black-dominated counties. We hypothesize that the mobility data currently used by many predictive models tends to capture less information about older, poorer, non-white, and less educated regions, which in turn reduces the accuracy of COVID-19 predictions in these regions. Ultimately, this study points to the need for improved data collection and sampling approaches that accurately represent mobility patterns across demographic groups.
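The step of correlating model performance with sociodemographic traits can be illustrated with a minimal sketch. The abstract does not say which correlation measure or error metric was used, so the Spearman rank correlation below is an assumption, and the county records, field names (`error`, `median_income`, `pct_over_65`), and values are hypothetical toy data made up for illustration, not results from the study.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical toy data: one record per county, with a per-county
# prediction error and two sociodemographic traits (all values invented).
counties = [
    {"fips": "A", "error": 0.42, "median_income": 38_000, "pct_over_65": 21.0},
    {"fips": "B", "error": 0.35, "median_income": 45_000, "pct_over_65": 19.5},
    {"fips": "C", "error": 0.28, "median_income": 52_000, "pct_over_65": 16.0},
    {"fips": "D", "error": 0.20, "median_income": 67_000, "pct_over_65": 14.2},
    {"fips": "E", "error": 0.15, "median_income": 81_000, "pct_over_65": 12.1},
]

errors = [c["error"] for c in counties]
for trait in ("median_income", "pct_over_65"):
    rho = spearman(errors, [c[trait] for c in counties])
    print(f"{trait}: rho = {rho:+.2f}")
```

Under this setup, a negative correlation between error and a trait means that prediction error falls as the trait rises (the model serves such counties better), while a positive correlation means the opposite. A rank-based measure is used here only because it makes no assumption about linearity; any other association measure could be substituted.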

