In light of the COVID-19 outbreak, analyzing and measuring human mobility has become increasingly important. A wide range of studies have used mobility data to explore spatiotemporal trends, examine associations with other variables, evaluate non-pharmacologic interventions (NPIs), and predict or simulate the spread of COVID-19. Despite the benefits of publicly available mobility data, a key question remains unanswered: do models that use mobility data perform equitably across demographic groups? We hypothesize that bias in the mobility data used to train predictive models may lead to less accurate, and therefore unfair, predictions for certain demographic groups. To test this hypothesis, we applied two mobility-based COVID-19 infection prediction models at the county level in the United States using SafeGraph data and correlated model performance with sociodemographic traits. The findings reveal a systematic bias in model performance with respect to certain demographic characteristics. Specifically, the models tend to favor counties that are large, highly educated, wealthy, young, urban, and not predominantly Black. We posit that the mobility data currently used by many predictive models capture less information about older, poorer, non-white, and less educated regions, which in turn reduces the accuracy of COVID-19 predictions in these regions. Ultimately, this study points to the need for improved data collection and sampling approaches that accurately represent mobility patterns across demographic groups.
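The step of correlating model performance with sociodemographic traits can be illustrated with a minimal sketch. It assumes, for illustration only, that per-county accuracy is summarized as mean absolute percentage error (MAPE) and that the association with a trait is measured by Spearman rank correlation; neither metric is stated above, the helper names are hypothetical, and the county values are invented toy numbers rather than results from this study.

```python
from statistics import mean


def mape(actual, predicted):
    """Mean absolute percentage error for one county (assumed metric; actual values must be nonzero)."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: observed vs. predicted weekly cases for four counties,
# plus one sociodemographic trait (median household income, USD).
counties = {
    "A": {"actual": [100, 120, 140], "predicted": [98, 123, 139], "income": 85000},
    "B": {"actual": [50, 60, 70], "predicted": [45, 66, 63], "income": 62000},
    "C": {"actual": [30, 25, 40], "predicted": [24, 31, 30], "income": 48000},
    "D": {"actual": [10, 12, 15], "predicted": [6, 17, 9], "income": 39000},
}

errors = [mape(c["actual"], c["predicted"]) for c in counties.values()]
incomes = [c["income"] for c in counties.values()]
rho = spearman(errors, incomes)
print(f"Spearman rho(error, income) = {rho:.3f}")
```

In this toy example, error rises as income falls, so the rank correlation is strongly negative, which is the kind of pattern the abstract describes as the models favoring wealthier counties. A rank-based measure is used here because it is insensitive to the skewed distributions typical of county-level income and case counts; a real analysis would also need to account for county population and multiple comparisons across traits.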