A Bayesian Spatial Model to Correct Under-Reporting in Urban Crowdsourcing

Decision-makers often observe the occurrence of events only through a reporting process. City governments, for example, rely on resident reports to find and then resolve urban infrastructure problems such as fallen street trees, flooded basements, or rat infestations. Without additional assumptions, there is no way to distinguish events that occur but are not reported from events that truly did not occur, a fundamental problem in settings with positive-unlabeled data. Because disparities in reporting rates correlate with resident demographics, addressing incidents only on the basis of reports leads to systematic neglect of neighborhoods that are less likely to report events. We show how to overcome this challenge by leveraging the fact that events are spatially correlated. Our framework uses a Bayesian spatial latent variable model to infer event occurrence probabilities; we apply it to storm-induced flooding reports in New York City, pooling results across multiple storms. We show that a model accounting for under-reporting and spatial correlation predicts future reports more accurately than other models, and also induces a more equitable set of inspections: its allocations better reflect the population and provide equitable service to non-white, less traditionally educated, and lower-income residents. This finding reflects heterogeneous reporting behavior learned by the model: reporting rates are higher in Census tracts with larger populations, higher proportions of white residents, and higher proportions of owner-occupied households. Our work lays the groundwork for more equitable proactive government services, even with disparate reporting behavior.
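
The identification problem described above can be illustrated with a minimal sketch. It is not the paper's model: the function name `report_probability` and all numeric values are hypothetical, chosen only to show that the observed report rate is the product of an event probability and a reporting rate, so two tracts with very different true event rates can look identical in the report data.

```python
def report_probability(p_event: float, p_report_given_event: float) -> float:
    """Probability that a report is observed in a tract.

    A report requires both that the event occurred and that someone
    reported it, so P(report) = P(event) * P(report | event).
    """
    return p_event * p_report_given_event


# Hypothetical tract A: events are rare, but residents report most of them.
tract_a = report_probability(p_event=0.25, p_report_given_event=0.8)

# Hypothetical tract B: events are twice as likely, but reporting is half as likely.
tract_b = report_probability(p_event=0.5, p_report_given_event=0.4)

# Both tracts produce the same observed report rate, so report data alone
# cannot tell which tract actually floods more often.
print(tract_a, tract_b)
```

Separating the two factors requires extra structure. The abstract names spatial correlation between events as that structure; the sketch above deliberately omits it and shows only why it is needed.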

