Decision-makers often observe the occurrence of events through a reporting process. City governments, for example, rely on resident reports to find and then resolve urban infrastructural problems such as fallen street trees, flooded basements, or rat infestations. Without additional assumptions, there is no way to distinguish events that occur but are not reported from events that truly did not occur; this is a fundamental problem in settings with positive-unlabeled data. Because disparities in reporting rates correlate with resident demographics, addressing incidents only on the basis of reports leads to systematic neglect in neighborhoods that are less likely to report events. We show how to overcome this challenge by leveraging the fact that events are spatially correlated. Our framework uses a Bayesian spatial latent variable model to infer event occurrence probabilities and applies it to storm-induced flooding reports in New York City, further pooling results across multiple storms. We show that a model accounting for under-reporting and spatial correlation predicts future reports more accurately than other models, and further induces a more equitable set of inspections: its allocations better reflect the population and provide equitable service to non-white, less traditionally educated, and lower-income residents. This finding reflects heterogeneous reporting behavior learned by the model: reporting rates are higher in Census tracts with higher populations, proportions of white residents, and proportions of owner-occupied households. Our work lays the groundwork for more equitable proactive government services, even when reporting behavior is disparate.
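To make the positive-unlabeled structure concrete, the sketch below simulates the generative story the abstract describes: each tract has a latent event with a spatially correlated occurrence probability, and an event becomes a report only with some tract-specific reporting rate. It then computes the posterior probability that an event occurred given the report status. This is a minimal illustration under loudly stated assumptions, not the authors' model: the 1-D chain of tracts, the moving-average field (a stand-in for a proper spatial prior such as a CAR or Gaussian-process prior), the two-group reporting rates, and all function names and parameter values are hypothetical. In the actual framework, occurrence probabilities and reporting rates are inferred jointly from data rather than assumed known.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def smooth_field(n, width, rng):
    """Spatially correlated field on a hypothetical 1-D chain of n tracts.

    A moving average of white noise: neighboring tracts share noise terms,
    so their values are correlated. This stands in for a real spatial prior.
    """
    noise = rng.normal(size=n + width - 1)
    kernel = np.ones(width) / np.sqrt(width)
    return np.convolve(noise, kernel, mode="valid")  # length n


def simulate(n=200, width=15, seed=0):
    """Simulate latent occurrences A_i and observed reports T_i (hypothetical values)."""
    rng = np.random.default_rng(seed)
    theta = -1.0 + 1.5 * smooth_field(n, width, rng)  # latent log-odds of an event
    p = sigmoid(theta)                                # occurrence probability per tract
    group = rng.random(n) < 0.5                       # hypothetical demographic indicator
    psi = np.where(group, 0.8, 0.3)                   # hypothetical reporting rates
    occurred = rng.random(n) < p                      # A_i ~ Bernoulli(p_i)
    # A report is only possible if an event occurred: T_i ~ Bernoulli(psi_i) given A_i = 1.
    reported = occurred & (rng.random(n) < psi)
    return p, psi, occurred, reported


def posterior_occurrence(p, psi, reported):
    """P(A_i = 1 | T_i), assuming p_i and psi_i are known.

    A reported tract certainly had an event. For an unreported tract, Bayes' rule gives
    p(1 - psi) / (p(1 - psi) + (1 - p)): the event may have happened but gone unreported.
    """
    unreported_post = p * (1 - psi) / (p * (1 - psi) + (1 - p))
    return np.where(reported, 1.0, unreported_post)


if __name__ == "__main__":
    p, psi, occurred, reported = simulate()
    post = posterior_occurrence(p, psi, reported)
    low = psi < 0.5
    missed = occurred & ~reported
    print("unreported events, low-reporting tracts: ", int(missed[low].sum()))
    print("unreported events, high-reporting tracts:", int(missed[~low].sum()))
    print("mean posterior of unreported tracts (low psi): ", post[low & ~reported].mean())
    print("mean posterior of unreported tracts (high psi):", post[~low & ~reported].mean())
```

The key point the formula captures is that silence is ambiguous: for the same occurrence probability, an unreported tract with a low reporting rate keeps a higher posterior probability of an event than one with a high reporting rate, so an allocation based on reports alone systematically overlooks low-reporting tracts.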