We study the problem of enforcing continuous group fairness over sliding windows in data streams. We propose a novel fairness model that enforces group fairness at a finer granularity, called a block, within each sliding window. This formulation is particularly useful when the window is large, where fairness over the window as a whole does not guarantee fairness over shorter spans within it. Within this framework, we address two key challenges: efficiently monitoring whether each sliding window satisfies block-level group fairness, and reordering the current window as effectively as possible when fairness is violated. To enable real-time monitoring, we design sketch-based data structures that maintain attribute distributions with minimal overhead. We also develop optimal, efficient algorithms for the reordering task, supported by rigorous theoretical guarantees. Our evaluation on four real-world streaming scenarios demonstrates the practical effectiveness of our approach: it achieves millisecond-level processing and an average throughput of approximately 30,000 queries per second, depending on system parameters. The stream reordering algorithm improves block-level group fairness by up to 95% in certain cases, and by 50-60% on average across datasets. A qualitative study further highlights the advantages of block-level fairness over window-level fairness.
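To make the distinction between window-level and block-level fairness concrete, the following is a minimal Python sketch of a naive monitor. It is not the sketch-based data structure described in the abstract, and the fairness criterion used here (per-block lower and upper bounds on each group's count, over consecutive non-overlapping blocks) is an assumption chosen for illustration; the abstract does not define the exact criterion. The names `Bounds`, `block_is_fair`, `window_is_block_fair`, and `monitor` are hypothetical.

```python
from collections import Counter, deque
from typing import Dict, Hashable, Iterable, Iterator, Sequence, Tuple

# Assumed fairness spec: group -> (min_count, max_count) required in every block.
Bounds = Dict[Hashable, Tuple[int, int]]


def block_is_fair(block: Iterable[Hashable], bounds: Bounds) -> bool:
    """Return True if every group's count in the block lies within its bounds."""
    counts = Counter(block)
    return all(lo <= counts[g] <= hi for g, (lo, hi) in bounds.items())


def window_is_block_fair(
    window: Sequence[Hashable], block_size: int, bounds: Bounds
) -> bool:
    """Check every consecutive, non-overlapping full block of the window.

    Assumption: blocks tile the window from its start; a trailing partial
    block (if any) is ignored.
    """
    items = list(window)
    last_start = len(items) - block_size
    return all(
        block_is_fair(items[i:i + block_size], bounds)
        for i in range(0, last_start + 1, block_size)
    )


def monitor(
    stream: Iterable[Hashable], window_size: int, block_size: int, bounds: Bounds
) -> Iterator[bool]:
    """Yield one fairness verdict per full sliding window (slide of 1 item).

    This recomputes each window from scratch, so it costs O(window_size)
    per arrival; it only illustrates the check, not an efficient monitor.
    """
    window: deque = deque(maxlen=window_size)
    for group in stream:
        window.append(group)
        if len(window) == window_size:
            yield window_is_block_fair(window, block_size, bounds)
```

For example, under these assumptions the window `a a b b` is balanced as a whole (two items per group), yet with a block size of 2 and a requirement of exactly one item per group in each block, both blocks violate the constraint. A reordering such as `a b a b` would satisfy it, which is the kind of repair the reordering task targets.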