Multi-Attribute Group Fairness in $k$-NN Queries on Vector Databases

We initiate the study of multi-attribute group fairness in $k$-nearest neighbor ($k$-NN) search over vector databases. Unlike prior work that optimizes efficiency or query filtering, fairness imposes count constraints to ensure proportional representation across groups defined by protected attributes. When fairness spans multiple attributes, these constraints must be satisfied simultaneously, making the problem computationally hard. To address this, we propose a computational framework that produces high-quality approximate nearest neighbors with good trade-offs between search time, memory/indexing cost, and recall. We adapt locality-sensitive hashing (LSH) to accelerate candidate generation and build a lightweight index over the Cartesian product of protected attribute values. Our framework retrieves candidates satisfying joint count constraints and then applies a post-processing stage to construct fair $k$-NN results across all attributes. For two attributes, we present an exact polynomial-time flow-based algorithm; for three or more, we formulate ILP-based exact solutions with higher computational cost. We provide theoretical guarantees, identify efficiency--fairness trade-offs, and empirically show that existing vector search methods cannot be directly adapted for fairness. Experimental evaluations demonstrate the generality and scalability of the proposed framework.
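To make the two-attribute case concrete, the sketch below casts fair selection as a min-cost flow, which is one standard way to solve such a problem in polynomial time. It is an illustration only, not the authors' algorithm: the function name `fair_select`, the use of exact per-group quotas (rather than lower/upper bounds), and the toy data are all assumptions introduced here. Each candidate becomes a unit-capacity edge from its group under attribute A to its group under attribute B, with cost equal to its distance to the query.

```python
from collections import deque

def fair_select(items, quota_a, quota_b):
    """Hypothetical helper: pick candidates with minimum total distance
    subject to exact per-group counts on two attributes.

    items   : list of (distance, a_value, b_value)
    quota_a : dict mapping each A-group to its required count
    quota_b : dict mapping each B-group to its required count
    Returns sorted indices of the chosen items, or None if infeasible.
    """
    k = sum(quota_a.values())
    if k != sum(quota_b.values()):
        return None
    # Node ids: 0 = source, 1 = sink, then one node per A-group and B-group.
    node = {}
    for a in quota_a:
        node[("A", a)] = len(node) + 2
    for b in quota_b:
        node[("B", b)] = len(node) + 2
    n = len(node) + 2
    graph = [[] for _ in range(n)]   # graph[u] = indices of edges leaving u
    to, cap, cost = [], [], []

    def add_edge(u, v, c, w):
        # Forward edge at an even index, its residual twin at index + 1.
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)

    for a, q in quota_a.items():
        add_edge(0, node[("A", a)], q, 0.0)
    for b, q in quota_b.items():
        add_edge(node[("B", b)], 1, q, 0.0)
    item_edge = {}
    for i, (d, a, b) in enumerate(items):
        if a in quota_a and b in quota_b:
            item_edge[i] = len(to)
            add_edge(node[("A", a)], node[("B", b)], 1, d)

    flow = 0
    while flow < k:
        # Queue-based Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [float("inf")] * n
        prev_edge = [-1] * n
        in_queue = [False] * n
        dist[0] = 0.0
        queue = deque([0])
        while queue:
            u = queue.popleft()
            in_queue[u] = False
            for e in graph[u]:
                v = to[e]
                if cap[e] > 0 and dist[u] + cost[e] < dist[v] - 1e-12:
                    dist[v] = dist[u] + cost[e]
                    prev_edge[v] = e
                    if not in_queue[v]:
                        in_queue[v] = True
                        queue.append(v)
        if prev_edge[1] == -1:
            return None              # quotas cannot be met by these candidates
        v = 1
        while v != 0:                # push one unit of flow back along the path
            e = prev_edge[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]
        flow += 1
    # An item is selected when its unit-capacity edge is saturated.
    return sorted(i for i, e in item_edge.items() if cap[e] == 0)

# Toy candidates: (distance to query, attribute A, attribute B).
items = [(0.1, "F", "young"), (0.2, "F", "young"), (0.3, "M", "old"),
         (0.4, "M", "young"), (0.5, "F", "old"), (0.9, "M", "old")]
print(fair_select(items, {"F": 2, "M": 2}, {"young": 2, "old": 2}))
```

In this toy instance the four nearest candidates (indices 0 to 3) contain three "young" items and so violate the quota on the second attribute; the flow formulation swaps one of them for a slightly more distant candidate so that both attributes are balanced at once. In a full pipeline, `items` would come from an approximate candidate generator such as LSH rather than from a hand-written list.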

