We initiate the study of multi-attribute group fairness in $k$-nearest neighbor ($k$-NN) search over vector databases. Whereas prior work optimizes efficiency or query filtering, fairness imposes count constraints that ensure proportional representation across groups defined by protected attributes. When fairness spans multiple attributes, these constraints must be satisfied simultaneously, which makes the problem computationally hard. To address this, we propose a computational framework that produces high-quality approximate nearest neighbors with good trade-offs between search time, memory/indexing cost, and recall. We adapt locality-sensitive hashing (LSH) to accelerate candidate generation and build a lightweight index over the Cartesian product of protected attribute values. Our framework retrieves candidates that satisfy the joint count constraints and then applies a post-processing stage to construct $k$-NN results that are fair across all attributes. For two attributes, we present an exact polynomial-time flow-based algorithm; for three or more, we formulate exact solutions based on integer linear programming (ILP), at higher computational cost. We provide theoretical guarantees, identify efficiency--fairness trade-offs, and empirically show that existing vector search methods cannot be directly adapted for fairness. Experimental evaluations demonstrate the generality and scalability of the proposed framework.
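To make the two-attribute case concrete, the following is a minimal sketch of how a flow formulation can select a fair result set from a candidate pool. It is a standard min-cost-flow (transportation) construction and not necessarily the construction used in the paper. It assumes exact per-value quotas for each attribute (rather than lower and upper bounds), and the function name `fair_knn_two_attributes` and its parameters are hypothetical. Each candidate becomes a unit-capacity edge from its first-attribute value to its second-attribute value, with its distance as the cost.

```python
def fair_knn_two_attributes(candidates, quota_a, quota_b):
    """Pick k candidates with minimum total distance under exact quotas.

    candidates: list of (distance, a_value, b_value) tuples.
    quota_a, quota_b: dicts mapping attribute value -> required count.
    Returns sorted indices of the chosen candidates, or None if infeasible.
    """
    k = sum(quota_a.values())
    if k != sum(quota_b.values()):
        return None  # both attributes must describe the same result size

    # Node ids: 0 = source, 1 = sink, then one node per attribute value.
    node = {}
    for a in quota_a:
        node[("a", a)] = len(node) + 2
    for b in quota_b:
        node[("b", b)] = len(node) + 2
    n = len(node) + 2

    # Each edge is [head, residual capacity, cost, candidate index];
    # edge i ^ 1 is always the reverse of edge i.
    edges = []
    graph = [[] for _ in range(n)]

    def add_edge(u, v, cap, cost, tag=-1):
        graph[u].append(len(edges))
        edges.append([v, cap, cost, tag])
        graph[v].append(len(edges))
        edges.append([u, 0, -cost, tag])

    for a, q in quota_a.items():
        add_edge(0, node[("a", a)], q, 0)
    for b, q in quota_b.items():
        add_edge(node[("b", b)], 1, q, 0)
    for i, (d, a, b) in enumerate(candidates):
        if ("a", a) in node and ("b", b) in node:
            add_edge(node[("a", a)], node[("b", b)], 1, d, i)

    flow = 0
    while flow < k:
        # Bellman-Ford, because residual (reverse) edges carry negative costs.
        dist = [float("inf")] * n
        prev_edge = [-1] * n
        dist[0] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for ei in graph[u]:
                    v, cap, cost, _ = edges[ei]
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev_edge[v], changed = dist[u] + cost, ei, True
            if not changed:
                break
        if prev_edge[1] == -1:
            return None  # the quotas cannot be met from this candidate pool

        # Find the bottleneck along the shortest path, then augment.
        push, v = k - flow, 1
        while v != 0:
            ei = prev_edge[v]
            push = min(push, edges[ei][1])
            v = edges[ei ^ 1][0]
        v = 1
        while v != 0:
            ei = prev_edge[v]
            edges[ei][1] -= push
            edges[ei ^ 1][1] += push
            v = edges[ei ^ 1][0]
        flow += push

    # Saturated forward candidate edges are the selected points.
    return sorted(e[3] for i, e in enumerate(edges)
                  if i % 2 == 0 and e[3] >= 0 and e[1] == 0)


# Toy pool: the two closest points are both "young", so plain top-2 is unfair.
candidates = [(1, "F", "young"), (2, "M", "young"), (3, "M", "old"),
              (4, "F", "old"), (5, "F", "young")]
print(fair_knn_two_attributes(candidates, {"F": 1, "M": 1},
                              {"young": 1, "old": 1}))  # prints [0, 2]
```

In this toy pool the unconstrained top-2 would return candidates 0 and 1, which violates the age quota; the flow solution swaps in candidate 2 at the smallest possible increase in total distance. With three or more attributes the selection no longer reduces to a bipartite flow of this kind, which is consistent with the move to ILP formulations described above.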