A Fair and Memory/Time-efficient Hashmap

A large body of work constructs hashmaps that minimize the number of collisions. However, to the best of our knowledge, no known hashing technique guarantees group fairness among different groups of items. We are given a set $P$ of $n$ tuples in $\mathbb{R}^d$, for a constant dimension $d$, and a set of groups $\mathcal{G}=\{\mathbf{g}_1,\ldots, \mathbf{g}_k\}$ such that every tuple belongs to a unique group. We formally define the fair hashing problem by introducing the notions of single fairness ($Pr[h(p)=h(x)\mid p\in \mathbf{g}_i, x\in P]$ for every $i=1,\ldots, k$), pairwise fairness ($Pr[h(p)=h(q)\mid p,q\in \mathbf{g}_i]$ for every $i=1,\ldots, k$), and the well-known collision probability ($Pr[h(p)=h(q)\mid p,q\in P]$). The goal is to construct a hashmap such that the collision probability, the single fairness, and the pairwise fairness are all close to $1/m$, where $m$ is the number of buckets in the hashmap. We propose two families of algorithms to design fair hashmaps. First, we focus on hashmaps with optimum memory consumption that minimize unfairness. We model the input tuples as points in $\mathbb{R}^d$, and the goal is to find a vector $w$ such that the projection of $P$ onto $w$ creates an ordering that can be split conveniently into a fair hashmap. For each projection, we design efficient algorithms that find near-optimum partitions into exactly (or at most) $m$ buckets. Second, we focus on hashmaps with optimum fairness ($0$-unfairness) that minimize memory consumption. We make the important observation that the fair hashmap problem reduces to the necklace splitting problem. By carefully implementing algorithms for solving the necklace splitting problem, we propose faster algorithms that construct hashmaps with $0$-unfairness using $2(m-1)$ boundary points when $k=2$ and $k(m-1)(4+\log_2 (3mn))$ boundary points when $k>2$.
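To make the three quantities concrete, the following minimal Python sketch evaluates them for a fixed assignment of items to buckets. It is an illustration, not the paper's algorithm: the function name `fairness_measures` and its arguments are hypothetical, and it assumes that the items in each probability are drawn independently and uniformly at random with replacement, so that each probability becomes a sum over buckets of products of bucket fractions.

```python
from collections import Counter

def fairness_measures(buckets, groups):
    """Return (collision probability, single fairness, pairwise fairness).

    buckets[i] is the bucket index h(p_i) and groups[i] is the group of p_i.
    Assumption: items are sampled independently, uniformly, with replacement.
    """
    n = len(buckets)
    bucket_sizes = Counter(buckets)        # |b_j|
    group_sizes = Counter(groups)          # |g_i|
    joint = Counter(zip(groups, buckets))  # |g_i ∩ b_j|

    # Pr[h(p) = h(q) | p, q in P] = sum_j (|b_j| / n)^2
    collision = sum((size / n) ** 2 for size in bucket_sizes.values())

    single = {g: 0.0 for g in group_sizes}
    pairwise = {g: 0.0 for g in group_sizes}
    for (g, b), count in joint.items():
        frac = count / group_sizes[g]      # share of group g in bucket b
        # Pr[h(p) = h(x) | p in g, x in P]
        single[g] += frac * bucket_sizes[b] / n
        # Pr[h(p) = h(q) | p, q in g]
        pairwise[g] += frac ** 2
    return collision, single, pairwise

# Toy input: n = 8 items, k = 2 groups, m = 2 buckets.
groups = ["a"] * 4 + ["b"] * 4
balanced = fairness_measures([0, 1, 0, 1, 0, 1, 0, 1], groups)
segregated = fairness_measures([0, 0, 0, 0, 1, 1, 1, 1], groups)
print(balanced)
print(segregated)
```

In this toy input, the balanced assignment gives every measure the value $1/m = 0.5$. The segregated assignment keeps the collision probability and the single fairness at $0.5$ but raises the pairwise fairness of each group to $1$, which shows that a low collision probability alone does not imply group fairness.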
