Originally proposed for rapid document similarity estimation in web search engines, the celebrated property of minwise independence imposes highly symmetric constraints on a family $\mathcal{F}$ of permutations of $\{1,\ldots, n\}$. The family $\mathcal{F}$ has this property if, for each $j\in \{1,\ldots,n\}$, every subset $X\subseteq \{1,\ldots,n\}$ of cardinality $j$, and every fixed element $x^\ast\in X$, a permutation $\pi$ drawn at random from $\mathcal{F}$ satisfies $\pi(x^\ast)=\min \{\pi(x) : x\in X\}$ with probability $1/j$. The central problem is to find a family with the fewest possible members that meets these constraints. We provide a framework that, first, is realized as a pure SAT model and, second, generalizes a heuristic of Mathon and van Trung to the search for such families. In its original form, this heuristic enforces an underlying group-theoretic decomposition to achieve a significant speed-up in the computer-aided search for structures that can be identified with so-called rankwise independent families. We observe that this approach is also suitable for finding new, provably optimal representatives of minwise independent families, again with a decisive speed-up. As the naive search space of the problem has size at least $(n!)^n$, we also carefully address symmetry breaking. Finally, we add a bijective proof for a problem encountered by Bargachev when deriving a lower bound on the number of members of a minimal rankwise independent family.
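The defining condition can be checked directly for small $n$. The following is a minimal brute-force sketch and not the SAT model described above. It assumes that $\pi$ is drawn uniformly from $\mathcal{F}$, uses the ground set $\{0,\ldots,n-1\}$ instead of $\{1,\ldots,n\}$, and represents each permutation as a tuple `p` with `p[x]` standing for $\pi(x)$. The function name `is_minwise_independent` is hypothetical.

```python
from itertools import combinations, permutations


def is_minwise_independent(family, n):
    """Brute-force check of the minwise independence condition.

    family: list of permutations of range(n), each a tuple p with p[x] = pi(x).
    Assumption: pi is drawn uniformly from the family, so a probability of 1/j
    means that exactly len(family) / j members satisfy the condition.
    """
    m = len(family)
    for j in range(1, n + 1):
        for subset in combinations(range(n), j):
            for x_star in subset:
                # Count the members that map x_star to the minimum over the subset.
                hits = sum(
                    1 for p in family
                    if p[x_star] == min(p[x] for x in subset)
                )
                # hits / m == 1 / j, compared without floating point.
                if hits * j != m:
                    return False
    return True


# The full symmetric group on three points satisfies the condition.
full_group = list(permutations(range(3)))
print(is_minwise_independent(full_group, 3))

# The three cyclic shifts do not: the condition already fails for j = 2.
cyclic_shifts = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
print(is_minwise_independent(cyclic_shifts, 3))
```

The check enumerates all subsets and is therefore exponential in $n$. It is meant only to make the definition concrete on tiny instances.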