We study the problem of counting $k$-hypergraphlets, an interesting but surprisingly overlooked primitive, with the aim of understanding whether efficient algorithms exist. To this end, we consider color coding, a well-known technique for approximately counting $k$-graphlets in graphs. Our first result is that, on hypergraphs, color coding encounters a quadratic barrier: under the Orthogonal Vector Conjecture, no implementation can run in time sub-quadratic in the input size. We then introduce a simple property, $(α,β)$-niceness, that hypergraphs from real-world datasets appear to satisfy for small values of $α$ and $β$. Intuitively, an $(α,β)$-nice hypergraph can be split into two sub-hypergraphs, one of rank at most $α$ and the other of degree at most $β$. By applying different techniques to each sub-hypergraph and carefully combining the outputs, we show how to run color coding in time $2^{O(k)} \cdot (2^β|V| + α^k |E| + α^2 β\|H\|)$, where $H=(V,E)$ is the input hypergraph. Afterwards, we can sample colorful $k$-hypergraphlets uniformly in expected time $k^{O(k)} \cdot (β^2 + \ln |V|)$ per sample. Experiments on real-world hypergraphs show that our algorithm significantly outperforms the naive quadratic algorithm, sometimes by more than an order of magnitude.
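To make the intuition behind $(α,β)$-niceness concrete, the following is a minimal Python sketch of one natural reading of the property, not the paper's formal definition. It assumes that rank means the maximum hyperedge size, that degree means the maximum number of hyperedges containing any single vertex, and that the split is obtained by thresholding hyperedge size at $α$. The function names `split_by_rank`, `max_degree`, and `looks_nice` are hypothetical and introduced only for illustration.

```python
from collections import Counter


def split_by_rank(edges, alpha):
    """Split hyperedges by size (assumed splitting rule, for illustration).

    Returns (small, large): hyperedges with at most alpha vertices,
    and hyperedges with more than alpha vertices.
    """
    small = [e for e in edges if len(e) <= alpha]
    large = [e for e in edges if len(e) > alpha]
    return small, large


def max_degree(edges):
    """Largest number of hyperedges that share a single vertex (0 if no edges)."""
    degree = Counter(v for e in edges for v in e)
    return max(degree.values(), default=0)


def looks_nice(edges, alpha, beta):
    """Check the assumed reading of (alpha, beta)-niceness.

    The small part has rank <= alpha by construction, so only the
    degree of the large part needs to be checked against beta.
    """
    _, large = split_by_rank(edges, alpha)
    return max_degree(large) <= beta


# Toy hypergraph: two small hyperedges and two large ones.
toy_edges = [
    frozenset({1, 2}),
    frozenset({2, 3}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({4, 5, 6, 7}),
]
```

On `toy_edges` with `alpha = 2`, the two large hyperedges overlap on vertices 4 and 5, so the large part has maximum degree 2: `looks_nice(toy_edges, 2, 2)` holds while `looks_nice(toy_edges, 2, 1)` does not. Under this reading, a hypergraph with a few very large hyperedges can still be nice for small parameters, provided those large hyperedges overlap little.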