We combine two methods for the lossless compression of unlabeled graphs, namely entropy-compressing adjacency lists and computing canonical names for vertices, and solve an ensuing novel optimisation problem: Minimum-Entropy Tree-Extraction (MINETREX). MINETREX asks for a spanning forest $F$ to remove from a graph $G$ so that the remaining graph $G-F$ has minimum indegree entropy $H(d_1,\ldots,d_n) = \sum_{v\in V} d_v \log_2(m/d_v)$ among all choices of $F$. (Here $d_v$ is the indegree of vertex $v$ in $G-F$; $m$ is the number of edges.) We show that MINETREX is NP-hard to approximate with additive error better than $\delta n$ (for some constant $\delta>0$), and provide a simple greedy algorithm that achieves additive error at most $n / \ln 2$. By storing the extracted spanning forest and the remaining edges separately, we obtain a degree-entropy compressed ("ultrasuccinct") data structure for representing an arbitrary (static) unlabeled graph that supports navigational graph queries in logarithmic time. It serves as a drop-in replacement for adjacency-list representations and uses substantially less space for most graphs; we precisely quantify these savings in terms of the maximal subgraph density. Our inapproximability result uses an approximate variant of the hitting set problem on biregular instances, whose hardness proof is contained implicitly in a reduction by Guruswami and Trevisan (APPROX/RANDOM 2005); we consider unearthing this reduction partner to be of independent interest, with likely further uses in hardness of approximation.
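To make the objective concrete, the following minimal Python sketch evaluates the indegree entropy $H(d_1,\ldots,d_n)$ of $G-F$ for a given directed edge list and a given set of removed edges. It only evaluates the objective; it is not the greedy algorithm mentioned above. The function name `indegree_entropy`, the edge-list representation, and the choice to take $m$ as the number of edges remaining in $G-F$ are assumptions made for illustration. Vertices with indegree $0$ contribute $0$ to the sum, and the sketch does not check that the removed edges actually form a forest.

```python
from collections import Counter
from math import log2


def indegree_entropy(edges, removed=()):
    """Return sum over v of d_v * log2(m / d_v) for the graph `edges` minus `removed`.

    `edges` and `removed` are iterables of directed (source, target) pairs.
    Assumption: m is the number of edges left after removal.
    """
    removed = set(removed)
    remaining = [e for e in edges if e not in removed]
    m = len(remaining)
    # d_v = number of remaining edges pointing into v; vertices with d_v = 0
    # never appear in the counter and therefore contribute nothing.
    indegree = Counter(target for _, target in remaining)
    return sum(d * log2(m / d) for d in indegree.values())


# Hypothetical toy instance: vertices 1 and 2 both have indegree 2.
edges = [(0, 2), (1, 2), (0, 1), (3, 1)]
forest = [(0, 2), (0, 1)]  # an acyclic edge set chosen by hand

print(indegree_entropy(edges))          # entropy of G itself
print(indegree_entropy(edges, forest))  # entropy of G - F
```

On this toy instance, removing the two forest edges halves every indegree, so the remaining graph has strictly lower indegree entropy than $G$; MINETREX asks for the forest that minimises this value.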