The VC-dimension is a fundamental and well-studied measure of the complexity of a set system (or hypergraph) that is central to many areas of machine learning. We establish several new results on the complexity of computing the VC-dimension. In particular, given a hypergraph $\mathcal{H}=(\mathcal{V},\mathcal{E})$, we prove that the naive $2^{\mathcal{O}(|\mathcal{V}|)}$-time algorithm is asymptotically tight under the Exponential Time Hypothesis (ETH). We then prove that the problem admits a 1-additive fixed-parameter approximation algorithm when parameterized by the maximum degree of $\mathcal{H}$ and a fixed-parameter algorithm when parameterized by its dimension, and that these are essentially the only such exploitable structural parameters. Lastly, we consider a generalization of the problem, formulated using graphs, which captures the VC-dimension of both set systems and graphs. We show that it is fixed-parameter tractable parameterized by the treewidth of the graph (which, in the case of a set system, corresponds to the treewidth of its incidence graph). In contrast with closely related problems whose dependency on the treewidth is necessarily double-exponential (assuming the ETH), our algorithm has a relatively low dependency on the treewidth.
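For orientation, the naive $2^{\mathcal{O}(|\mathcal{V}|)}$-time baseline mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes edges are given as Python sets over a finite vertex list, and the names `is_shattered` and `vc_dimension` are hypothetical. A set $S \subseteq \mathcal{V}$ is shattered when every subset of $S$ equals $e \cap S$ for some edge $e$; the sketch enumerates candidate sets by increasing size and stops at the first size with no shattered set, which is valid because shattering is hereditary. Since distinct edges satisfy $|\mathcal{E}| \le 2^{|\mathcal{V}|}$, the polynomial factor in $|\mathcal{E}|$ is absorbed into $2^{\mathcal{O}(|\mathcal{V}|)}$.

```python
from itertools import combinations


def is_shattered(subset, edges):
    """Return True if every subset of `subset` equals e & subset for some edge e."""
    s = frozenset(subset)
    # Collect the distinct traces of the edges on s.
    traces = {frozenset(e) & s for e in edges}
    # s is shattered iff all 2^|s| subsets of s occur as traces.
    return len(traces) == 2 ** len(s)


def vc_dimension(vertices, edges):
    """Brute-force VC-dimension of the hypergraph (vertices, edges).

    Tries candidate sets in order of increasing size. Shattering is
    hereditary (every subset of a shattered set is shattered), so once
    no set of size k is shattered, no larger set can be either.
    Returns -1 for an empty edge family (a convention assumed here).
    """
    vertices = list(vertices)
    edges = [frozenset(e) for e in edges]
    best = -1
    for k in range(len(vertices) + 1):
        if any(is_shattered(c, edges) for c in combinations(vertices, k)):
            best = k
        else:
            break
    return best


if __name__ == "__main__":
    # Intervals on four points (plus the empty set) shatter any two points
    # but no three, since {a, c} with a < b < c cannot be cut out by an interval.
    pts = [1, 2, 3, 4]
    intervals = [set()] + [set(range(i, j + 1)) for i in pts for j in pts if i <= j]
    print(vc_dimension(pts, intervals))  # prints 2
```

The early exit keeps the worst case at roughly $2^{|\mathcal{V}|} \cdot |\mathcal{E}| \cdot |\mathcal{V}|$ set operations, matching the exponential baseline that the ETH lower bound shows cannot be improved to $2^{o(|\mathcal{V}|)}$.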