A hypothesis class admits a sample compression scheme if, for every sample labeled by a hypothesis from the class, it is possible to retain only a small subsample from which the labels on the entire sample can be inferred. The size of the compression scheme is an upper bound on the size of the subsample produced. Every learnable binary hypothesis class (which must necessarily have finite VC dimension) admits a sample compression scheme whose size is a finite function of its VC dimension alone, independent of the sample size. For multiclass hypothesis classes, the analog of the VC dimension is the DS dimension. We show that the analogous statement about sample compression does not hold for multiclass hypothesis classes: not every learnable multiclass hypothesis class, which must necessarily have finite DS dimension, admits a sample compression scheme whose size is a finite function of its DS dimension alone.
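To make the definition of a sample compression scheme concrete, here is a minimal sketch for a standard binary textbook class, thresholds on the real line, where h_t(x) = 1 if x >= t and 0 otherwise. This class is not taken from the paper and says nothing about the multiclass result; it only illustrates what "retain a small subsample and infer all labels" means. The names `compress` and `reconstruct` are illustrative assumptions, and the scheme shown has size 1.

```python
from typing import Callable, List, Tuple

# A labeled sample: list of (point, label) pairs with labels in {0, 1}.
Sample = List[Tuple[float, int]]


def compress(sample: Sample) -> Sample:
    """Keep at most one example: the smallest positively labeled point.

    Assumes the sample is labeled by some threshold h_t (realizable case).
    """
    positives = [(x, y) for x, y in sample if y == 1]
    if not positives:
        return []  # no positives: the empty subsample suffices
    return [min(positives)]


def reconstruct(subsample: Sample) -> Callable[[float], int]:
    """Turn the retained subsample back into a labeling rule."""
    if not subsample:
        return lambda x: 0  # everything in the sample was labeled 0
    t = subsample[0][0]
    return lambda x: 1 if x >= t else 0


# Hypothetical sample, consistent with a threshold such as t = 1.0.
sample = [(0.3, 0), (1.7, 1), (0.9, 0), (2.4, 1), (1.2, 1)]
kept = compress(sample)
h = reconstruct(kept)
recovered = [h(x) for x, _ in sample]
```

Here the retained subsample never exceeds one example regardless of how large the sample is, which is the sense in which the size of a compression scheme is independent of the sample size.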