A hypothesis class admits a sample compression scheme if, for every sample labeled by a hypothesis from the class, it is possible to retain only a small subsample from which the labels on the entire sample can be inferred. The size of the compression scheme is an upper bound on the size of the retained subsample. Every learnable binary hypothesis class (which must necessarily have finite VC dimension) admits a sample compression scheme whose size is a finite function of its VC dimension alone, independent of the sample size. For multiclass hypothesis classes, the analog of the VC dimension is the DS dimension. We show that the analogous statement about sample compression does not hold for multiclass hypothesis classes: not every learnable multiclass hypothesis class (each of which must necessarily have finite DS dimension) admits a sample compression scheme whose size is a finite function of its DS dimension alone.
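To make the definition of a sample compression scheme concrete, here is a minimal sketch for a simple binary class that the abstract does not itself discuss: threshold functions h_t(x) = 1 if x >= t, else 0, on the real line (VC dimension 1). The compressor keeps only the smallest positively labeled point, and the reconstructor labels a point 1 exactly when it is at least that retained point. The class choice and the names `compress`, `reconstruct`, and `threshold_labels` are illustrative assumptions; this is not the multiclass construction from the paper.

```python
from typing import Callable, List, Tuple

# A labeled sample: (point, label) pairs with label in {0, 1}.
Sample = List[Tuple[float, int]]


def compress(sample: Sample) -> Sample:
    """Keep at most one example: the smallest positively labeled point."""
    positives = [x for x, y in sample if y == 1]
    if not positives:
        return []  # an all-negative sample compresses to the empty subsample
    return [(min(positives), 1)]


def reconstruct(kept: Sample) -> Callable[[float], int]:
    """Build a hypothesis using only the retained subsample."""
    if not kept:
        return lambda x: 0  # no positive point retained: predict 0 everywhere
    threshold = kept[0][0]
    return lambda x: 1 if x >= threshold else 0


def threshold_labels(points: List[float], t: float) -> Sample:
    """Label points with the threshold hypothesis h_t(x) = 1 if x >= t else 0."""
    return [(x, 1 if x >= t else 0) for x in points]


if __name__ == "__main__":
    points = [0.3, -1.2, 2.5, 0.9, 1.7, -0.4]
    sample = threshold_labels(points, t=0.8)
    kept = compress(sample)
    h = reconstruct(kept)
    print(kept)  # [(0.9, 1)]
    # Every label in the original sample is recovered from the kept subsample.
    print(all(h(x) == y for x, y in sample))  # True
```

Whatever the sample size, the retained subsample has at most one point, so this scheme has size 1. This mirrors the binary-case fact that compression size can depend on the VC dimension alone. The abstract's result is that no analogous bound depending only on the DS dimension holds for every learnable multiclass class.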