One of the open problems in machine learning is whether any set-family of VC-dimension $d$ admits a sample compression scheme of size $O(d)$. In this paper, we study this problem for balls in graphs. For a ball $B=B_r(x)$ of a graph $G=(V,E)$, a realizable sample for $B$ is a signed subset $X=(X^+,X^-)$ of $V$ such that $B$ contains $X^+$ and is disjoint from $X^-$. A proper sample compression scheme of size $k$ consists of a compressor and a reconstructor. The compressor maps any realizable sample $X$ to a subsample $X'$ of size at most $k$. The reconstructor maps each such subsample $X'$ to a ball $B'$ of $G$ such that $B'$ includes $X^+$ and is disjoint from $X^-$. For balls of arbitrary radius $r$, we design proper labeled sample compression schemes of size $2$ for trees, of size $3$ for cycles, of size $4$ for interval graphs, of size $6$ for trees of cycles, and of size $22$ for cube-free median graphs. For balls of a given radius, we design proper labeled sample compression schemes of size $2$ for trees and of size $4$ for interval graphs. We also design approximate sample compression schemes of size $2$ for balls of $\delta$-hyperbolic graphs.
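
To make the definitions concrete, the following minimal Python sketch computes a ball $B_r(x)$ by breadth-first search and tests whether a signed sample $(X^+,X^-)$ is realizable by some ball. It illustrates the definitions only and is not one of the compression schemes of the paper. The adjacency-dictionary representation and the names `distances`, `ball`, `is_consistent`, and `is_realizable` are assumptions introduced for illustration, and the realizability test is a brute-force search over all centers and radii.

```python
from collections import deque


def distances(adj, source):
    """BFS distances from `source` in an unweighted graph.

    `adj` is an adjacency dict {vertex: iterable of neighbours}
    (a hypothetical representation chosen for this sketch).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def ball(adj, center, radius):
    """Return B_r(x): all vertices at distance at most `radius` from `center`."""
    return {v for v, d in distances(adj, center).items() if d <= radius}


def is_consistent(b, pos, neg):
    """True if the vertex set `b` contains X^+ and is disjoint from X^-."""
    return pos <= b and not (neg & b)


def is_realizable(adj, pos, neg):
    """Brute force: does some ball B_r(x) of the graph realize (X^+, X^-)?

    Radii 0..n-1 suffice, since no finite distance exceeds n-1.
    """
    n = len(adj)
    return any(
        is_consistent(ball(adj, x, r), pos, neg)
        for x in adj
        for r in range(n)
    )


# Example graph: the path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

On this path, the sample with $X^+=\{1,3\}$ and $X^-=\{0,4\}$ is realized by $B_1(2)$, whereas $X^+=\{0,4\}$, $X^-=\{2\}$ is not realizable, because every ball of a path is a subpath. A proper scheme of size $k$ would have to return, from at most $k$ retained sample points, a ball that passes `is_consistent` on the whole original sample.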