Multi-distribution learning is a natural generalization of PAC learning to settings with multiple data distributions. A significant gap remains between the known upper and lower bounds on its sample complexity for PAC-learnable classes. In particular, the sample complexity of learning a class of VC dimension $d$ on $k$ distributions is known to be at most $O(\epsilon^{-2} \ln(k)(d + k) + \min\{\epsilon^{-1} dk, \epsilon^{-4} \ln(k) d\})$, whereas the best known lower bound is $\Omega(\epsilon^{-2}(d + k \ln(k)))$. We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.
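To get a feel for the size of this gap, the two expressions can be evaluated numerically. The following is only a rough sketch: it drops the constants hidden by the $O(\cdot)$ and $\Omega(\cdot)$ notation, so the outputs illustrate scaling rather than actual sample counts, and the function names `upper_bound` and `lower_bound` are hypothetical helpers introduced here for illustration.

```python
import math


def upper_bound(d: int, k: int, eps: float) -> float:
    """Upper-bound expression from the abstract, with constants dropped.

    eps^-2 * ln(k) * (d + k) + min(eps^-1 * d * k, eps^-4 * ln(k) * d)
    """
    if k < 2 or d < 1 or not (0.0 < eps < 1.0):
        raise ValueError("expected k >= 2, d >= 1 and 0 < eps < 1")
    main_term = eps ** -2 * math.log(k) * (d + k)
    # The first argument of the min is the smaller one
    # exactly when k / ln(k) <= eps^-3.
    extra_term = min(d * k / eps, math.log(k) * d / eps ** 4)
    return main_term + extra_term


def lower_bound(d: int, k: int, eps: float) -> float:
    """Lower-bound expression from the abstract, with constants dropped.

    eps^-2 * (d + k * ln(k))
    """
    if k < 2 or d < 1 or not (0.0 < eps < 1.0):
        raise ValueError("expected k >= 2, d >= 1 and 0 < eps < 1")
    return eps ** -2 * (d + k * math.log(k))


if __name__ == "__main__":
    # Illustrative parameter choices only; they are not taken from the source.
    for d, k, eps in [(10, 8, 0.1), (100, 8, 0.1), (100, 1000, 0.01)]:
        up = upper_bound(d, k, eps)
        lo = lower_bound(d, k, eps)
        print(f"d={d:4d} k={k:5d} eps={eps:<5} upper/lower ratio = {up / lo:.2f}")
```

Because the constants are ignored, the printed ratio should be read only as an indication of how the two expressions scale with $d$, $k$ and $\epsilon$. Note that the $d \ln(k)$ term in the upper bound has no counterpart in the lower bound, which contains only $d$.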