Online Realizable Regression and Applications for ReLU Networks

Realizable online regression can behave very differently from online classification. Even without margin or stochastic assumptions, realizability can force a horizon-free (finite) cumulative loss under metric-like losses, even when the analogous classification problem has an infinite mistake bound. We study realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics). Recent work of Attias et al. shows that the minimax realizable cumulative loss is characterized by the scaled Littlestone/online dimension $\mathbb{D}_{\mathrm{onl}}$, but this quantity can be difficult to analyze. Our main technical contribution is a generic potential method that upper bounds $\mathbb{D}_{\mathrm{onl}}$ by a concrete Dudley-type entropy integral that depends only on the covering numbers of the hypothesis class under the induced sup pseudo-metric. We define an \emph{entropy potential} $\Phi(\mathcal{H})=\int_{0}^{\mathrm{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$, where $N(\mathcal{H},\varepsilon)$ is the $\varepsilon$-covering number of $\mathcal{H}$, and show that for every $c$-approximate pseudo-metric loss, $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,\Phi(\mathcal{H})$. In particular, polynomial metric entropy implies $\Phi(\mathcal{H})<\infty$ and hence a horizon-free realizable cumulative-loss bound with transparent dependence on the effective dimension. We illustrate the method on two families. First, we prove a sharp $q$-versus-$d$ dichotomy for realizable online learning: for $L$-Lipschitz regression, the total loss is finite and efficiently achievable, of order $\Theta_{d,q}(L^d)$, if and only if $q>d$, and is infinite otherwise. Second, for bounded-norm $k$-ReLU networks, we separate regression (finite loss, even $\widetilde O(k^2)$, and $O(1)$ for a single ReLU) from classification (impossible already for $k=2$, $d=1$).
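
To make the entropy-potential bound concrete, the following is a small worked computation under two assumed covering-number regimes. It is an illustrative sketch, not a statement taken from the paper: the constants $C$, $A$, $p$ and the shorthand $D=\mathrm{diam}(\mathcal{H})$ are hypothetical, and the abstract does not specify which regime "polynomial metric entropy" refers to.

```latex
% Assumed notation (not from the source): D = diam(H), constants C >= D, A > 0, 0 < p < 1.
%
% Regime 1 (assumption): parametric-type covering numbers,
%   log N(H, eps) <= d * log(C / eps)   for 0 < eps <= D.
\[
  \Phi(\mathcal{H})
  \;\le\; \int_{0}^{D} d \,\log\frac{C}{\varepsilon}\, d\varepsilon
  \;=\; d\,\Bigl[\varepsilon \log C - \varepsilon\log\varepsilon + \varepsilon\Bigr]_{0}^{D}
  \;=\; d\,D\,\Bigl(1 + \log\frac{C}{D}\Bigr).
\]
%
% Regime 2 (assumption): power-law entropy with exponent p < 1,
%   log N(H, eps) <= (A / eps)^p   for 0 < eps <= D.
\[
  \Phi(\mathcal{H})
  \;\le\; \int_{0}^{D} A^{p}\,\varepsilon^{-p}\, d\varepsilon
  \;=\; \frac{A^{p}\,D^{1-p}}{1-p}.
\]
%
% In either regime, the stated inequality D_onl(H) <= O(c) * Phi(H) then gives
\[
  \mathbb{D}_{\mathrm{onl}}(\mathcal{H}) \;\le\; O(c)\,\Phi(\mathcal{H}) \;<\; \infty .
\]
```

In the first regime the bound grows linearly in $d$, which is one way to read "transparent dependence on the effective dimension". In the second regime the upper bound on the integrand is integrable near $\varepsilon=0$ only when $p<1$; for $p\ge 1$ this particular estimate gives no information about $\Phi(\mathcal{H})$.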
