We study the problem of learning vector-valued linear predictors: these are prediction rules parameterized by a matrix that maps an $m$-dimensional feature vector to a $k$-dimensional target. We focus on the fundamental case with a convex and Lipschitz loss function, and show several new theoretical results that shed light on the complexity of this problem and its connection to related learning models. First, we give a tight characterization of the sample complexity of Empirical Risk Minimization (ERM) in this setting, establishing that $\smash{\widetilde{\Omega}}(k/\epsilon^2)$ examples are necessary for ERM to reach $\epsilon$ excess (population) risk; this provides an exponential improvement over recent results by Magen and Shamir (2023) in terms of the dependence on the target dimension $k$, and matches a classical upper bound due to Maurer (2016). Second, we present a black-box conversion from general $d$-dimensional Stochastic Convex Optimization (SCO) to vector-valued linear prediction, showing that any SCO problem can be embedded as a prediction problem with $k=\Theta(d)$ outputs. These results portray the setting of vector-valued linear prediction as bridging two extensively studied yet disparate learning models: linear models (corresponding to $k=1$) and general $d$-dimensional SCO (with $k=\Theta(d)$).
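For concreteness, here is a minimal formal sketch of the setting described above (the symbols $\ell$, $L_S$, $L_{\mathcal{D}}$, $\mathcal{W}$, and $\widehat{W}$ are illustrative notation introduced here, not necessarily the paper's own). A predictor is a matrix $W \in \mathbb{R}^{k \times m}$ ranging over some constraint set $\mathcal{W}$ (e.g., a norm-bounded ball), and given a sample $S = \{(x_i, y_i)\}_{i=1}^{n}$ drawn i.i.d.\ from a distribution $\mathcal{D}$, ERM returns
\[
\widehat{W} \in \arg\min_{W \in \mathcal{W}} L_S(W),
\qquad
L_S(W) = \frac{1}{n} \sum_{i=1}^{n} \ell(W x_i ; y_i),
\]
where $\ell(\cdot \,; y) : \mathbb{R}^k \to \mathbb{R}$ is convex and Lipschitz for every target $y$. The excess population risk of $\widehat{W}$ is $L_{\mathcal{D}}(\widehat{W}) - \min_{W \in \mathcal{W}} L_{\mathcal{D}}(W)$, where $L_{\mathcal{D}}(W) = \mathbb{E}_{(x,y) \sim \mathcal{D}}[\ell(W x ; y)]$; the first result above states that $n = \smash{\widetilde{\Omega}}(k/\epsilon^2)$ samples can be necessary for this quantity to fall below $\epsilon$.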