Leveraging the kernel trick in both the input and output spaces, surrogate kernel methods are a flexible and theoretically grounded solution to structured output prediction. Although they achieve state-of-the-art performance on complex data sets of moderate size (e.g., in chemoinformatics), these approaches fail to scale. We propose to equip surrogate kernel methods with sketching-based approximations, applied to both the input and output feature maps. We prove excess risk bounds on the original structured prediction problem, showing how to attain close-to-optimal rates with a reduced sketch size that depends on the eigendecay of the input and output covariance operators. From a computational perspective, we show that the two approximations have distinct but complementary effects: sketching the input kernel mostly reduces training time, while sketching the output kernel reduces inference time. Empirically, our approach is shown to scale, achieving state-of-the-art performance on benchmark data sets where non-sketched methods are intractable.
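To make the idea of sketching the input kernel concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a sub-sampling (Nyström-type) sketch, a Gaussian input kernel, and a plain vector-valued target `Y` standing in for the output feature map; output sketching and the structured decoding step are not shown. The names `gaussian_kernel`, `fit_subsampled_krr`, and `predict` are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_subsampled_krr(X, Y, m, lam, gamma=1.0, seed=0):
    """Kernel ridge regression restricted to m randomly chosen landmarks.

    Assumption: a uniform sub-sampling sketch. Training solves an m x m
    system instead of the full n x n one.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # rows kept by the sketch
    landmarks = X[idx]
    K_nm = gaussian_kernel(X, landmarks, gamma)          # n x m
    K_mm = gaussian_kernel(landmarks, landmarks, gamma)  # m x m
    # Normal equations of the sketched problem:
    # (K_nm^T K_nm + n * lam * K_mm) coef = K_nm^T Y
    A = K_nm.T @ K_nm + n * lam * K_mm
    # lstsq tolerates the ill-conditioning typical of kernel matrices.
    coef = np.linalg.lstsq(A, K_nm.T @ Y, rcond=None)[0]
    return landmarks, coef


def predict(X_new, landmarks, coef, gamma=1.0):
    """Evaluate the sketched estimator on new inputs."""
    return gaussian_kernel(X_new, landmarks, gamma) @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(200, 1))
    Y = np.sin(X)
    landmarks, coef = fit_subsampled_krr(X, Y, m=30, lam=1e-4)
    mse = float(np.mean((predict(X, landmarks, coef) - Y) ** 2))
    print("training MSE with 30 of 200 points as landmarks:", mse)
```

In this toy setting the linear system has size `m` rather than `n`, which is the sense in which input sketching mostly affects training cost. The choice of sketch (sub-sampling here) and the values of `m`, `lam`, and `gamma` are illustrative assumptions only.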