Deep extreme classification (XC) aims to train an encoder architecture and an accompanying classifier architecture to tag a data point with the most relevant subset of labels from a very large universe of labels. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which the amount of training data is exceedingly small. Graph convolutional networks (GCNs) present a convenient but computationally expensive way to leverage task metadata and enhance model accuracy in these settings. This paper formally establishes that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures. The paper observes that in these settings, it is much more effective to use graph data to regularize encoder training than to implement a GCN. Based on these insights, the paper presents RAMEN, an alternative paradigm for utilizing graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational cost. RAMEN scales to datasets with up to 1M labels and offers prediction accuracy up to 15% higher on benchmark datasets than state-of-the-art methods, including those that use graph metadata to train GCNs. RAMEN also offers 10% higher accuracy than the best baseline on a proprietary recommendation dataset sourced from the click logs of a popular search engine. Code for RAMEN will be released publicly.
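The central idea, that graph metadata can shape the encoder during training without being consulted at inference, can be illustrated with a minimal sketch. This is not RAMEN's actual formulation, which the abstract does not specify. It assumes a hypothetical linear encoder `W`, a classifier matrix `C`, a one-vs-all logistic loss, and a generic graph regularizer that pulls together the embeddings of item pairs linked in the metadata graph (`edges`). All names here are illustrative.

```python
import numpy as np

def encode(W, X):
    """Hypothetical linear encoder with L2-normalised output (stand-in for a deep encoder)."""
    Z = X @ W
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

def ova_logistic_loss(scores, Y):
    """One-vs-all binary logistic loss averaged over all (point, label) pairs."""
    # +1 for relevant labels, -1 otherwise; logaddexp(0, -s*y) = log(1 + exp(-s*y)), computed stably
    signs = np.where(Y == 1, 1.0, -1.0)
    return np.mean(np.logaddexp(0.0, -signs * scores))

def graph_regularizer(Z, edges):
    """Mean squared distance between embeddings of item pairs linked in the metadata graph."""
    src, dst = np.asarray(edges).T
    return np.mean(np.sum((Z[src] - Z[dst]) ** 2, axis=1))

def training_loss(W, C, X, Y, edges, lam=0.1):
    # The graph enters only through this extra regularization term during training.
    Z = encode(W, X)
    return ova_logistic_loss(Z @ C, Y) + lam * graph_regularizer(Z, edges)

def predict(W, C, X, k=2):
    # Inference uses the encoder and classifier alone: no neighbour lookup or aggregation,
    # so its cost is the same as for a model trained without graph metadata.
    scores = encode(W, X) @ C
    return np.argsort(-scores, axis=1)[:, :k]

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # 4 data points, 8 input features
W = rng.normal(size=(8, 3))                    # encoder to a 3-d embedding
C = rng.normal(size=(3, 5))                    # classifiers for 5 labels
Y = (rng.random((4, 5)) < 0.3).astype(int)     # sparse relevance matrix
edges = [(0, 1), (2, 3)]                       # hypothetical metadata links

print(training_loss(W, C, X, Y, edges))
print(predict(W, C, X))                        # top-2 label indices per point
```

By contrast, a GCN would aggregate neighbour representations inside the forward pass, so the graph would have to be traversed for every prediction. In this sketch `predict` never receives `edges`, which is the sense in which graph-based regularization adds no inference cost.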