Curriculum learning provides a systematic approach to training: it refines training progressively, tailors it to task requirements, and improves generalization through exposure to diverse examples. We present a curriculum learning approach for text graph data that builds on existing knowledge about text and graph complexity formalisms. At its core is a novel data scheduler that uses "spaced repetition" and complexity formalisms to guide the training process. We demonstrate the effectiveness of the approach on several text graph tasks and graph neural network (GNN) architectures. The proposed model gains more while using less data; it consistently prefers text over graph complexity indices throughout training, although the best curricula derived from text and from graph complexity indices are equally effective; and it learns curricula that transfer across GNN models and datasets. In addition, we find that both node-level (local) and graph-level (global) graph complexity indices, as well as shallow and traditional text complexity indices, play a crucial role in effective curriculum learning.
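To make the idea of a scheduler that combines spaced repetition with a complexity index more concrete, the following is a minimal sketch and not the scheduler proposed in this work. It assumes a generic Leitner-style rule (double the review delay when a sample counts as learned, reset it otherwise) and a precomputed scalar complexity score per sample. The names `Sample`, `due_samples`, and `update`, the toy scores, and the `0.6` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    idx: int
    complexity: float  # hypothetical precomputed text or graph complexity index
    delay: int = 1     # epochs to wait before the next review
    due: int = 0       # first epoch at which the sample is trained on again


def due_samples(samples, epoch):
    """Return the samples scheduled for this epoch, easiest first."""
    batch = [s for s in samples if s.due <= epoch]
    return sorted(batch, key=lambda s: s.complexity)


def update(sample, epoch, learned):
    """Leitner-style rule (an assumption): double the delay if learned, else reset."""
    sample.delay = sample.delay * 2 if learned else 1
    sample.due = epoch + sample.delay


samples = [Sample(0, 0.9), Sample(1, 0.2), Sample(2, 0.5)]
schedule = []
for epoch in range(4):
    batch = due_samples(samples, epoch)
    schedule.append([s.idx for s in batch])
    for s in batch:
        # Stand-in for a real check such as "training loss below a threshold".
        update(s, epoch, learned=s.complexity < 0.6)
```

In this toy run the hardest sample (index 0) is never marked as learned, so it is revisited every epoch, while the two easier samples return at increasingly long intervals. That is one plausible reading of how spaced repetition can reduce the amount of data used per epoch.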