Neural networks have revolutionized language modeling and excel at a wide range of downstream tasks. However, the extent to which these models achieve compositional generalization comparable to that of humans remains a topic of debate. While existing approaches have mainly focused on novel architectures and alternative learning paradigms, we introduce a method based on dataset cartography (Swayamdipta et al., 2020). By using this approach to identify a subset of compositional generalization data, we improve model accuracy by up to 10% on the CFQ and COGS datasets. We further use dataset cartography as a curriculum learning criterion, which eliminates the need for hyperparameter tuning while consistently achieving superior performance. Our findings highlight the untapped potential of dataset cartography for improving compositional generalization in Transformer models. Our code is available at https://github.com/cyberiada/cartography-for-compositionality.
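To make the subset-selection idea concrete, the following is a minimal sketch of the two training-dynamics statistics defined in the original dataset cartography work (Swayamdipta et al., 2020): confidence, the mean probability assigned to the gold label across epochs, and variability, the standard deviation of that probability. It is not the implementation from the repository above. The function names `cartography_stats` and `select_subset`, the toy probabilities, and the classification-style per-example probabilities are all assumptions made for illustration; how these measures are adapted to sequence outputs on CFQ and COGS is not described in this abstract.

```python
from statistics import mean, pstdev


def cartography_stats(gold_probs_per_epoch):
    """Compute confidence and variability for each training example.

    gold_probs_per_epoch maps an example id to the list of probabilities
    the model assigned to the gold label, one entry per training epoch.
    """
    stats = {}
    for example_id, probs in gold_probs_per_epoch.items():
        stats[example_id] = {
            "confidence": mean(probs),     # mean gold-label probability
            "variability": pstdev(probs),  # population std across epochs
        }
    return stats


def select_subset(stats, key, fraction, highest=True):
    """Return the ids of the top (or bottom) fraction of examples by `key`.

    Hypothetical helper: ranks examples by one statistic and keeps a slice.
    """
    ranked = sorted(stats, key=lambda i: stats[i][key], reverse=highest)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k]


# Toy training dynamics over three epochs (made-up numbers).
toy_probs = {
    "easy": [0.90, 0.95, 0.99],
    "hard": [0.10, 0.05, 0.10],
    "ambiguous": [0.20, 0.80, 0.50],
}
stats = cartography_stats(toy_probs)

# High variability marks "ambiguous" examples; low confidence marks
# "hard-to-learn" examples, in the terminology of the cartography paper.
ambiguous_ids = select_subset(stats, "variability", 0.34)
hard_ids = select_subset(stats, "confidence", 0.34, highest=False)
```

In this sketch a subset is a fixed fraction of the examples ranked by a single statistic. A curriculum variant could instead order the training examples by the same statistic, although the specific criterion used in this work is not given here.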