Test data is said to be out-of-distribution (OOD) when it unexpectedly differs from the training data, a common challenge in real-world use cases of machine learning. Although OOD generalisation has gained interest in recent years, few works have focused on OOD generalisation in spoken language understanding (SLU) tasks. To facilitate research on this topic, we introduce a modified version of the popular SLU dataset SLURP, featuring data splits for testing OOD generalisation in SLU. We call our modified dataset SLURP For OOD generalisation, or SLURPFOOD. Using our OOD data splits, we find that end-to-end SLU models have limited capacity for generalisation. Furthermore, by employing model interpretability techniques, we shed light on the factors contributing to the models' generalisation difficulties. To improve generalisation, we experiment with two techniques, which improve the results on some, but not all, of the splits, emphasising the need for new techniques.