Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks

The field of Natural Language Processing has experienced a dramatic leap in capabilities with the recent introduction of huge Language Models (LMs). Despite this success, natural language problems that involve several compounded steps are still practically unlearnable, even by the largest LMs. This is consistent with experimental failures of end-to-end learning on composite problems that have been demonstrated in a variety of domains. An effective mitigation is to introduce intermediate supervision for solving sub-tasks of the compounded problem. Recently, several works have demonstrated high gains by taking a straightforward approach to incorporating intermediate supervision in compounded natural language problems: the sequence-to-sequence LM is fed an augmented input, in which the decomposed tasks' labels are simply concatenated to the original input. In this paper, we prove a positive learning result that motivates these recent efforts. We show that when intermediate supervision is concatenated to the input and a sequence-to-sequence model is trained on this modified input, unlearnable composite problems can become learnable. We show that this is true for any family of tasks that, on the one hand, are unlearnable and, on the other hand, can be decomposed into a polynomial number of simple sub-tasks, each of which depends only on O(1) previous sub-task results. Beyond motivating contemporary empirical efforts to incorporate intermediate supervision in sequence-to-sequence language models, our positive theoretical result is the first of its kind in the landscape of results on the benefits of intermediate supervision for neural-network learning: until now, all theoretical results on the subject have been negative, i.e., they show cases where learning is impossible without intermediate supervision, while our result is positive, showing that learning is facilitated in the presence of intermediate supervision.
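To make the input-augmentation idea concrete, the following is a minimal sketch under stated assumptions, not the construction used in the paper. It assumes parity of a bit string as the composite task and decomposes it into running XORs, so that each sub-task depends on one input bit and one previous sub-task result. The names `running_parities`, `teacher_forcing_pairs`, and the separator token `SEP` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

SEP = "|"  # hypothetical separator between the original input and sub-task labels


def running_parities(bits: List[int]) -> List[int]:
    """Sub-task labels: the i-th label is the XOR of bits[0..i].

    Each step combines one input bit with one previous result,
    so every sub-task depends on O(1) earlier sub-task results.
    """
    labels = []
    acc = 0
    for b in bits:
        acc ^= b
        labels.append(acc)
    return labels


def teacher_forcing_pairs(bits: List[int]) -> List[Tuple[str, str]]:
    """Build (augmented input, next target) training pairs.

    The i-th pair concatenates the original input with the first i
    sub-task labels and asks for the (i+1)-th label as the target.
    The last target is the label of the full composite task.
    """
    labels = running_parities(bits)
    prefix = [str(b) for b in bits] + [SEP]
    pairs = []
    for i, label in enumerate(labels):
        context = prefix + [str(l) for l in labels[:i]]
        pairs.append((" ".join(context), str(label)))
    return pairs


if __name__ == "__main__":
    for source, target in teacher_forcing_pairs([1, 0, 1, 1]):
        print(f"{source!r} -> {target!r}")
```

In this sketch the model is never asked to compute the full parity in one step: every target can be derived from a single input bit and the most recently concatenated label, which is the kind of locality the abstract requires of the sub-tasks.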
