The ability to use inductive reasoning to extract general rules from multiple observations is a vital indicator of intelligence. As humans, we use this ability not only to interpret the world around us but also to predict the outcomes of the various interactions we experience. Generalising over multiple observations has historically been difficult for machines, especially when the task requires computer vision. In this paper, we propose a model that extracts general rules from video demonstrations by performing summarisation and translation simultaneously. Our approach differs from prior work by framing the problem as a multi-sequence-to-sequence task in which the summarisation is learnt by the model. This allows our model to make use of edge cases that traditional summarisation techniques would otherwise suppress or discard. Additionally, we show that our approach can handle noisy specifications without the need for additional filtering methods. We evaluate our model by synthesising programs from video demonstrations in the Vizdoom environment, where it achieves state-of-the-art results with a relative increase of 11.75% in program accuracy over prior work.