With large language models, robots can understand language more flexibly and more capably than ever before. This survey reviews recent literature and situates it along a spectrum with two poles: 1) mapping between language and a manually defined formal representation of meaning, and 2) mapping between language and high-dimensional vector spaces that translate directly to low-level robot policy. Using a formal representation allows the meaning of the language to be represented precisely, limits the size of the learning problem, and provides a framework for interpretability and formal safety guarantees. Methods that embed language and perceptual data into high-dimensional spaces avoid this manually specified symbolic structure and thus have the potential to be more general given enough data, but they require more data and computation to train. We discuss the benefits and tradeoffs of each approach and close with directions for future work that aims to achieve the best of both worlds.