When engaging in collaborative tasks, humans efficiently exploit the semantic structure of a conversation to optimize verbal and nonverbal interactions. But in recent "language to code" or "language to action" models, this information is lacking. We show how incorporating the prior discourse and nonlinguistic context of a conversation situated in a nonlinguistic environment can improve the "language to action" component of such interactions. We fine-tune an LLM to predict actions based on prior context; our model, NeBuLa, doubles the net-action F1 score over the baseline on the task of Jayannavar et al. (2020). We also investigate our model's ability to construct shapes and understand location descriptions using a synthetic dataset.