Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models

Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies help characterize the general capabilities of these models, there is no guarantee that a model capable of passing those tests would actually use those capabilities when performing real-life tasks. In this work, we formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks. These capabilities are (i) the ability to quickly generate good candidate utterances (the search capability), and (ii) the ability to predict how a listener interprets those utterances and choose the most appropriate one (the pragmatic capability). We design an evaluation scheme for comparing these capabilities of a language model with those of a human. Applying this scheme to examine various models in a navigation instruction generation problem, we find that their pragmatic capability is severely lacking. This insight leads us to augment them with better models of the listener and obtain a significant boost of 11% in success rate in guiding real humans. Our work advocates for a principled procedure for aligning language models with humans that involves (i) formulating task-oriented capabilities, (ii) devising a method to quantify their deficiency, and (iii) iteratively improving them.
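The split between the two capabilities can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`propose_candidates`, `choose_pragmatically`, `toy_listener`), the utterances, and all scores are hypothetical, chosen only to show a speaker first narrowing down to a few candidates (search) and then selecting the one a simulated listener is most likely to interpret as intended (pragmatics).

```python
from typing import Callable, Dict, List


def propose_candidates(speaker_scores: Dict[str, float], k: int) -> List[str]:
    """Search capability (toy version): keep the k utterances the speaker scores highest."""
    ranked = sorted(speaker_scores, key=speaker_scores.get, reverse=True)
    return ranked[:k]


def choose_pragmatically(
    candidates: List[str],
    listener_prob: Callable[[str, str], float],
    intended: str,
) -> str:
    """Pragmatic capability (toy version): pick the candidate that the
    listener model most likely interprets as the intended goal."""
    return max(candidates, key=lambda u: listener_prob(u, intended))


# Hypothetical speaker scores over utterances (assumption, not data from the paper).
speaker_scores = {
    "go forward": 0.5,
    "turn left at the stairs and stop by the sofa": 0.3,
    "walk to the room": 0.2,
    "stop": 0.05,
}

# Hypothetical listener model: probability that a listener who follows
# `utterance` ends up at `goal`. Unknown pairs default to 0.0.
_LISTENER_TABLE = {
    ("go forward", "sofa"): 0.2,
    ("turn left at the stairs and stop by the sofa", "sofa"): 0.9,
    ("walk to the room", "sofa"): 0.4,
}


def toy_listener(utterance: str, goal: str) -> float:
    return _LISTENER_TABLE.get((utterance, goal), 0.0)


candidates = propose_candidates(speaker_scores, k=3)
best = choose_pragmatically(candidates, toy_listener, intended="sofa")
print(best)
```

In this toy setup the speaker's own top-scored utterance ("go forward") is not the one selected; the listener model reranks the candidates. That mirrors the abstract's claim that a better model of the listener can improve the chosen instruction even when the candidate set is unchanged.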