"He is a person", "Paris is located on the earth". Both statements are correct but meaningless - due to lack of specificity. In this paper, we propose to measure how specific the language of pre-trained language models (PLMs) is. To achieve this, we introduce a novel approach to build a benchmark for specificity testing by forming masked token prediction tasks with prompts. For instance, given "Toronto is located in [MASK].", we want to test whether a more specific answer will be better filled in by PLMs, e.g., Ontario instead of Canada. From our evaluations, we show that existing PLMs have only a slight preference for more specific answers. We identify underlying factors affecting the specificity and design two prompt-based methods to improve the specificity. Results show that the specificity of the models can be improved by the proposed methods without additional training. We hope this work can bring to awareness the notion of specificity of language models and encourage the research community to further explore this important but understudied problem.