To answer a question, language models often need to integrate prior knowledge learned during pretraining with new information presented in context. We hypothesize that models perform this integration in a predictable way across different questions and contexts: models will rely more on prior knowledge for questions about entities (e.g., persons or places) that they are more familiar with due to higher exposure in the training corpus, and will be more easily persuaded by some contexts than by others. To formalize this problem, we propose two mutual information-based metrics that measure a model's dependency on a context and on its prior about an entity. First, the persuasion score of a given context represents how much a model depends on that context in its decision. Second, the susceptibility score of a given entity represents how much the model can be swayed away from its original answer distribution about that entity. We empirically test the validity and reliability of our metrics. Finally, we find a relationship between the scores and the model's expected familiarity with an entity, and we provide two use cases to illustrate the metrics' benefits.
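The abstract does not give the exact definitions of the two scores, so the following is only a minimal sketch under stated assumptions. It assumes that, for a fixed question about an entity, we have the model's answer distribution under each of several contexts. It then takes the persuasion score of a context to be the KL divergence between that context's answer distribution and the context-averaged answer distribution, and the susceptibility score of the entity to be the mutual information between contexts and answers. The function names, the uniform weighting over contexts, and the toy distributions are all hypothetical and chosen for illustration.

```python
import math

def marginal(cond_dists, weights):
    """Context-averaged answer distribution p(a) = sum_c p(c) * p(a | c)."""
    n_answers = len(cond_dists[0])
    return [
        sum(w * dist[a] for w, dist in zip(weights, cond_dists))
        for a in range(n_answers)
    ]

def kl_bits(p, q):
    """KL divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def persuasion_scores(cond_dists, weights=None):
    """Assumed per-context score: KL(p(A | c) || p(A)) for each context c."""
    if weights is None:
        # Assumption: contexts are weighted uniformly.
        weights = [1.0 / len(cond_dists)] * len(cond_dists)
    p_a = marginal(cond_dists, weights)
    return [kl_bits(dist, p_a) for dist in cond_dists]

def susceptibility(cond_dists, weights=None):
    """Assumed per-entity score: I(C; A) = sum_c p(c) * KL(p(A | c) || p(A))."""
    if weights is None:
        weights = [1.0 / len(cond_dists)] * len(cond_dists)
    scores = persuasion_scores(cond_dists, weights)
    return sum(w * s for w, s in zip(weights, scores))

# Toy answer distributions over two answers, one row per context (made up).
stable_entity = [[0.9, 0.1], [0.9, 0.1]]    # contexts do not move the answer
swayable_entity = [[1.0, 0.0], [0.0, 1.0]]  # contexts fully determine the answer

stable_score = susceptibility(stable_entity)
swayable_score = susceptibility(swayable_entity)
```

Under these assumptions, an entity whose answer distribution is the same under every context gets a susceptibility of zero, while an entity whose answer is fully determined by the context gets the maximum value for two equally weighted contexts (one bit).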