When querying a large language model (LLM), the context, i.e., personal, demographic, and cultural information specific to an end-user, can significantly shape the response of the LLM. For example, asking the model to explain Newton's second law with the context "I am a toddler" yields a different answer compared to the context "I am a physics professor." Proper use of context enables the LLM to generate personalized responses, whereas inappropriate contextual influence can lead to stereotypical and potentially harmful generations (e.g., associating "female" with "housekeeper"). In practice, striking the right balance when leveraging context is a nuanced and challenging problem that is often situation-dependent. One common approach to this challenge is to fine-tune LLMs on contextually appropriate responses. However, fine-tuning is expensive, time-consuming, and offers end-users no control across different situations. In this work, we propose Context Steering (CoS), a simple training-free method that can be easily applied to autoregressive LLMs at inference time. By measuring contextual influence in terms of token prediction likelihood and modulating it, CoS lets practitioners set the appropriate level of contextual influence for their specific use case and end-user base. We showcase a variety of applications of CoS, including amplifying contextual influence to achieve better personalization and mitigating unwanted influence to reduce model bias. In addition, we show that CoS can be combined with Bayesian inference to quantify the extent of hate speech on the internet. We demonstrate the effectiveness of CoS on state-of-the-art LLMs and benchmarks.
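The logit-space modulation the abstract describes can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's exact formulation: it treats contextual influence as the difference between next-token logits computed with and without the context, and scales that difference by a hypothetical coefficient `lam` (greater than 1 amplifies the context, between 0 and 1 dampens it, 0 ignores it).

```python
import numpy as np

def context_steer(logits_with_ctx, logits_without_ctx, lam):
    """Hypothetical sketch of context steering in logit space.

    The contextual influence is measured as the shift in next-token
    logits induced by prepending the context; `lam` modulates it.
    """
    delta = logits_with_ctx - logits_without_ctx
    return logits_without_ctx + lam * delta

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

# Toy 4-token vocabulary: the context shifts probability mass toward token 2.
no_ctx   = np.array([2.0, 1.0, 0.5, 0.0])
with_ctx = np.array([1.0, 1.0, 2.5, 0.0])

amplified = softmax(context_steer(with_ctx, no_ctx, lam=2.0))  # context doubled
neutral   = softmax(context_steer(with_ctx, no_ctx, lam=0.0))  # context removed
```

With `lam=2.0` the steered logits are `2 * with_ctx - no_ctx`, so the context-favored token dominates; with `lam=0.0` the distribution reduces to the context-free one. In a real deployment these logits would come from two forward passes of the same autoregressive model, one with the context prepended and one without.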