This research explores strategies for steering the output of large language models (LLMs) towards specific styles, such as sentiment, emotion, or writing style, by adding style vectors to the activations of hidden layers during text generation. We show that, in contrast to more complex training-based approaches, style vectors can be computed simply from the layer activations recorded for input texts written in a specific style. Through a series of experiments, we demonstrate that activation engineering with such style vectors influences the style of generated text in a nuanced and parameterisable way, which distinguishes it from prompt engineering. The presented research constitutes a significant step towards developing more adaptive and effective AI-empowered interactive systems.
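To make the idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a style vector is the mean recorded activation for texts in the target style minus the mean for texts in other styles, and that steering adds the scaled vector to a hidden activation. The abstract states only that style vectors are computed from recorded activations, so the mean-difference recipe is an assumption. The helper names (`mean_activation`, `style_vector`, `steer`), the `strength` parameter, and the three-dimensional toy activations are all hypothetical.

```python
from statistics import fmean
from typing import Sequence

Vector = list[float]


def mean_activation(activations: Sequence[Sequence[float]]) -> Vector:
    """Average a set of recorded activation vectors, dimension by dimension."""
    return [fmean(column) for column in zip(*activations)]


def style_vector(style_acts: Sequence[Sequence[float]],
                 other_acts: Sequence[Sequence[float]]) -> Vector:
    """Assumed recipe: mean activation of the target style minus the mean of the rest."""
    style_mean = mean_activation(style_acts)
    other_mean = mean_activation(other_acts)
    return [s - o for s, o in zip(style_mean, other_mean)]


def steer(hidden: Sequence[float], vector: Sequence[float], strength: float) -> Vector:
    """Add the scaled style vector to one hidden-layer activation."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy stand-ins for activations recorded at a single layer (hypothetical values).
positive_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
negative_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v_positive = style_vector(positive_acts, negative_acts)
steered = steer([0.5, 0.5, 0.5], v_positive, strength=0.5)

print(v_positive)  # [2.0, -2.0, 0.0]
print(steered)     # [1.5, -0.5, 0.5]
```

The `strength` scalar is what makes the intervention parameterisable in this sketch: a value of 0 leaves the activation unchanged, and larger values push it further along the style direction. In a real model the addition would be applied to the hidden states of one or more layers at each generation step, for example through a forward hook.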