Activation steering methods have been shown to effectively condition language model generation by additively intervening on the model's intermediate representations. However, evaluations of these techniques have so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies and show that the parameters needed for a robust effect throughout generation depend on the property being steered. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach that modulates the steering intensity of one or more properties throughout generation. Our experiments on multi-property steering show that our method maintains strong conditioning while minimizing the impact of conditioning on generation fluency.
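The abstract does not give the update rule, so what follows is only a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes additive steering of a hidden state h as h' = h + sum_i alpha_i * v_i, one steering vector v_i per property. It further assumes that the per-step intensity alpha_i is set to the KL divergence between the next-token distribution computed with property i's conditioning and the one computed without it, capped at a hypothetical alpha_max. All function names (softmax, kl_divergence, dynamic_alpha, compose_steering, dac_step) are illustrative. Vectors are plain Python lists so the example needs no third-party dependencies.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def dynamic_alpha(steered_logits: Sequence[float], base_logits: Sequence[float],
                  alpha_max: float) -> float:
    """ASSUMED rule: intensity = KL(P_conditioned || P_base), capped at alpha_max."""
    kl = kl_divergence(softmax(steered_logits), softmax(base_logits))
    return min(kl, alpha_max)


def compose_steering(hidden: Sequence[float], steering_vectors: Sequence[Sequence[float]],
                     alphas: Sequence[float]) -> List[float]:
    """Additive multi-property intervention: h' = h + sum_i alpha_i * v_i."""
    out = list(hidden)
    for v, a in zip(steering_vectors, alphas):
        if len(v) != len(out):
            raise ValueError("steering vector and hidden state must have the same size")
        for j, vj in enumerate(v):
            out[j] += a * vj
    return out


def dac_step(hidden: Sequence[float], steering_vectors: Sequence[Sequence[float]],
             logit_pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
             alpha_max: float = 2.0) -> Tuple[List[float], List[float]]:
    """One hypothetical generation step: one intensity per property, then steer."""
    alphas = [dynamic_alpha(s, b, alpha_max) for s, b in logit_pairs]
    return compose_steering(hidden, steering_vectors, alphas), alphas


if __name__ == "__main__":
    # Property 1 leaves the next-token distribution unchanged; property 2 shifts it sharply.
    h, alphas = dac_step(
        hidden=[0.0, 0.0],
        steering_vectors=[[1.0, 0.0], [0.0, 1.0]],
        logit_pairs=[([0.0, 0.0], [0.0, 0.0]), ([10.0, 0.0], [0.0, 10.0])],
        alpha_max=0.5,
    )
    print(alphas, h)  # [0.0, 0.5] [0.0, 0.5]
```

Under this assumed rule, a property whose conditioning barely changes the next-token distribution at a given step receives an intensity near zero. This is one way to stop steering once a property no longer needs reinforcement, which may help preserve fluency. The cap prevents any single step from pushing the representation arbitrarily far.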