We propose CHRT (Control Hidden Representation Transformation), a controlled language generation framework that steers large language models to generate text with specified attributes (such as toxicity). CHRT gains attribute control by modifying the hidden representation of the base model through learned transformations. We learn these transformations with a contrastive-learning framework, and they can be combined to achieve multi-attribute control. We show the effectiveness of CHRT experimentally by comparing it with seven baselines over three attributes. CHRT outperforms all the baselines on detoxification, positive sentiment steering, and text simplification while minimizing the loss in linguistic quality. Further, our approach has the lowest inference latency, only 0.01 seconds more than the base model, making it the most suitable for high-performance production environments. We open-source our code and release two novel datasets to further propel controlled language generation research.
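To make the mechanism described in the abstract concrete, the following is a minimal sketch of a transformation applied to a hidden representation, a contrastive objective for learning it, and a way to combine two transformations. The abstract does not specify the architecture, the loss, or the combination rule, so everything here is an assumption made for illustration: the residual linear map, the triplet form of the contrastive loss, the weighted-sum combination, and all names (`make_transform`, `triplet_loss`, `combine`) are hypothetical and are not taken from the released code.

```python
import math
import random


def make_transform(dim, scale=0.01, seed=0):
    """Build a hypothetical residual linear transform h -> h + W h.

    The weights are random placeholders; in a real system they would be
    learned. The residual form is an assumption, not the authors' design.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-scale, scale) for _ in range(dim)]
               for _ in range(dim)]

    def transform(hidden):
        return [h + sum(w * x for w, x in zip(row, hidden))
                for h, row in zip(hidden, weights)]

    return transform


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """One common contrastive objective (assumed here): pull the anchor
    toward the positive example and push it away from the negative one."""
    return max(0.0,
               distance(anchor, positive) - distance(anchor, negative) + margin)


def combine(transforms, weights):
    """Blend several attribute transforms by a weighted sum of their outputs
    (an assumed combination rule for multi-attribute control)."""
    def combined(hidden):
        outputs = [t(hidden) for t in transforms]
        return [sum(w * out[i] for w, out in zip(weights, outputs))
                for i in range(len(hidden))]

    return combined


# Toy usage: a 4-dimensional "hidden state" steered by two transforms.
hidden = [0.5, -1.0, 2.0, 0.0]
detox = make_transform(4, seed=1)
positive_sentiment = make_transform(4, seed=2)
both = combine([detox, positive_sentiment], [0.5, 0.5])
steered = both(hidden)
loss = triplet_loss(steered,
                    positive=[0.5, -1.0, 2.0, 0.1],
                    negative=[3.0, 3.0, 3.0, 3.0])
```

In this sketch the steered vector would replace the base model's hidden representation before decoding, which is consistent with the small added latency the abstract reports, since only a lightweight map is applied on top of the base model's forward pass.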