To meet the requirements of real-world applications, it is essential to control the generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation, whereas most other existing methods suffer from overfitting (finetuning-based methods) or semantic collapse (post-processing methods). However, current RL methods are generally guided by coarse-grained (sentence- or paragraph-level) feedback, which may lead to suboptimal performance because of semantic twists or progressions within a sentence. To address this, we propose TOLE, a novel RL algorithm that formulates TOken-LEvel rewards for controllable text generation and employs a "first-quantize-then-noise" paradigm to enhance the robustness of the RL algorithm. Furthermore, TOLE can be flexibly extended to multiple constraints at little computational cost. Experimental results show that our algorithm achieves superior performance on both single-attribute and multi-attribute control tasks. Our code is released at https://github.com/WindyLee0822/CTG
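To make the two ideas named above more concrete, the following is a minimal sketch rather than the released implementation. It assumes, hypothetically, that a token's reward is the change in an attribute classifier's probability when that token is appended to the prefix, and that "first-quantize-then-noise" means mapping each reward to a quantile level and then perturbing that level with small uniform noise. The function names `token_level_rewards` and `quantize_then_noise`, and the parameters `num_bins` and `noise_scale`, are illustrative assumptions and do not come from the paper.

```python
import random
from bisect import bisect_right


def token_level_rewards(prefix_probs):
    """Assumed reward definition: per-token change in classifier probability.

    prefix_probs[t] is a (hypothetical) classifier's probability of the
    target attribute given the first t generated tokens; prefix_probs[0]
    is the probability for the prompt alone.
    """
    return [prefix_probs[t] - prefix_probs[t - 1]
            for t in range(1, len(prefix_probs))]


def quantize_then_noise(rewards, num_bins=4, noise_scale=0.1, rng=None):
    """Assumed reading of "first-quantize-then-noise".

    Step 1 replaces each raw reward with the index of its empirical
    quantile bin. Step 2 adds uniform noise in
    [-noise_scale, noise_scale] to that index.
    """
    if not rewards:
        return []
    rng = rng or random.Random(0)
    ordered = sorted(rewards)
    n = len(ordered)
    # Interior bin edges taken at the empirical quantiles of the rewards.
    edges = [ordered[(k * n) // num_bins] for k in range(1, num_bins)]
    noisy_levels = []
    for r in rewards:
        level = bisect_right(edges, r)  # integer in 0 .. num_bins - 1
        noisy_levels.append(level + rng.uniform(-noise_scale, noise_scale))
    return noisy_levels


# Illustrative probabilities only: the third token sharply lowers the score.
probs = [0.5, 0.6, 0.9, 0.4, 0.45]
rewards = token_level_rewards(probs)
shaped = quantize_then_noise(rewards, num_bins=2, noise_scale=0.1)
```

Under these assumptions the per-token rewards sum to the sentence-level change in classifier probability, so the token-level signal refines the coarse-grained one instead of replacing it. Quantizing discards the raw magnitudes, which are sensitive to classifier calibration, and the noise keeps the policy from fitting exact reward values.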