Textual style expresses a diverse set of information, including interpersonal dynamics (e.g., formality) and the author's emotions or attitudes (e.g., disgust). An open question is how language models can be explicitly controlled so that they weave together target styles when generating text: for example, to produce text that is both negative and non-toxic. One approach to such controlled generation is multi-objective reinforcement learning (RL), but it remains unclear how best to combine multiple objectives in a reward function. In this paper, we investigate various formulations of multi-style rewards, including calibrated outputs from discriminators and dynamic weighting by discriminator gradient magnitudes. We find that our proposed dynamic weighting outperforms static weighting approaches with respect to style control while maintaining linguistic quality, and we explore its effectiveness in 2- and 3-style control.
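To make the dynamic-weighting idea concrete, the following is a minimal sketch, not the paper's implementation: it assumes each style discriminator is a differentiable callable mapping a text embedding to a calibrated scalar score, and the names (`discriminators`, `multi_style_reward`, the embedding input) are hypothetical. Each discriminator's weight is set proportional to the magnitude of its gradient with respect to the input, so styles whose scores are most sensitive to the current sample contribute more to the combined reward.

```python
import torch

def multi_style_reward(text_embedding, discriminators, eps=1e-8):
    """Combine per-style discriminator scores into one scalar reward.

    Illustrative sketch only: weights are dynamic and sample-dependent,
    proportional to each discriminator's gradient magnitude w.r.t. the input.
    """
    scores, grad_norms = [], []
    for disc in discriminators:
        # Fresh leaf tensor so each discriminator's gradient is isolated.
        x = text_embedding.detach().requires_grad_(True)
        score = disc(x)  # assumed: calibrated probability of the target style
        (grad,) = torch.autograd.grad(score, x)
        scores.append(score.detach())
        grad_norms.append(grad.norm())
    grad_norms = torch.stack(grad_norms)
    # Normalize gradient magnitudes into weights that sum to one.
    weights = grad_norms / (grad_norms.sum() + eps)
    return (weights * torch.stack(scores)).sum()
```

Under this reading, a static-weighting baseline would simply replace `weights` with fixed constants (e.g., uniform across styles), which is the comparison the abstract describes.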