While text style transfer has many applications across natural language processing, the core premise of transferring from a single source style is unrealistic in a real-world setting. In this work, we focus on arbitrary style transfer: rewriting a text from an arbitrary, unknown style to a target style. We propose STEER: Unified Style Transfer with Expert Reinforcement, a unified framework developed to overcome the challenge of limited parallel data for style transfer. STEER automatically generates a corpus of style-transfer pairs using a product of experts during decoding. The generated offline data is then used to pre-train an initial policy before switching to online, off-policy reinforcement learning for further improvements via fine-grained reward signals. STEER is unified and can transfer to multiple target styles from an arbitrary, unknown source style, making it particularly flexible and efficient. Experimental results on a challenging dataset with text from a diverse set of styles demonstrate state-of-the-art results compared to competitive baselines. Remarkably, STEER outperforms the 175B parameter instruction-tuned GPT-3 on overall style transfer quality, despite being 226 times smaller. We also show STEER is robust, maintaining its style transfer capabilities on out-of-domain data and surpassing nearly all baselines across various styles. The success of our method highlights the potential of RL algorithms when augmented with controllable decoding to overcome the challenge of limited data supervision.
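As a rough illustration of what "a product of experts during decoding" can look like, the sketch below combines next-token distributions in log space. The abstract does not give the exact formulation, so the expert/anti-expert form used here (base log-probabilities plus a weighted difference between a target-style expert and an undesired-style anti-expert), the function names, the `alpha` weight, and the toy vocabulary are all assumptions made for illustration, not the authors' implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def product_of_experts(base_logits, expert_logits, anti_expert_logits, alpha=1.0):
    """Combine next-token distributions as a product of experts.

    Assumed form (hypothetical, not taken from the abstract):
        p(x) is proportional to p_base(x) * (p_expert(x) / p_anti(x)) ** alpha
    which in log space is base + alpha * (expert - anti), renormalised.
    """
    base = log_softmax(base_logits)
    expert = log_softmax(expert_logits)
    anti = log_softmax(anti_expert_logits)
    combined = [b + alpha * (e - a) for b, e, a in zip(base, expert, anti)]
    return log_softmax(combined)


# Toy three-token vocabulary with made-up logits (illustrative only).
vocab = ["hello", "greetings", "yo"]
base = [2.0, 1.0, 1.5]      # base model slightly prefers "hello"
expert = [1.0, 3.0, 0.0]    # target-style expert prefers "greetings"
anti = [1.0, 0.0, 3.0]      # undesired-style anti-expert prefers "yo"

log_probs = product_of_experts(base, expert, anti, alpha=1.0)
best = vocab[max(range(len(vocab)), key=lambda i: log_probs[i])]
print(best)
```

With `alpha=0.0` the combination reduces to the base distribution, so `alpha` acts as a knob for how strongly the experts steer each decoding step. In a real system the three logit vectors would come from language models evaluated on the same prefix.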