While most research on controllable text generation has focused on steering base Language Models, the emerging instruction-tuning and prompting paradigm offers an alternative approach to controllability. We compile and release ConGenBench, a testbed of 17 different controllable generation tasks, and use a subset of it to benchmark the performance of 9 different baselines and methods on Instruction-tuned Language Models. To our surprise, we find that prompting-based approaches outperform controllable text generation methods on most datasets and tasks, highlighting a need for research on controllable text generation specifically with Instruction-tuned Language Models. Prompt-based approaches match human performance on most stylistic tasks while lagging on structural tasks, foregrounding a need to study more varied constraints and more challenging stylistic tasks. To facilitate such research, we provide an algorithm that uses only a task dataset and a Large Language Model with in-context capabilities to automatically generate a constraint dataset. This method eliminates the field's dependence on pre-curated constraint datasets, hence vastly expanding the range of constraints that can be studied in the future.