This paper introduces a new approach to generating strongly constrained texts. We consider standardized sentence generation for vision screening, a typical application. We formalize the task as a discrete combinatorial optimization problem and solve it with multivalued decision diagrams (MDDs), a well-known data structure for handling constraints. In our context, a key strength of MDDs is that they yield an exhaustive set of solutions without performing any search. Once the sentences are obtained, we apply a language model (GPT-2) to keep the best ones. We detail the approach for English and for French, where agreement and conjugation rules are known to be more complex. With the help of GPT-2, we finally obtain hundreds of bona fide candidate sentences. Compared with the few dozen sentences usually available in the well-known MNREAD vision screening test, this is a major advance in standardized sentence generation. Because the approach can easily be adapted to other languages, it has the potential to make the MNREAD test even more valuable and usable. More generally, this paper highlights MDDs as a convincing alternative for constrained text generation, especially when the constraints are hard to satisfy, and suggests that they hold promise for other applications.
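To make the "exhaustive set of solutions without performing any search" idea concrete, the following is a minimal sketch of a layered decision diagram over word slots. It is an illustration only, not the paper's implementation: the slot template, the toy vocabulary, the single constraint (an exact character count, spaces included), and the names `build_mdd` and `enumerate_sentences` are all assumptions introduced here. Nodes that share the same state (characters used so far) are merged, and nodes that cannot reach the terminal are pruned, so every remaining root-to-terminal path is a valid sentence.

```python
from typing import Dict, Iterator, List, Tuple

# One layer maps a state (characters used so far) to its outgoing arcs,
# each arc being (word, next_state). All names here are hypothetical.
Layer = Dict[int, List[Tuple[str, int]]]


def build_mdd(slots: List[List[str]], target_len: int) -> List[Layer]:
    """Compile a layered diagram whose root-to-terminal paths are exactly the
    word sequences (one word per slot) with total length target_len."""
    layers: List[Layer] = []
    states = {0}
    # Top-down pass: expand each state once; equal states are merged because
    # they are keys of the same dictionary.
    for i, words in enumerate(slots):
        layer: Layer = {}
        next_states = set()
        for s in states:
            arcs = []
            for w in words:
                t = s + len(w) + (1 if i > 0 else 0)  # +1 for the separating space
                if t <= target_len:
                    arcs.append((w, t))
                    next_states.add(t)
            layer[s] = arcs
        layers.append(layer)
        states = next_states
    # Bottom-up pass: drop arcs and nodes that cannot reach the terminal state.
    alive = {target_len}
    for layer in reversed(layers):
        for s in list(layer):
            layer[s] = [(w, t) for (w, t) in layer[s] if t in alive]
            if not layer[s]:
                del layer[s]
        alive = set(layer)
    return layers


def enumerate_sentences(layers: List[Layer]) -> Iterator[str]:
    """Walk every path of the pruned diagram; no path ends in a dead end."""
    def walk(depth: int, state: int, prefix: List[str]) -> Iterator[str]:
        if depth == len(layers):
            yield " ".join(prefix)
            return
        for w, t in layers[depth].get(state, []):
            yield from walk(depth + 1, t, prefix + [w])
    yield from walk(0, 0, [])


if __name__ == "__main__":
    toy_slots = [["the", "a"], ["cat", "bird", "dog"], ["sleeps", "sings", "runs"]]
    for sentence in enumerate_sentences(build_mdd(toy_slots, target_len=14)):
        print(sentence)
```

With this toy vocabulary and a target of 14 characters, the diagram keeps three sentences: "the cat sleeps", "the bird sings", and "the dog sleeps". The language-model ranking step described in the abstract is not shown; in this sketch it would simply score each enumerated sentence and keep the best ones.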