In this work, we study user-controlled table-to-text generation, where users explore the content of a table by selecting cells and reading a description of them that is automatically produced by a natural language generator. Such generation models usually learn from carefully selected cell combinations (clean cell selections); in practice, however, users may select unexpected, redundant, or incoherent cell combinations (noisy cell selections). In experiments, we find that models perform well on test sets drawn from the same distribution as the training data, but their performance drops when they are evaluated on realistic noisy user inputs. We propose a fine-tuning regime with additional user-simulated noisy cell selections. Models fine-tuned with the proposed regime gain 4.85 BLEU points on noisy user test cases and 1.4 BLEU points on clean test cases, and achieve performance comparable to the state of the art on the ToTTo dataset.
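To make the idea of a simulated noisy cell selection concrete, the following is a minimal sketch and not the procedure used in this work. It assumes that cells are identified by (row, column) index pairs and that noise is simulated by adding a few cells sampled uniformly from the rest of the table; the function name `simulate_noisy_selection` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, Tuple

Cell = Tuple[int, int]  # (row index, column index); an assumed representation


def simulate_noisy_selection(
    clean_cells: Iterable[Cell],
    n_rows: int,
    n_cols: int,
    n_extra: int = 2,
    seed: Optional[int] = None,
) -> List[Cell]:
    """Return the clean selection plus up to `n_extra` extra cells.

    Assumption: a noisy selection is modelled as the clean selection with
    additional cells drawn uniformly at random from the unselected cells.
    This is an illustrative stand-in, not the noise model of the paper.
    """
    rng = random.Random(seed)
    selected = set(clean_cells)
    # Every cell of the table that is not already part of the selection.
    candidates = [
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if (r, c) not in selected
    ]
    # Never ask for more extra cells than the table can provide.
    k = min(n_extra, len(candidates))
    extra = rng.sample(candidates, k)
    return sorted(selected | set(extra))


clean = [(0, 1), (2, 3)]
noisy = simulate_noisy_selection(clean, n_rows=4, n_cols=5, n_extra=2, seed=0)
print(noisy)  # the two clean cells plus two randomly added cells
```

Under the same assumptions, a fine-tuning set could pair each such noisy selection with the reference description of the clean selection it was derived from, alongside the original clean examples.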