Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users as it requires them to understand and implement the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are often under-specified or ambiguous, making it harder for code generation systems to accurately learn the desired rule in a single step. To tackle this problem of under-specification and minimise argument errors, FormaT5 learns to predict placeholders through an abstention objective. These placeholders can then be filled by a second model or, when examples of rows that should be formatted are available, by a programming-by-example system. To evaluate FormaT5 on diverse and real scenarios, we create an extensive benchmark of 1053 CF tasks, containing real-world descriptions collected from four different sources. We release our benchmarks to encourage research in this area. Abstention and filling allow FormaT5 to outperform 8 different neural approaches on our benchmarks, both with and without examples. Our results illustrate the value of building domain-specific learning systems.
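The placeholder-filling step described in the abstract can be illustrated with a toy sketch. This is not FormaT5's implementation: it assumes a single hypothetical rule shape, `value > <HOLE>`, assumes that every row not listed as an example is meant to stay unformatted, and draws candidate arguments from the column's own values. The names `make_rule` and `fill_hole` are invented for this illustration.

```python
from typing import Callable, Optional, Sequence, Set


def make_rule(threshold: float) -> Callable[[float], bool]:
    """Hypothetical rule template 'value > <HOLE>' with the hole filled in."""
    return lambda value: value > threshold


def fill_hole(
    values: Sequence[float],
    formatted_rows: Set[int],
    candidates: Sequence[float],
) -> Optional[float]:
    """Return the first candidate consistent with the examples, or None.

    Assumption: rows absent from formatted_rows must NOT be formatted.
    """
    for candidate in candidates:
        rule = make_rule(candidate)
        # A candidate is accepted only if it formats exactly the example rows.
        if all(rule(v) == (i in formatted_rows) for i, v in enumerate(values)):
            return candidate
    return None


# Toy column and the row indices a user marked as "should be formatted".
scores = [35.0, 72.0, 48.0, 90.0, 50.0]
formatted = {1, 3}

# Candidate arguments are drawn from the column itself (an assumption).
filled = fill_hole(scores, formatted, sorted(set(scores)))
print(filled)
```

Leaving the argument as a hole and resolving it against examples means an under-specified description such as "highlight the high scores" does not force the generator to guess a threshold. When no candidate matches the examples, the sketch returns `None` rather than committing to an unsupported value.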