Spreadsheets are widely recognized as the most popular end-user programming tools, blending the power of formula-based computation with an intuitive table-based interface. Today, billions of users, most of whom are neither database experts nor professional programmers, use spreadsheets to manipulate tables. Despite the success of spreadsheets, authoring complex formulas remains challenging, as non-technical users need to look up and understand non-trivial formula syntax. To address this pain point, we leverage the observation that there is often an abundance of similar-looking spreadsheets in the same organization, which not only have similar data but also share similar computation logic encoded as formulas. We develop an Auto-Formula system that can accurately predict the formula a user wants to author in a target spreadsheet cell by learning and adapting formulas that already exist in similar spreadsheets, using contrastive-learning techniques inspired by "similar-face recognition" from computer vision. Extensive evaluations on over 2K test formulas extracted from real enterprise spreadsheets show the effectiveness of Auto-Formula over alternatives. Our benchmark data is available at https://github.com/microsoft/Auto-Formula to facilitate future research.
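To make the "learn and adapt formulas from similar spreadsheets" idea concrete, here is a minimal Python sketch. It is not the Auto-Formula method: the learned contrastive similarity model is replaced by a plain Jaccard similarity over header strings, and "adaptation" is reduced to shifting relative cell references by the offset between the reference cell and the target cell. All names (`jaccard`, `shift_formula`, `predict_formula`) and the corpus layout are hypothetical and chosen only for illustration.

```python
import re

# Fully relative A1-style references such as B5 or AA12.
# Absolute references ($B$2) and function names such as LOG10( are not matched.
# Simplification: mixed references such as $B2 or B$2 are not handled correctly.
REF = re.compile(r"\b([A-Z]{1,3})([0-9]+)\b(?!\()")


def jaccard(a, b):
    """Jaccard similarity between two collections of strings."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def col_to_num(col):
    """Convert a column label to a 1-based index: 'A' -> 1, 'AA' -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column index back to a label: 1 -> 'A', 27 -> 'AA'."""
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def shift_formula(formula, d_rows, d_cols):
    """Shift every fully relative reference in `formula` by (d_rows, d_cols)."""
    def repl(match):
        col = num_to_col(col_to_num(match.group(1)) + d_cols)
        row = int(match.group(2)) + d_rows
        return f"{col}{row}"
    return REF.sub(repl, formula)


def predict_formula(target_headers, target_cell, corpus):
    """Pick the most similar sheet in `corpus` and adapt its formula.

    `corpus` is a list of dicts with keys 'headers' (list of str),
    'cell' ((row, col), 1-based) and 'formula' (str). This layout is an
    assumption made for the sketch, not the format of the benchmark data.
    """
    best = max(corpus, key=lambda s: jaccard(target_headers, s["headers"]))
    d_rows = target_cell[0] - best["cell"][0]
    d_cols = target_cell[1] - best["cell"][1]
    return shift_formula(best["formula"], d_rows, d_cols)


corpus = [
    {"headers": ["Item", "Price", "Qty", "Amount"], "cell": (5, 4), "formula": "=B5*C5"},
    {"headers": ["Name", "Score"], "cell": (2, 3), "formula": "=AVERAGE(B2:B9)"},
]
# Target: cell D7 in a sheet whose headers match the first corpus entry.
print(predict_formula(["Item", "Price", "Qty", "Amount"], (7, 4), corpus))  # =B7*C7
```

Shifting references works only when the target sheet has the same layout as the reference sheet. Ranges that should grow or shrink with the table are not handled by this sketch.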