Autoencoders as Tools for Program Synthesis

Research on language modeling of source code has recently made many advances, with applications ranging from code suggestion and completion to code summarization. However, complete program synthesis for industry-grade programming languages has not been researched extensively. In this work, we introduce a variational autoencoder model for program synthesis of industry-grade programming languages. Our model incorporates the internal hierarchical structure of source code and operates on parse trees. By learning a latent representation of source code over trees, we capture more information and achieve higher performance than standard autoregressive autoencoder models. Furthermore, because our model is tree-structured, its autoregressive operations are performed along paths of the tree rather than over linear sequences. Therefore, the length of the sequences processed by the autoregressive model scales with the width and depth of the tree rather than with its total size, which mitigates the common problem of exploding and vanishing gradients.
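To see why path-based autoregression shortens the sequences, compare a tree's total node count with the length of its longest root-to-leaf path. The node count is a rough proxy for the length of a flattened token sequence. The sketch below assumes Python's standard `ast` module as a stand-in parse tree. It is not the model from this work, and the parse trees of the languages targeted here would differ. The helper name `tree_stats` is hypothetical.

```python
import ast


def tree_stats(source):
    """Return (node_count, max_depth, max_width) for the Python AST of `source`.

    node_count: total number of nodes, a rough proxy for the length of a
                flattened (linear) sequence over the whole tree.
    max_depth:  number of nodes on the longest root-to-leaf path.
    max_width:  largest number of children of any single node.
    """
    root = ast.parse(source)
    node_count = max_depth = max_width = 0
    # Iterative depth-first traversal; each stack entry carries its depth.
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        node_count += 1
        max_depth = max(max_depth, depth)
        children = list(ast.iter_child_nodes(node))
        max_width = max(max_width, len(children))
        stack.extend((child, depth + 1) for child in children)
    return node_count, max_depth, max_width


if __name__ == "__main__":
    for n in (1, 10, 100):
        # A toy program made of n structurally identical assignments.
        src = "\n".join(f"x{i} = a * b + {i}" for i in range(n))
        count, depth, width = tree_stats(src)
        print(f"n={n:3d}  nodes={count:5d}  longest path={depth}  max width={width}")
```

As the number of statements grows, the node count grows linearly while the longest root-to-leaf path keeps the same length. Only the root's branching factor grows, which corresponds to the width term in the scaling claim above.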



Autoencoders


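An autoencoder is a type of artificial neural network used to learn efficient data encodings in an unsupervised manner. Its aim is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Along with the reduction side, a reconstruction side is learned: the autoencoder tries to generate, from the reduced encoding, a representation as close as possible to its original input, hence its name. Several variants of the basic model exist, each aiming to force the learned representations of the input to take on useful properties. Autoencoders are used effectively in many applied problems, from face recognition to capturing the semantic meaning of words.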
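A deliberately tiny example can illustrate the encode-then-reconstruct idea. The sketch below is a linear autoencoder in plain Python, chosen for brevity. Practical autoencoders, including variational ones, use nonlinear layers and far more parameters. The sketch compresses 2-D points on the line y = 2x into a 1-D code and learns to decode that code back into the original point. Training uses gradient descent on the mean squared reconstruction error. The function names, data, initial weights, and learning rate are all hypothetical illustration choices.

```python
def encode(e, x):
    """Encoder: map a 2-D point x to a 1-D code z = e . x."""
    return e[0] * x[0] + e[1] * x[1]


def decode(d, z):
    """Decoder: map a 1-D code z back to a 2-D point d * z."""
    return [d[0] * z, d[1] * z]


def reconstruction_error(e, d, points):
    """Mean squared distance between each point and its reconstruction."""
    total = 0.0
    for x in points:
        x_hat = decode(d, encode(e, x))
        total += (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2
    return total / len(points)


def train_linear_autoencoder(points, lr=0.05, steps=2000):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    e = [0.5, 0.1]  # encoder weights (fixed start keeps the run deterministic)
    d = [0.3, 0.2]  # decoder weights
    n = len(points)
    for _ in range(steps):
        grad_e = [0.0, 0.0]
        grad_d = [0.0, 0.0]
        for x in points:
            z = encode(e, x)
            x_hat = decode(d, z)
            r = [x[0] - x_hat[0], x[1] - x_hat[1]]  # reconstruction residual
            d_dot_r = d[0] * r[0] + d[1] * r[1]
            for k in range(2):
                grad_d[k] += -2.0 * r[k] * z / n        # dLoss/dd_k
                grad_e[k] += -2.0 * d_dot_r * x[k] / n  # dLoss/de_k
        # Update encoder and decoder simultaneously from the same gradients.
        for k in range(2):
            e[k] -= lr * grad_e[k]
            d[k] -= lr * grad_d[k]
    return e, d


if __name__ == "__main__":
    # 2-D points on the line y = 2x: the data are intrinsically 1-D.
    data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    e, d = train_linear_autoencoder(data)
    print("reconstruction error:", reconstruction_error(e, d, data))
    print("code for (0.5, 1.0):", encode(e, [0.5, 1.0]))
```

Because these points are intrinsically one-dimensional, a single code value suffices and the reconstruction error falls to nearly zero. With noisy or higher-dimensional data, the narrow bottleneck would force the network to keep only the dominant structure.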
