Bidirectional Encoder Representations from Transformers (BERT) was proposed for natural language processing (NLP) and has shown promising results. Recently, researchers have applied BERT to source-code representation learning and reported encouraging results on several downstream tasks. However, in this paper we show that current methods cannot effectively understand the logic of source code: the learned representation relies heavily on programmer-defined variable and function names. We design and implement a set of experiments to support this conjecture and provide insights for future work.
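To make the claim about identifier names concrete, here is a minimal sketch of a semantics-preserving renaming: the program logic stays the same while every programmer-chosen name is replaced by an opaque placeholder. This is an illustration only, not the experimental procedure of the paper. The helper names (`defined_names`, `Renamer`, `anonymize`, `run`), the `id_N` placeholder scheme, and the sample function are all assumptions made for this example. It uses Python's standard `ast` module and requires Python 3.9 or later for `ast.unparse`.

```python
import ast

# Hypothetical sample program; any small function would do.
ORIGINAL = '''
def sum_of_squares(numbers):
    total = 0
    for value in numbers:
        total += value * value
    return total
'''


def defined_names(tree):
    """Collect programmer-chosen names: functions, parameters, assigned variables."""
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            found.append(node.name)
        elif isinstance(node, ast.arg):
            found.append(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            found.append(node.id)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))


class Renamer(ast.NodeTransformer):
    """Rewrite every occurrence of a mapped name; leave builtins untouched."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def anonymize(source):
    """Return (renamed_source, mapping) with names replaced by id_0, id_1, ..."""
    tree = ast.parse(source)
    mapping = {name: f"id_{i}" for i, name in enumerate(defined_names(tree))}
    renamed = ast.unparse(Renamer(mapping).visit(tree))
    return renamed, mapping


def run(source, function_name, argument):
    """Execute source in a fresh namespace and call one of its functions."""
    namespace = {}
    exec(source, namespace)
    return namespace[function_name](argument)


renamed_source, mapping = anonymize(ORIGINAL)
print(renamed_source)

# Both versions compute the same result; only the names differ.
assert run(ORIGINAL, "sum_of_squares", [1, 2, 3]) == run(
    renamed_source, mapping["sum_of_squares"], [1, 2, 3]
)
```

A model that captured program logic rather than surface names would be expected to produce similar representations for `ORIGINAL` and `renamed_source`. The sketch deliberately ignores cases such as attributes, keyword arguments at call sites, imports, and `global`/`nonlocal` declarations, so it is suitable only for small self-contained functions.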