Block-based programming languages like Scratch are increasingly popular for programming education and end-user programming. Recent program analyses build on the insight that source code can be modelled using techniques from natural language processing. Many of the regularities of source code that support this approach are due to the syntactic overhead imposed by textual programming languages. This syntactic overhead, however, is precisely what block-based languages remove in order to simplify programming. Consequently, it is unclear how well this modelling approach performs on block-based programming languages. In this paper, we investigate the applicability of language models to the popular block-based programming language Scratch. We model Scratch programs using n-gram models, the most basic type of language model, and transformers, a popular deep learning model. Evaluation on the example tasks of code completion and bug finding confirms that blocks inhibit predictability, but that the use of language models is nevertheless feasible. Our findings serve as a foundation for improving tooling and analyses for block-based languages.
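To make the n-gram idea concrete, the following is a minimal sketch of a bigram model over sequences of Scratch block opcodes, used for next-block completion. It is an illustration under stated assumptions, not the tokenization, smoothing, or model order used in the paper: the toy corpus `SCRIPTS` is invented, the helper names (`train_bigrams`, `probability`, `complete`) are hypothetical, and add-one smoothing is chosen only for simplicity.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each script is a flat sequence of Scratch block
# opcodes. Invented for illustration; not taken from the paper's dataset.
SCRIPTS = [
    ["event_whenflagclicked", "control_forever", "motion_movesteps", "motion_ifonedgebounce"],
    ["event_whenflagclicked", "control_forever", "motion_movesteps", "motion_turnright"],
    ["event_whenflagclicked", "looks_sayforsecs", "motion_movesteps"],
]

START, END = "<s>", "</s>"  # sentinel tokens marking script boundaries


def train_bigrams(scripts):
    """Count how often each block follows each other block."""
    counts = defaultdict(Counter)
    for script in scripts:
        tokens = [START] + script + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def probability(counts, prev, nxt, vocab_size):
    """Add-one smoothed estimate of P(nxt | prev)."""
    following = counts.get(prev, Counter())
    return (following[nxt] + 1) / (sum(following.values()) + vocab_size)


def complete(counts, prev, k=3):
    """Suggest up to k blocks most frequently seen after `prev`."""
    return [block for block, _ in counts.get(prev, Counter()).most_common(k)]


counts = train_bigrams(SCRIPTS)
# Possible next tokens: every opcode in the corpus plus the end sentinel.
vocab = {token for script in SCRIPTS for token in script} | {END}

print(complete(counts, "event_whenflagclicked"))
print(probability(counts, "control_forever", "motion_movesteps", len(vocab)))
```

Under the same assumptions, a simple bug-finding heuristic would flag transitions whose smoothed probability falls below a chosen threshold, since an unusual block sequence is a candidate for closer inspection.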