Code examples are a crucial part of good documentation. They help developers understand the documentation and use the corresponding code unit (e.g., a method) properly. However, much official documentation still lacks (good) code examples, which several studies have identified as a common documentation issue. Hence, in this paper, we consider automatic code example generation for documentation, a direction less explored by existing research. We employ Codex, a GPT-3-based model pre-trained on both natural and programming languages, to generate code examples from source code and documentation given as input. Our preliminary investigation on 40 scikit-learn methods reveals that this approach can generate good code examples: 72.5% of the code examples executed without error (passability) and 82.5% properly dealt with the target method and documentation (relevance). We also find that incorporating error logs (produced by the compiler while executing a failed code example) into the input further improves passability from 72.5% to 87.5%. Thus, our investigation lays the groundwork for documentation-specific code example generation and warrants in-depth future studies.
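The generate, execute, and retry-with-error-log procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt layout, the helper names (`build_prompt`, `run_example`, `generate_with_feedback`), the retry budget, and the `generate` callable standing in for the model are all hypothetical, since the abstract does not specify them.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_example(code: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Run a candidate example in a fresh interpreter.

    Returns (passed, error_log), where error_log is the captured stderr.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "example.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "TimeoutExpired"
    return proc.returncode == 0, proc.stderr


def build_prompt(
    source_code: str,
    documentation: str,
    failed_example: Optional[str] = None,
    error_log: Optional[str] = None,
) -> str:
    """Assemble the model input (hypothetical layout, assumed for illustration)."""
    parts = [
        "# Source code:\n" + source_code,
        "# Documentation:\n" + documentation,
    ]
    if failed_example is not None and error_log is not None:
        # Feedback step: include the failed attempt and its error log.
        parts.append("# Failed example:\n" + failed_example)
        parts.append("# Error log:\n" + error_log)
    parts.append("# Write a runnable code example for the method above:")
    return "\n\n".join(parts)


def generate_with_feedback(
    generate: Callable[[str], str],
    source_code: str,
    documentation: str,
    max_attempts: int = 2,
) -> Tuple[str, bool, int]:
    """Generate an example; on failure, retry with the error log in the input.

    `generate` is a placeholder for a call to a code model (assumption).
    Returns (last_example, passed, attempts_used).
    """
    prompt = build_prompt(source_code, documentation)
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt)
        passed, error_log = run_example(code)
        if passed:
            return code, True, attempt
        prompt = build_prompt(source_code, documentation, code, error_log)
    return code, False, max_attempts
```

Here passability corresponds to `run_example` returning `True` for an example. Relevance, by contrast, is a judgment about whether the example actually exercises the target method and is not captured by this sketch.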