Documentation debt hinders the effective use of open-source software. Although code summarization tools help developers, most developers would prefer a detailed account of each parameter in a function over a high-level summary. However, such a detailed docstring is too intricate for a single generative model to produce reliably, owing to the lack of high-quality training data. We therefore propose a multi-step approach that combines multiple task-specific models, each adept at producing one section of a docstring. Combining these models ensures that every section is included in the final docstring. We compared our approach with existing generative models using both automatic metrics and a human-centred evaluation with 17 participating developers; the results indicate that our approach outperforms existing methods.
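The idea of assembling a docstring from section-specific generators can be illustrated with a minimal sketch. It is not the implementation evaluated here: the three generator functions below are hypothetical, rule-based stand-ins for task-specific models, and the section names (`Args`, `Returns`) and the `build_docstring` helper are assumptions chosen for illustration only.

```python
import inspect
from typing import Callable, List

# Hypothetical stand-ins for task-specific models. Each one maps a function
# to the text of a single docstring section; real models would replace them.
SectionGenerator = Callable[[Callable], str]


def _type_name(annotation) -> str:
    # Render an annotation as a short, readable type name.
    return getattr(annotation, "__name__", str(annotation))


def summarize(func: Callable) -> str:
    # Placeholder "summary model": derives one sentence from the function name.
    return func.__name__.replace("_", " ").capitalize() + "."


def describe_parameters(func: Callable) -> str:
    # Placeholder "parameter model": one entry per parameter in the signature.
    lines = []
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            type_name = "Any"
        else:
            type_name = _type_name(param.annotation)
        lines.append(f"    {name} ({type_name}): <description from model>")
    if not lines:
        return ""
    return "Args:\n" + "\n".join(lines)


def describe_return(func: Callable) -> str:
    # Placeholder "return model": reports the annotated return type, if any.
    annotation = inspect.signature(func).return_annotation
    if annotation is inspect.Signature.empty:
        return ""
    return f"Returns:\n    {_type_name(annotation)}: <description from model>"


def build_docstring(func: Callable, generators: List[SectionGenerator]) -> str:
    # Run every section generator in order and join the non-empty sections,
    # so each section has a dedicated producer in the final docstring.
    sections = [generate(func) for generate in generators]
    return "\n\n".join(section for section in sections if section)


def scale(values: list, factor: float) -> list:
    return [v * factor for v in values]


doc = build_docstring(scale, [summarize, describe_parameters, describe_return])
print(doc)
```

Because each section has its own generator, a section is omitted only when its generator returns an empty string, which is the property the multi-step design is meant to provide over a single end-to-end model.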