Writing a readme is a crucial aspect of software development, as it plays a vital role in managing and reusing program code. Although writing readmes is a pain point for many developers, automatically creating one remains challenging even with recent advances in large language models (LLMs), because it requires generating an abstract description from thousands of lines of code. In this demo paper, we show that LLMs are capable of generating coherent and factually correct readmes if we can identify a code fragment that is representative of the repository. Building upon this finding, we developed LARCH (LLM-based Automatic Readme Creation with Heuristics), which leverages representative code identification with heuristics and weak supervision. Through human and automated evaluations, we illustrate that LARCH can generate coherent and factually correct readmes in the majority of cases, outperforming a baseline that does not rely on representative code identification. We have made LARCH open-source and provide a cross-platform Visual Studio Code interface and a command-line interface, accessible at https://github.com/hitachi-nlp/larch . A demo video showcasing LARCH's capabilities is available at https://youtu.be/ZUKkh5ED-O4 .