API documentation is crucial for developers learning and using APIs. However, many official API documents are known to be obsolete and incomplete. To address this challenge, we propose AutoDoc, a new approach that generates API documents from API knowledge extracted from online discussions on Stack Overflow (SO). AutoDoc uses a fine-tuned dense retrieval model to identify seven types of API knowledge in SO posts, then uses GPT-4o to summarize that knowledge into concise text. We also designed two dedicated components to handle LLM hallucination and redundancy in the generated content. We evaluated AutoDoc against five baselines on 48 APIs of different popularity levels. Our results indicate that the API documents generated by AutoDoc are up to 77.7% more accurate, 9.5% less duplicated, and contain 34.4% knowledge not covered by the official documents. We also measured the sensitivity of AutoDoc to the choice of LLM. We found that while larger LLMs produce higher-quality API documents, AutoDoc enables smaller open-source models (e.g., Mistral-7B-v0.3) to achieve comparable results. Finally, we conducted a user study to evaluate the usefulness of the API documents generated by AutoDoc. All participants found them more comprehensive, concise, and helpful than those produced by the baselines. These findings highlight the feasibility of using LLMs for API documentation when the approach is carefully designed to counter LLM hallucination and information redundancy.