As the volume of textual data in fields such as software development continues to grow, there is a pressing need to extract and present meaningful insights efficiently and effectively. This paper presents an approach to this need that focuses on the difficulty of interpreting Application Programming Interface (API) documentation. Although official API documentation is the primary source of information for developers, it is often lengthy and not user-friendly. As a result, developers frequently turn to unofficial sources such as Stack Overflow and GitHub. Our novel approach combines BERTopic for topic modeling with Natural Language Processing (NLP) techniques to automatically generate summaries of API documentation, giving developers a more efficient way to find the information they need. The generated summaries and topics are evaluated in terms of performance, coherence, and interpretability. The findings contribute to API documentation analysis by revealing recurring topics, identifying common issues, and generating potential solutions. By making API documentation easier and faster to understand, our work aims to improve the software development process and give developers practical tools for navigating complex APIs.