ChatGPT has attracted widespread attention since its launch because of its strong capabilities in natural language processing and code analysis. Developers have applied these capabilities to many domains through software projects hosted on GitHub, the world's largest open-source platform, and these projects have in turn prompted extensive discussion. To understand what these projects address and what requirements are raised in those discussions, we collected ChatGPT-related projects from GitHub and used the LDA topic model to identify the discussion topics. Specifically, we selected 200 projects and, by analyzing their descriptions, grouped them into three primary categories: ChatGPT implementation & training, ChatGPT application, and ChatGPT improvement & extension. We then used the LDA topic model to identify 10 topics from the issue texts and compared the distribution and evolution of these topics across the three categories. Our observations are as follows: (1) The number of new projects per month in each of the three categories is closely associated with the development of ChatGPT. (2) The popularity of each topic varies significantly across the three categories. (3) The monthly changes in the absolute impact of each topic differ across the three categories and are often closely associated with changes in the number of projects in the category. (4) Over time, the relative impact of each topic follows different trends in the three categories. Based on these findings, we discuss implications for developers and users.
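The abstract does not define the absolute and relative impact of a topic, so the following is only a minimal sketch under stated assumptions: the absolute impact of a topic in a month is taken to be the sum of that topic's probability over all issues created in the month, and the relative impact is that sum divided by the number of issues in the month. The function name `topic_impacts`, the input layout, and the toy data are hypothetical; the per-issue topic distributions would in practice come from a fitted LDA model.

```python
from collections import defaultdict


def topic_impacts(issues, num_topics):
    """Aggregate per-issue topic distributions into monthly topic impacts.

    issues: iterable of (month, theta) pairs, where theta is a list of
            num_topics probabilities for one issue (assumed LDA output).
    Returns (absolute, relative): dicts mapping month -> list of impacts.
    The two definitions below are assumptions, not taken from the paper.
    """
    absolute = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for month, theta in issues:
        if len(theta) != num_topics:
            raise ValueError("theta must have one entry per topic")
        counts[month] += 1
        for k, p in enumerate(theta):
            # Assumed absolute impact: summed topic probability in the month.
            absolute[month][k] += p
    # Assumed relative impact: absolute impact normalized by issue count.
    relative = {
        month: [value / counts[month] for value in values]
        for month, values in absolute.items()
    }
    return dict(absolute), relative


# Hypothetical toy data: three issues, two topics.
issues = [
    ("2023-03", [0.75, 0.25]),
    ("2023-03", [0.25, 0.75]),
    ("2023-04", [0.5, 0.5]),
]
absolute, relative = topic_impacts(issues, num_topics=2)
```

Under these assumed definitions, the absolute impact grows with the number of issues in a month, and therefore with the number of projects, while the relative impact does not. That is consistent with observation (3), which links absolute impact to project counts.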