Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations

Software libraries play a critical role in the functionality, efficiency, and maintainability of software systems. As developers increasingly rely on Large Language Models (LLMs) to streamline their coding processes, the effectiveness of these models in recommending appropriate libraries is crucial, yet it remains largely unexplored. In this paper, we assess the effectiveness of ChatGPT as a software librarian and identify areas for improvement. We conducted an empirical study using GPT-3.5 Turbo to generate Python code for 10,000 Stack Overflow questions. Our findings show that ChatGPT uses third-party libraries nearly 10% more often than human developers, favoring widely adopted and well-established options. However, 14.2% of the recommended libraries had restrictive copyleft licenses, and ChatGPT did not explicitly communicate this. Additionally, 6.5% of the libraries did not work out of the box, which can confuse developers and waste their time. While ChatGPT can be an effective software librarian, it should be improved by providing more explicit information on maintainability metrics and licensing. We recommend that developers implement rigorous dependency management practices and double-check library licenses before integrating LLM-generated code into their projects.
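The recommendation to double-check library licenses can be partly automated. The following is a minimal sketch, not the method used in the study. It scans the installed Python distributions with the standard library's `importlib.metadata` and flags those whose `License`, `License-Expression`, or license classifier metadata looks like a copyleft license.

Several parts are assumptions chosen for illustration:

* the regular expression `COPYLEFT_PATTERN` and its coverage (GPL family, MPL, EPL);
* the helper names `is_copyleft`, `license_strings`, and `find_copyleft_distributions`.

Package metadata can be missing or imprecise, so a match (or a non-match) is only a prompt to read the actual license text.

```python
"""Minimal sketch: flag installed Python distributions whose metadata suggests a copyleft license.

Assumption: license information comes only from package metadata (the License and
License-Expression fields plus "License ::" trove classifiers), which may be missing or imprecise.
"""
import re
from importlib.metadata import distributions

# Hypothetical, non-exhaustive heuristic for common copyleft license families.
COPYLEFT_PATTERN = re.compile(
    r"\b(?:A|L)?GPL"                                   # GPL, LGPL, AGPL (e.g. "GPLv3", "LGPL-2.1")
    r"|\bGNU (?:Affero |Lesser |Library )?General Public"
    r"|\bMPL\b|\bMozilla Public"                        # weak copyleft
    r"|\bEPL\b|\bEclipse Public",                       # weak copyleft
    re.IGNORECASE,
)


def is_copyleft(license_text: str) -> bool:
    """Return True if the text matches one of the (heuristic) copyleft patterns."""
    return bool(COPYLEFT_PATTERN.search(license_text or ""))


def license_strings(meta) -> list[str]:
    """Collect non-empty license-related metadata values for one distribution."""
    texts = []
    for field in ("License-Expression", "License"):
        value = meta.get(field)
        if value and value.strip():
            texts.append(value)
    # Trove classifiers such as "License :: OSI Approved :: MIT License".
    for classifier in meta.get_all("Classifier") or []:
        if classifier.startswith("License ::"):
            texts.append(classifier)
    return texts


def find_copyleft_distributions() -> list[tuple[str, str]]:
    """Return sorted (distribution name, first matching license line) pairs."""
    flagged = []
    for dist in distributions():
        meta = dist.metadata
        if meta is None:  # some broken installs expose no metadata
            continue
        name = meta.get("Name") or "<unknown>"
        for text in license_strings(meta):
            if is_copyleft(text):
                # The License field may hold a full license text; keep its first line only.
                flagged.append((name, text.strip().splitlines()[0][:80]))
                break
    return sorted(flagged)


if __name__ == "__main__":
    for name, text in find_copyleft_distributions():
        print(f"{name}: {text}")
```

Running the script in a project's virtual environment lists candidate copyleft dependencies for manual review. Matching on metadata strings keeps the sketch dependency-free, but it cannot see licenses that a package declares only in bundled files. A dedicated license scanner would be more thorough.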

