AI assistants can help developers by recommending code to include in their implementations (e.g., suggesting the implementation of a method from its signature). Although useful, these recommendations may mirror copyleft code available in public repositories, exposing developers to the risk of reusing code that they may reuse only under certain constraints (e.g., a specific license for the derivative software). This paper presents a large-scale study of the frequency and magnitude of this phenomenon in ChatGPT. In particular, we generate more than 70,000 method implementations using a range of configurations and prompts, revealing that a larger context increases the likelihood of reproducing copyleft code, whereas higher temperature settings can mitigate this issue.