Generative AI (GAI) holds great potential to improve software engineering productivity, but its untrustworthy outputs, particularly in code synthesis, pose significant challenges. The need for extensive verification and validation (V&V) of GAI-generated artifacts may undermine the potential productivity gains. This paper proposes a way of mitigating these risks by exploiting GAI's ability to generate multiple versions of code and tests to facilitate comparative analysis across versions. Rather than relying on the quality of a single test or code module, this "differential GAI" (D-GAI) approach promotes more reliable quality evaluation through version diversity. We introduce the Large-Scale Software Observatorium (LASSO), a platform that supports D-GAI by executing and analyzing large sets of code versions and tests. We discuss how LASSO enables rigorous evaluation of GAI-generated artifacts and propose its application in both software development and GAI research.
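The core D-GAI idea described above — treating cross-version agreement, rather than any single generated artifact, as the quality signal — can be sketched as follows. This is a minimal illustration, not the LASSO platform's API: the candidate functions, the majority-voting scheme, and all names (`majority_output`, `score_candidates`) are hypothetical and chosen only to show how version diversity exposes a divergent (buggy) candidate.

```python
# Hypothetical sketch of differential GAI (D-GAI): run several
# GAI-generated candidate versions on shared inputs and use
# cross-version agreement (majority voting) as a quality signal.
# Names and the voting scheme are illustrative, not the LASSO API.
from collections import Counter

# Three generated candidate versions of a "sort" function;
# version_c is deliberately buggy (it drops duplicates).
def version_a(xs):
    return sorted(xs)

def version_b(xs):
    return sorted(list(xs))

def version_c(xs):
    return sorted(set(xs))

def majority_output(candidates, test_input):
    """Run all candidates on one input; return the most common output."""
    outputs = [tuple(c(test_input)) for c in candidates]
    winner, votes = Counter(outputs).most_common(1)[0]
    return list(winner), votes

def score_candidates(candidates, test_inputs):
    """Score each candidate by how often it matches the majority output."""
    scores = [0] * len(candidates)
    for test_input in test_inputs:
        expected, _ = majority_output(candidates, test_input)
        for i, candidate in enumerate(candidates):
            if list(candidate(test_input)) == expected:
                scores[i] += 1
    return scores

candidates = [version_a, version_b, version_c]
tests = [[3, 1, 2], [5, 5, 1], [2, 2, 2]]
print(score_candidates(candidates, tests))  # -> [3, 3, 1]: version_c diverges on duplicates
```

Note that no candidate is assumed to be a trusted oracle: inputs with duplicates make the buggy version disagree with the majority, so its agreement score falls even though no ground-truth output was ever specified.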