Users are increasingly being warned to check AI-generated content for correctness. Still, as LLMs (and other generative models) generate more complex output, such as summaries, tables, or code, it becomes harder for the user to audit or evaluate the output for quality or correctness. Hence, we are seeing the emergence of tool-assisted experiences to help the user double-check a piece of AI-generated content. We refer to these as co-audit tools. Co-audit tools complement prompt engineering techniques: prompt engineering helps the user construct the input prompt, while co-audit helps them check the output response. As a specific example, this paper describes recent research on co-audit tools for spreadsheet computations powered by generative models. We explain why co-audit experiences are essential for any application of generative AI where quality is important and errors are consequential (as is common in spreadsheet computations). We propose a preliminary list of principles for co-audit, and outline research challenges.