GenLens: A Systematic Evaluation of Visual GenAI Model Outputs

The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which may lack scalability and generalizability. This paper bridges this gap by conducting a formative study with GenAI model developers in an industrial setting. Our findings led to the development of GenLens, a visual analytic interface designed for the systematic evaluation of GenAI model outputs during the early stages of model development. GenLens offers a quantifiable approach for overviewing and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to enhance collaboration. A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates and a strong intent to integrate it into their practices. This research underscores the importance of robust early-stage evaluation tools in GenAI development, contributing to the advancement of fair and high-quality GenAI models.
