QoNext：迈向下一代基础模型体验质量 (QoNext: Towards Next-generation QoE for Foundation Models)

Existing evaluations of foundation models, including recent human-centric approaches, fail to capture what truly matters: user's experience during interaction. Current methods treat evaluation as a matter of output correctness alone, overlooking that user satisfaction emerges from the interplay between response quality and interaction, which limits their ability to account for the mechanisms underlying user experience. To address this gap, we introduce QoNext, the first framework that adapts Quality of Experience (QoE) principles from networking and multimedia to the assessment of foundation models. QoNext identifies experiential factors that shape user experience and incorporates them into controlled experiments, where human ratings are collected under varied configurations. From these studies we construct a QoE-oriented database and train predictive models that estimate perceived user experience from measurable system parameters. Our results demonstrate that QoNext not only enables proactive and fine-grained evaluation but also provides actionable guidance for productized services of optimizing foundation models in practice.

翻译：现有基础模型评估方法（包括近期以人为中心的评估方法）均未能捕捉真正重要的因素：用户交互过程中的体验。当前方法将评估视为单纯的输出正确性问题，忽视了用户满意度产生于响应质量与交互过程之间的相互作用，这限制了其解释用户体验内在机制的能力。为弥补这一不足，我们提出QoNext——首个将网络与多媒体领域的体验质量原则应用于基础模型评估的框架。QoNext识别影响用户体验的感知要素，并将其纳入受控实验设计，通过不同配置收集人工评分数据。基于这些研究，我们构建了面向体验质量的数据库，并训练能够根据可测量系统参数预测用户感知体验的模型。实验结果表明，QoNext不仅能实现主动化、细粒度的评估，还能为实际产品化服务中优化基础模型提供可操作的指导。

相关内容

MoDELS

关注 44

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日