Existing evaluations of foundation models, including recent human-centric approaches, fail to capture what matters most: the user's experience during interaction. Current methods treat evaluation as a matter of output correctness alone. They overlook the fact that user satisfaction emerges from the interplay between response quality and the interaction itself, which limits their ability to account for the mechanisms underlying user experience. To address this gap, we introduce QoNext, the first framework that adapts Quality of Experience (QoE) principles from networking and multimedia to the assessment of foundation models. QoNext identifies the experiential factors that shape user experience and incorporates them into controlled experiments in which human ratings are collected under varied configurations. From these studies we construct a QoE-oriented database and train predictive models that estimate perceived user experience from measurable system parameters. Our results demonstrate that QoNext not only enables proactive, fine-grained evaluation but also provides actionable guidance for optimizing foundation models in productized services.