Estimating the overall user experience (UX) on a device is a common challenge faced by manufacturers. Today, device makers primarily rely on scores from microbenchmarks, such as Geekbench, which stress-test specific hardware components, such as the CPU or RAM, but do not satisfactorily capture consumer workloads. System designers often rely on domain-specific heuristics and extensive testing of prototypes to reach a desired UX goal, and yet there is often a mismatch between the manufacturers' performance claims and the consumers' experience. We present our initial results on predicting real-life experience on laptops from their hardware specifications. We target web applications that run on Chromebooks (ChromeOS laptops) for a simple and fair aggregation of experience across applications and workloads. On 54 laptops, we track 9 UX metrics on common end-user workloads: web browsing, video playback, and audio/video calls. We focus on a subset of high-level metrics exposed by the Chrome browser that are part of the Web Vitals initiative for judging the UX of web applications. With a dataset of 100K UX data points, we train gradient boosted regression trees that predict the metric values from device specifications. Across our 9 metrics, we note a mean $R^2$ score (goodness-of-fit on our dataset) of 97.8% and a mean MAAPE (percentage error in prediction on unseen data) of 10.1%.
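The two evaluation measures named above can be illustrated with a minimal sketch. It assumes the standard definitions: $R^2$ as one minus the ratio of the residual to the total sum of squares, and MAAPE as the mean arctangent of the absolute percentage error. The abstract does not state how MAAPE is scaled to a percentage, so the function below returns the raw value in radians. The function names and the handling of zero-valued actuals are assumptions of this sketch, not the authors' implementation.

```python
import math


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


def maape(actual, predicted):
    """Mean arctangent absolute percentage error, in radians (0 to pi/2)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # Sketch convention (an assumption): an exact prediction of a
            # zero actual contributes 0; any other prediction has unbounded
            # percentage error, whose arctangent tends to pi/2.
            total += 0.0 if p == 0 else math.pi / 2
        else:
            total += math.atan(abs((a - p) / a))
    return total / len(actual)
```

Because the arctangent bounds each term by $\pi/2$, a single near-zero actual value cannot dominate the average the way it would in an unbounded percentage error.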