Anatomizing Deep Learning Inference in Web Browsers

Web applications increasingly adopt Deep Learning (DL) through in-browser inference, in which DL inference runs directly within Web browsers. The actual performance of in-browser inference and its impact on quality of experience (QoE) remain unexplored, and studying them requires new QoE metrics beyond traditional ones, which mainly focus on page load time. To bridge this gap, we present the first comprehensive performance measurement of in-browser inference to date. We propose new metrics for in-browser inference: responsiveness, smoothness, and inference accuracy. Our analysis covers 9 representative DL models across the Web browsers of 50 popular PC devices and 20 mobile devices. The results reveal a substantial latency gap: on PC devices, in-browser inference is on average 16.9 times slower on CPU and 4.9 times slower on GPU than native inference. On mobile devices, the gap is 15.8 times on CPU and 7.8 times on GPU. We further identify factors contributing to this latency gap, including underutilized hardware instruction sets, inherent overhead in the runtime environment, resource contention within the browser, and inefficiencies in software libraries and GPU abstractions. In addition, in-browser inference imposes significant memory demands, at times exceeding 334.6 times the size of the DL models themselves, partly because of suboptimal memory management. We also observe that in-browser inference increases the time GUI components take to render within Web browsers by 67.2%, which significantly degrades the overall user QoE of Web applications that rely on this technology.
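
To make the "times slower" and "times the model size" figures concrete, the following minimal sketch shows one way such ratios could be computed. It assumes the latency gap is the ratio of in-browser latency to native latency, averaged over models with an arithmetic mean; the abstract does not state the exact aggregation, so that choice, the function names, and the sample values are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def latency_gap(browser_ms: float, native_ms: float) -> float:
    """Return how many times slower in-browser inference is than native."""
    if native_ms <= 0:
        raise ValueError("native latency must be positive")
    return browser_ms / native_ms


def average_gap(measurements: dict[str, tuple[float, float]]) -> float:
    """Arithmetic mean of per-model gaps (an assumed aggregation)."""
    return mean(latency_gap(b, n) for b, n in measurements.values())


def memory_ratio(peak_memory_mb: float, model_size_mb: float) -> float:
    """Return peak inference memory as a multiple of the model size."""
    if model_size_mb <= 0:
        raise ValueError("model size must be positive")
    return peak_memory_mb / model_size_mb


# Placeholder (in-browser ms, native ms) pairs, NOT measurements from the paper.
samples = {
    "model_a": (120.0, 10.0),
    "model_b": (90.0, 15.0),
}

if __name__ == "__main__":
    print(f"average latency gap: {average_gap(samples):.1f}x")
    print(f"memory ratio: {memory_ratio(500.0, 25.0):.1f}x")
```

Under this reading, a gap of 16.9 means that an inference taking 10 ms natively would take roughly 169 ms in the browser on the same device.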
