Exploring the Impact of In-Browser Deep Learning Inference on Quality of User Experience and Performance

Deep Learning (DL) is increasingly integrated into Web applications through "in-browser inference", in which DL inference runs directly inside the Web browser. However, the actual performance of in-browser inference and its effect on quality of experience (QoE) are not well understood. Closing this gap requires QoE measurements that go beyond traditional metrics such as page load time. To this end, we conducted the first extensive performance evaluation of in-browser inference and introduced three new QoE metrics: responsiveness, smoothness, and inference accuracy. Our study covered 9 widely used DL models, tested across 50 popular PC Web browsers. The findings reveal a significant latency gap: in-browser inference is on average 16.9 times slower on CPU and 4.9 times slower on GPU than native inference. Several factors contribute to this gap, including underused hardware instruction sets, inherent overhead in the runtime environment, resource contention within the browser, and inefficiencies in software libraries and GPU abstractions. In-browser inference is also memory-intensive, at times requiring up to 334.6 times the size of the DL model itself, in part because of suboptimal memory management. Finally, in-browser inference increases the time that graphical user interface (GUI) components take to load in the browser by 67.2%, which severely degrades the overall QoE for users of Web applications that depend on this technology.
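The headline figures above are ratios, and the arithmetic behind them can be sketched as follows. This is a minimal illustration rather than the study's measurement code: the function names and the sample inputs are hypothetical, and it assumes that "N times slower" means in-browser latency divided by native latency, and that the GUI figure is a relative increase over a baseline without inference.

```python
def slowdown(in_browser_ms: float, native_ms: float) -> float:
    """Latency gap: in-browser latency divided by native latency (assumed definition)."""
    if native_ms <= 0:
        raise ValueError("native latency must be positive")
    return in_browser_ms / native_ms


def memory_ratio(peak_memory_mb: float, model_size_mb: float) -> float:
    """Memory footprint expressed as a multiple of the model's own size."""
    if model_size_mb <= 0:
        raise ValueError("model size must be positive")
    return peak_memory_mb / model_size_mb


def relative_increase_pct(with_inference_ms: float, baseline_ms: float) -> float:
    """Percentage increase of a timing relative to its baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (with_inference_ms - baseline_ms) / baseline_ms * 100.0


if __name__ == "__main__":
    # Hypothetical inputs, chosen only so the results mirror the reported figures.
    print(f"CPU slowdown: {slowdown(169.0, 10.0):.1f}x")
    print(f"Memory ratio: {memory_ratio(3346.0, 10.0):.1f}x")
    print(f"GUI load increase: {relative_increase_pct(418.0, 250.0):.1f}%")
```

Reporting ratios rather than absolute timings keeps the comparison independent of model size and hardware, which is presumably why the results are summarized this way.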
