Modern processors provide instructions that process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. Recent work has leveraged SIMD instructions to accelerate the parsing of common Internet formats such as JSON and base64. During HTML parsing, SIMD-based approaches quickly identify specific characters using a strategy called vectorized classification. We review these techniques and compare them with a faster alternative. We measure a 20-fold performance improvement in HTML scanning compared to traditional methods on recent ARM processors. Our findings highlight the potential of SIMD-based algorithms for optimizing Web browser performance.
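To make the idea of vectorized classification concrete, the following is a minimal scalar sketch in Python, not the implementation evaluated here. It emulates, lane by lane, what a SIMD version would do with two 16-entry table lookups (for example a byte-shuffle instruction such as NEON's `vqtbl1q_u8`): one lookup indexed by the low nibble of each byte, one by the high nibble, combined with a bitwise AND. The target set (`<`, `&`, carriage return, NUL) and the names `TARGETS`, `build_tables`, `classify_block`, and `find_first` are assumptions chosen for illustration.

```python
# Scalar emulation of nibble-based vectorized classification.
# Assumption: the characters of interest are '<', '&', '\r' and NUL.
TARGETS = b"<&\r\0"
LANES = 16  # bytes handled per emulated SIMD register


def build_tables(targets: bytes):
    """Build the low-nibble and high-nibble lookup tables.

    Each target byte gets its own bit, so at most 8 targets fit in one byte.
    """
    if len(targets) > 8:
        raise ValueError("one bit per target: at most 8 targets supported")
    low = [0] * 16
    high = [0] * 16
    for bit, byte in enumerate(targets):
        low[byte & 0x0F] |= 1 << bit
        high[byte >> 4] |= 1 << bit
    return low, high


def classify_block(block: bytes, low, high) -> int:
    """Return a bitmask with bit i set when block[i] is a target byte.

    A SIMD version would perform the two lookups and the AND on all
    lanes at once; here we loop over the lanes instead.
    """
    mask = 0
    for lane, byte in enumerate(block):
        if low[byte & 0x0F] & high[byte >> 4]:
            mask |= 1 << lane
    return mask


def find_first(data: bytes, targets: bytes = TARGETS) -> int:
    """Return the index of the first target byte, or len(data) if none."""
    low, high = build_tables(targets)
    for start in range(0, len(data), LANES):
        mask = classify_block(data[start:start + LANES], low, high)
        if mask:
            # Index of the lowest set bit gives the first matching lane.
            return start + (mask & -mask).bit_length() - 1
    return len(data)
```

Because each target owns a distinct bit, the AND of the two lookups is nonzero only when both nibbles belong to the same target, so the classification is exact for up to eight characters. For instance, `find_first(b"hello <b>")` scans one block and reports the position of the `<`.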