Transformers have emerged as the dominant neural-network architecture, achieving state-of-the-art performance in language processing and computer vision. At the core of these models lies the attention mechanism, which requires a nonlinear, non-negative mapping using the Softmax function. However, although Softmax operations account for less than 1% of the total operation count, they can disproportionately bottleneck overall inference latency. Here, we use thin-film lithium niobate (TFLN) Mach-Zehnder modulators (MZMs) as analog nonlinear computational elements to drastically reduce the latency of nonlinear computations. We implement electro-optic alternatives to digital Softmax and Sigmoid, and evaluate their performance in Vision Transformers and Large Language Models. Our system maintains highly competitive accuracy, even under aggressive 4-bit input-output quantization of the analog units. We further characterize system noise at encoding speeds up to 10 GBaud and assess model robustness under various noise conditions. Our findings suggest that TFLN modulators can serve as nonlinear function units within hybrid co-packaged hardware, enabling high-speed and energy-efficient nonlinear computation.
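To make the idea concrete, the following is a minimal numerical sketch of how a Mach-Zehnder modulator could act as a Sigmoid-like and Softmax-like unit under 4-bit input-output quantization. It is not the authors' implementation: the abstract does not give their transfer function, bias point, or normalization scheme. The sketch assumes an ideal, noise-free MZM intensity response biased at quadrature, T(v) = 0.5 (1 + sin(πv / V_π)), and a simple sum normalization. The names `quantize`, `mzm_transfer`, `eo_sigmoid`, `eo_softmax_like`, and the parameters `x_range` and `v_pi` are hypothetical and chosen for illustration only.

```python
import numpy as np


def quantize(x, lo, hi, bits=4):
    """Clip x to [lo, hi] and snap it to 2**bits uniformly spaced levels."""
    levels = 2 ** bits - 1
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    return np.round((x - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo


def mzm_transfer(v, v_pi=1.0):
    """Ideal MZM intensity transmission at quadrature bias (assumption).

    Rises monotonically from 0 to 1 as v goes from -v_pi/2 to +v_pi/2.
    """
    return 0.5 * (1.0 + np.sin(np.pi * np.asarray(v, dtype=float) / v_pi))


def eo_sigmoid(x, x_range=4.0, bits=4, v_pi=1.0):
    """Sigmoid-like map: quantized input -> MZM -> quantized output in [0, 1]."""
    xq = quantize(x, -x_range, x_range, bits)   # DAC-side quantization
    v = xq / x_range * (v_pi / 2.0)             # scale to the monotonic drive range
    return quantize(mzm_transfer(v, v_pi), 0.0, 1.0, bits)  # ADC-side quantization


def eo_softmax_like(scores, x_range=4.0, bits=4, v_pi=1.0, eps=1e-9):
    """Non-negative attention weights: MZM response normalized along the last axis.

    This is a stand-in for Softmax, not Softmax itself. If every score in a row
    saturates at the low end, all weights in that row are zero.
    """
    w = eo_sigmoid(scores, x_range, bits, v_pi)
    return w / (w.sum(axis=-1, keepdims=True) + eps)


if __name__ == "__main__":
    scores = np.array([[-2.0, 0.5, 3.0], [1.0, 1.0, -4.0]])
    print(eo_sigmoid(scores))
    print(eo_softmax_like(scores))
```

The design choice here is to keep the drive voltage inside the monotonic half-period of the modulator response, so the mapping stays non-negative and order-preserving, which are the properties the attention mechanism needs from its nonlinearity. Noise at the encoding rate, finite extinction ratio, and bias drift are deliberately left out.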