Transformers have become pivotal in Natural Language Processing, demonstrating remarkable success in applications such as Machine Translation and Summarization. Given their widespread adoption, several works have attempted to analyze their expressivity. The expressivity of a neural network is the class of functions it can approximate; a network is fully expressive if it can act as a universal function approximator. We analyze the expressivity of Transformers in this sense. Contrary to existing claims, our findings reveal that Transformers struggle to reliably approximate smooth functions, relying instead on piecewise constant approximations with sizable intervals. The central question thus emerges: "Are Transformers truly universal function approximators?" To address it, we conduct a thorough investigation, providing theoretical insights and supporting experimental evidence. Theoretically, we prove that Transformer Encoders cannot approximate smooth functions. Experimentally, we complement our theory and show that the full Transformer architecture likewise fails to approximate smooth functions. By shedding light on these challenges, we advocate a more refined understanding of Transformers' capabilities.