Transformers have become pivotal in Natural Language Processing, with remarkable success in applications such as Machine Translation and Summarization. Given their widespread adoption, several works have attempted to analyze the expressivity of Transformers. The expressivity of a neural network is the class of functions it can approximate, and a network is fully expressive if it can act as a universal function approximator. We examine whether this holds for Transformers. Contrary to existing claims, our findings reveal that Transformers struggle to reliably approximate continuous functions and instead rely on piecewise constant approximations with sizable intervals. The central question is: "\textit{Are Transformers truly Universal Function Approximators}?" To address it, we conduct a thorough investigation, providing theoretical insights supported by experimental evidence. Our contributions are a theoretical analysis that pinpoints the root of Transformers' limitation in function approximation and extensive experiments that verify this limitation. By shedding light on these challenges, we advocate a refined understanding of Transformers' capabilities.
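To make the phrase "piecewise constant approximations with sizable intervals" concrete, the following is a minimal sketch of how the interval width limits the accuracy of any piecewise constant approximation to a continuous function. It is not the experimental setup of this work and involves no Transformer; the target function (`math.sin` on $[0, \pi]$), the midpoint rule, and the helper names `piecewise_constant` and `max_error` are all assumptions chosen for illustration.

```python
import math


def piecewise_constant(f, lo, hi, n_pieces):
    """Return g, constant on each of n_pieces equal-width intervals of [lo, hi].

    On each interval g takes the value of f at the interval midpoint
    (an illustrative choice, not taken from the paper).
    """
    width = (hi - lo) / n_pieces

    def g(x):
        # Clamp the index so that x == hi falls into the last interval.
        k = min(int((x - lo) / width), n_pieces - 1)
        return f(lo + (k + 0.5) * width)

    return g


def max_error(f, g, lo, hi, n_samples=10_001):
    """Largest |f(x) - g(x)| over an evenly spaced grid on [lo, hi]."""
    xs = (lo + (hi - lo) * i / (n_samples - 1) for i in range(n_samples))
    return max(abs(f(x) - g(x)) for x in xs)


# Hypothetical target: sin on [0, pi], which is 1-Lipschitz.
LO, HI = 0.0, math.pi
errors = {
    n: max_error(math.sin, piecewise_constant(math.sin, LO, HI, n), LO, HI)
    for n in (4, 16, 64)
}

for n, err in errors.items():
    half_width = (HI - LO) / n / 2
    print(f"pieces={n:3d}  half-width={half_width:.4f}  max error={err:.4f}")
```

For a 1-Lipschitz target with midpoint values, the worst-case error is bounded by half the interval width, so wide intervals imply a coarse approximation regardless of how the constant pieces are produced.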