Transformers have emerged as a widely used neural network model for a range of natural language processing tasks. Previous research has related them to constant-depth threshold circuits under two assumptions: average-hard attention, and precision for internal computations that is logarithmic in the input length. Merrill et al. (2022) prove that average-hard attention transformers recognize only languages in the complexity class TC0, the set of languages recognizable by constant-depth, polynomial-size threshold circuits. Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize only languages in uniform TC0. Both transformer models can therefore be simulated by constant-depth threshold circuits, and the latter result is the more robust of the two because it yields a uniform circuit family. Our paper shows that the first result can be extended to yield uniform circuits as well.
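To make the two central notions concrete, the following is a minimal sketch, not the construction used in the cited papers. It assumes scalar values and exact rational arithmetic, and the function names (`average_hard_attention`, `threshold_gate`, `majority`) are hypothetical names chosen for illustration. Average-hard attention spreads its weight uniformly over the positions with the maximal score, and a threshold gate outputs 1 exactly when at least k of its inputs are 1.

```python
from fractions import Fraction


def average_hard_attention(scores, values):
    """Average the values at all positions whose score is maximal.

    Assumption for illustration: values are integers, so the result
    can be represented exactly as a Fraction.
    """
    top = max(scores)
    winners = [v for s, v in zip(scores, values) if s == top]
    return Fraction(sum(winners), len(winners))


def threshold_gate(bits, k):
    """Output 1 iff at least k of the input bits are 1."""
    return 1 if sum(bits) >= k else 0


def majority(bits):
    """Strict majority, expressed as a single threshold gate."""
    return threshold_gate(bits, len(bits) // 2 + 1)
```

In this sketch, ties in the attention scores are resolved by averaging rather than by picking one position, which is the feature that distinguishes average-hard attention from an argmax that selects a single position.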