Transformers have emerged as a widely used neural network model for a variety of natural language processing tasks. Previous research has explored their relationship to constant-depth threshold circuits under two assumptions: average-hard attention, and precision for internal computations that is logarithmic in the input length. Merrill et al. (2022) prove that average-hard attention transformers recognize languages that fall within the complexity class TC0, the set of languages recognizable by constant-depth, polynomial-size threshold circuits. Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize languages within uniform TC0. Both transformer models can therefore be simulated by constant-depth threshold circuits, with the latter result being more robust because it yields a uniform circuit family. Our paper shows that the first result can be extended to yield uniform circuits as well.
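To make the two objects named in the abstract concrete, the following is a minimal Python sketch, not the construction used in the cited papers. It assumes the usual reading of average-hard attention, in which all positions tied for the maximum score receive equal weight, and it models a threshold gate as a unit that outputs 1 when at least k of its inputs are 1. The function names (`average_hard_attention`, `threshold_gate`, `majority`) and the use of scalar values are illustrative assumptions.

```python
from fractions import Fraction


def average_hard_attention(scores, values):
    """Average the values at every position whose score ties for the maximum.

    Assumption: scalar values and exact (rational) arithmetic, for illustration only.
    """
    if not scores or len(scores) != len(values):
        raise ValueError("scores and values must be non-empty and of equal length")
    top = max(scores)
    # Positions attaining the maximum score share the attention weight equally.
    winners = [v for s, v in zip(scores, values) if s == top]
    return sum(winners, Fraction(0)) / len(winners)


def threshold_gate(bits, k):
    """Output 1 if at least k of the input bits are 1, else 0."""
    return int(sum(bits) >= k)


def majority(bits):
    """Majority as a threshold gate: 1 if strictly more than half the bits are 1."""
    return threshold_gate(bits, len(bits) // 2 + 1)
```

For example, `average_hard_attention([1, 3, 3], [10, 20, 40])` averages the two values at the tied maximum score, and `majority([1, 1, 0])` fires because two of the three inputs are set.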