Understanding the inner workings of machine learning models such as Transformers is vital for their safe and ethical use. This paper presents a comprehensive analysis of a one-layer Transformer model trained to perform n-digit integer addition. Our findings suggest that the model splits the task into parallel streams, one per digit, and applies different algorithms to different digit positions. We also identify and explain a rare scenario in which the model incurs high loss. Explaining the model's algorithm in full gives new insight into how it functions. We validate these findings through rigorous testing and mathematical modeling, contributing to the broader fields of model understanding and interpretability. Our approach opens the door to analyzing more complex tasks and multi-layer Transformer models.