The transformer architecture has demonstrated remarkable capabilities in modern artificial intelligence, among which the capability of implicitly learning an internal model during inference time is widely believed to play a key role in the understanding of pre-trained large language models. However, most recent work has focused on supervised learning topics such as in-context learning, leaving the field of unsupervised learning largely unexplored. This paper investigates the capabilities of transformers in solving Gaussian Mixture Models (GMMs), a fundamental unsupervised learning problem, through the lens of statistical estimation. We propose a transformer-based learning framework called TGMM that simultaneously learns to solve multiple GMM tasks using a shared transformer backbone. The learned models are empirically shown to effectively mitigate the limitations of classical methods such as Expectation-Maximization (EM) or spectral algorithms, while exhibiting reasonable robustness to distribution shifts. Theoretically, we prove that transformers can approximate both the EM algorithm and a core component of spectral methods (cubic tensor power iterations). These results bridge the gap between practical success and theoretical understanding, positioning transformers as versatile tools for unsupervised learning.