Can we establish provable performance guarantees for transformers? Establishing such theoretical guarantees is a milestone in developing trustworthy generative AI. In this paper, we take a step toward addressing this question by focusing on optimal transport, a fundamental problem at the intersection of combinatorial and continuous optimization. Leveraging the computational power of attention layers, we prove that a transformer with fixed parameters can effectively solve the Wasserstein-2 optimal transport problem with entropic regularization for an arbitrary number of points. Consequently, the transformer can sort lists of arbitrary size up to an approximation factor. Our results rely on an engineered prompt that enables the transformer to implement gradient descent with adaptive step sizes on the dual of the optimal transport problem. Combining the convergence analysis of gradient descent with Sinkhorn dynamics, we establish an explicit approximation bound for optimal transport with transformers, which improves as depth increases. Our findings provide novel insights into the roles of prompt engineering and depth in solving optimal transport. In particular, prompt engineering boosts the algorithmic expressivity of transformers, allowing them to implement an optimization method; with increasing depth, transformers can simulate more iterations of gradient descent.
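For concreteness, here is a minimal sketch of the dual objective in question, assuming the standard discrete entropic formulation; the notation (source points $x_i$ with weights $a_i$, targets $y_j$ with weights $b_j$, dual potentials $f, g$, regularization $\varepsilon$, step size $\eta_t$) is illustrative and not necessarily the paper's:
\[
\Phi(f,g) \;=\; \sum_{i} f_i a_i \;+\; \sum_{j} g_j b_j \;-\; \varepsilon \sum_{i,j} a_i b_j \left( \exp\!\left( \frac{f_i + g_j - \|x_i - y_j\|^2}{\varepsilon} \right) - 1 \right),
\qquad
(f, g) \;\leftarrow\; (f, g) \;+\; \eta_t \, \nabla \Phi(f, g).
\]
Maximizing $\Phi$ over $(f,g)$ (equivalently, running gradient descent on $-\Phi$) recovers the entropically regularized Wasserstein-2 solution, and Sinkhorn iterations correspond to exact block-coordinate maximization in $f$ and $g$ in turn; stacking such updates across layers is what lets depth translate into more optimization steps.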