Recent works have proposed various explanations for the ability of modern large language models (LLMs) to perform in-context prediction. We propose an alternative conceptual viewpoint grounded in information geometry and statistics. Motivated by Bach [2023], we model training as learning an embedding of probability distributions into the space of quantum density operators, and in-context learning as maximum-likelihood prediction over a specified class of quantum models. When the class of quantum models is sufficiently expressive, we interpret this predictor in terms of the quantum reverse information projection and the quantum Pythagorean theorem. We further derive non-asymptotic performance guarantees in the form of convergence rates and concentration inequalities, both in trace norm and in quantum relative entropy. Our approach provides a unified framework that handles both classical and quantum LLMs.
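To fix notation, the two central quantities mentioned above admit the following standard definitions (a minimal sketch; the precise formulation and regularity conditions used in the paper may differ). For density operators $\rho$ and $\sigma$ (positive semidefinite, unit trace), the quantum relative entropy is
\[
D(\rho \,\|\, \sigma) \;=\; \operatorname{Tr}\!\bigl[\rho\,(\log \rho - \log \sigma)\bigr],
\]
and the reverse information projection of $\rho$ onto a model class $\mathcal{M}$ is
\[
\sigma^{\star} \;\in\; \operatorname*{arg\,min}_{\sigma \in \mathcal{M}} \, D(\rho \,\|\, \sigma).
\]
Under suitable conditions on $\mathcal{M}$, a Pythagorean-type relation
\[
D(\rho \,\|\, \sigma) \;\ge\; D(\rho \,\|\, \sigma^{\star}) + D(\sigma^{\star} \,\|\, \sigma),
\qquad \sigma \in \mathcal{M},
\]
(possibly with equality, depending on the geometry of $\mathcal{M}$) relates the loss of any model $\sigma$ to its divergence from the projection.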