One of the most recent and fascinating breakthroughs in artificial intelligence is ChatGPT, a chatbot that can simulate human conversation. ChatGPT is an instance of GPT-4, a language model based on generative pre-trained transformers. One way to study theoretically how powerful such artificial intelligence can be is therefore to consider transformer networks and to investigate which problems these networks can solve. Three aspects matter here: which kinds of models these networks can approximate, how well they generalize the knowledge learned by choosing the best possible approximation to a concrete data set, and how well the optimization of such a transformer network on a concrete data set works. In this article we consider all three aspects simultaneously and derive a theoretical upper bound on the misclassification probability of a transformer network fitted to the observed data. For simplicity, we focus on transformer encoder networks, which can be used to define an estimate for a classification problem involving natural language.