One of the most recent and fascinating breakthroughs in artificial intelligence is ChatGPT, a chatbot that can simulate human conversation. ChatGPT is an instance of GPT-4, a language model based on Generative Pretrained Transformers. So if one wants to study from a theoretical point of view how powerful such artificial intelligence can be, one approach is to consider transformer networks and to study which problems these networks can solve in theory. Here it is not only important what kind of models these networks can approximate, or how well they generalize the knowledge learned by choosing the best possible approximation to a concrete data set, but also how well the optimization of such a transformer network on a concrete data set works. In this article we consider all three of these aspects simultaneously and derive a theoretical upper bound on the misclassification probability of a transformer network fitted to the observed data. For simplicity, we focus in this context on transformer encoder networks, which can be applied to define an estimate in the context of a classification problem involving natural language.
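To fix ideas, the following is a minimal sketch of the computation at the heart of such a transformer encoder used for classification: single-head scaled dot-product self-attention over a sequence of token embeddings, followed by pooling and a softmax output layer. All weight matrices, dimensions, and the pooling choice here are illustrative assumptions for exposition, not the specific architecture analyzed below.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_classify(X, Wq, Wk, Wv, Wc):
    """X: (seq_len, d) token embeddings; Wq, Wk, Wv, Wc: hypothetical weights."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # attention weights, rows sum to 1
    H = A @ V                                   # contextualized representations
    z = H.mean(axis=0)                          # pool over the sequence
    return softmax(z @ Wc)                      # class probabilities

# Toy example with random embeddings and weights (purely illustrative).
rng = np.random.default_rng(0)
d, k, n_classes, seq_len = 8, 8, 3, 5
X = rng.normal(size=(seq_len, d))
p = encoder_classify(X,
                     rng.normal(size=(d, k)), rng.normal(size=(d, k)),
                     rng.normal(size=(d, k)), rng.normal(size=(k, n_classes)))
```

The vector `p` is a probability distribution over the classes; a classifier predicts the class with the largest entry, and the misclassification probability studied in this article measures how often that prediction is wrong.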