We analyze how well pre-trained large language models (e.g., Llama2, GPT-4, Claude 3) can perform linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several large language models (e.g., GPT-4, Claude 3) are able to perform regression tasks with a performance rivaling, and sometimes exceeding, that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods, including AdaBoost, SVM, Random Forest, KNN, and Gradient Boosting. We then investigate how the performance of large language models scales with the number of in-context exemplars. Borrowing the notion of regret from online learning, we empirically show that LLMs are capable of achieving sub-linear regret.
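To make the setup concrete, below is a minimal sketch of the kind of experiment described above, assuming scikit-learn's `make_friedman2` generator for the Friedman #2 benchmark and a plain "feature: value / Output:" serialization for the in-context prompt; the exact prompt format used in the study and the LLM API call itself are not shown and are assumptions here.

```python
# Minimal sketch: Friedman #2 regression with supervised baselines and an
# in-context prompt for an LLM. Prompt format is a hypothetical choice.
import numpy as np
from sklearn.datasets import make_friedman2
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor

# Friedman #2: a standard non-linear regression benchmark with 4 features.
X, y = make_friedman2(n_samples=101, noise=0.0, random_state=0)
X_train, y_train = X[:100], y[:100]   # in-context exemplars
x_test, y_test = X[100], y[100]       # the query point

# Supervised baselines, fit on the same 100 exemplars the LLM would see.
for model in (RandomForestRegressor(random_state=0),
              GradientBoostingRegressor(random_state=0),
              KNeighborsRegressor()):
    model.fit(X_train, y_train)
    pred = model.predict(x_test.reshape(1, -1))
    print(type(model).__name__, mean_absolute_error([y_test], pred))

# Serialize exemplars into a prompt. The LLM receives only this text, with
# no gradient updates, and must complete the final "Output:" line.
def to_prompt(X_train, y_train, x_query):
    lines = []
    for x, y_val in zip(X_train, y_train):
        feats = ", ".join(f"x{i}: {v:.2f}" for i, v in enumerate(x))
        lines.append(f"{feats}\nOutput: {y_val:.2f}")
    feats = ", ".join(f"x{i}: {v:.2f}" for i, v in enumerate(x_query))
    lines.append(f"{feats}\nOutput:")
    return "\n\n".join(lines)

prompt = to_prompt(X_train, y_train, x_test)
# The prompt would then be sent to an LLM (e.g., GPT-4 or Claude 3) via its
# API, and the completion parsed as the numeric prediction (omitted here).
```

On the regret claim: in online-learning terms, if $\hat{y}_t$ is the prediction made after seeing the first $t-1$ exemplars and $\ell$ is a per-step loss (e.g., absolute error), cumulative regret is typically defined as $\mathrm{Reg}_T = \sum_{t=1}^{T} \ell(\hat{y}_t, y_t) - \min_{h \in \mathcal{H}} \sum_{t=1}^{T} \ell(h(x_t), y_t)$, and sub-linear regret means $\mathrm{Reg}_T = o(T)$, i.e., the average excess loss per exemplar vanishes as $T$ grows; the specific loss and comparator class $\mathcal{H}$ used in the study are assumptions here.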