We analyze how well pre-trained large language models (e.g., Llama2, GPT-4, Claude 3) can perform linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several large language models (e.g., GPT-4, Claude 3) can perform regression tasks with performance rivaling, or even exceeding, that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods, including AdaBoost, SVM, Random Forest, KNN, and Gradient Boosting. We then investigate how well the performance of large language models scales with the number of in-context exemplars. Borrowing the notion of regret from online learning, we empirically show that LLMs are capable of obtaining sub-linear regret.
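To make the setup concrete, the following is a minimal sketch of the two ingredients mentioned above: in-context regression on Friedman #2 and a regret curve over the number of exemplars. It rests on several assumptions that are not taken from the abstract: the prompt layout in `build_prompt` is hypothetical, regret is measured under squared loss against the best constant predictor in hindsight, and `running_mean` is only a stand-in for a call to a language model. The Friedman #2 function and input ranges follow the standard benchmark definition, sampled here without noise.

```python
import math
import random


def friedman2(n, seed=0):
    """Sample n noiseless points from the standard Friedman #2 benchmark."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.uniform(0.0, 100.0)
        x2 = rng.uniform(40.0 * math.pi, 560.0 * math.pi)
        x3 = rng.uniform(0.0, 1.0)
        x4 = rng.uniform(1.0, 11.0)
        y = math.sqrt(x1 ** 2 + (x2 * x3 - 1.0 / (x2 * x4)) ** 2)
        data.append(((x1, x2, x3, x4), y))
    return data


def build_prompt(examples, query):
    """Render (x, y) exemplars as text, ending with an unanswered query.

    The layout is a hypothetical choice for illustration only.
    """
    blocks = []
    for x, y in examples:
        feats = "\n".join(f"Feature {i}: {v:.2f}" for i, v in enumerate(x))
        blocks.append(f"{feats}\nOutput: {y:.2f}")
    feats = "\n".join(f"Feature {i}: {v:.2f}" for i, v in enumerate(query))
    blocks.append(f"{feats}\nOutput:")
    return "\n\n".join(blocks)


def running_mean(history, x):
    """Stand-in predictor (assumption): mean of targets seen so far.

    In a real experiment this would send build_prompt(history, x) to a model.
    """
    if not history:
        return 0.0
    return sum(y for _, y in history) / len(history)


def run_online(data, predict):
    """Predict each target using only the exemplars that precede it."""
    return [predict(data[:t], x) for t, (x, _) in enumerate(data)]


def cumulative_regret(preds, targets):
    """Squared-loss regret versus the best constant predictor in hindsight."""
    regret = []
    online_loss = 0.0
    for t in range(1, len(targets) + 1):
        online_loss += (preds[t - 1] - targets[t - 1]) ** 2
        seen = targets[:t]
        mean = sum(seen) / t  # best constant under squared loss
        best_loss = sum((y - mean) ** 2 for y in seen)
        regret.append(online_loss - best_loss)
    return regret
```

A regret curve is obtained with `cumulative_regret(run_online(data, running_mean), [y for _, y in data])` for `data = friedman2(100)`. Sub-linear regret means this curve grows more slowly than the number of exemplars, so the average regret per step shrinks as more examples are provided.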