The Performance of the LSTM-based Code Generated by Large Language Models (LLMs) in Forecasting Time Series Data

An intriguing case is the quality of the machine learning and deep learning models generated by these LLMs for automated scientific data analysis, where a data analyst may lack the expertise to manually code and optimize complex deep learning models and may therefore opt to use LLMs to generate the required models. This paper investigates and compares the performance of mainstream LLMs, such as ChatGPT, PaLM, LLama, and Falcon, in generating deep learning models for analyzing time series data, an important and popular data type with applications in many domains, including finance and the stock market. The research conducts a set of controlled experiments in which the prompts for generating deep learning-based models are controlled with respect to the sensitivity levels of four criteria: 1) Clarity and Specificity, 2) Objective and Intent, 3) Contextual Information, and 4) Format and Style. While the results are relatively mixed, we observe some distinct patterns. Using LLMs, we are able to generate deep learning-based models with executable code for each dataset separately, and their performance is comparable to that of manually crafted and optimized LSTM models for predicting the whole time series dataset. We also observe that ChatGPT outperforms the other LLMs in generating more accurate models. Furthermore, the quality of the generated models varies with the "temperature" parameter used in configuring the LLMs. The results can benefit data analysts and practitioners who would like to leverage generative AI to produce prediction models of acceptable quality.
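
The abstract does not describe how the "temperature" parameter works, so the following is only a minimal sketch of the common interpretation: temperature divides the model's raw token scores before they are turned into sampling probabilities. The function name `softmax_with_temperature` and the example scores are hypothetical and are not taken from the paper or from any specific LLM API.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by a temperature.

    Hypothetical helper for illustration; real LLM services apply this
    step internally and expose only the temperature setting.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens (an assumption for the demo).
logits = [2.0, 1.0, 0.1]
low = softmax_with_temperature(logits, temperature=0.2)
high = softmax_with_temperature(logits, temperature=2.0)

print("low temperature: ", [round(p, 3) for p in low])
print("high temperature:", [round(p, 3) for p in high])
```

Under this reading, a low temperature concentrates probability on the top-scoring token, so generated code is more repeatable, while a high temperature flattens the distribution and makes outputs more varied. That is one plausible reason the quality of generated models could change with this setting.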

