Large language models (LLMs) have been applied to a wide range of data-to-text generation tasks, including generation from tables, graphs, and time-series numerical data. While research on designing prompts for structured data such as tables and graphs is gaining momentum, in-depth investigations into prompting for time-series numerical data are lacking. This study therefore explores various input representations, including plain sequences of tokens and structured formats such as HTML, LaTeX, and Python-style code. In our experiments, we focus on the task of Market Comment Generation, which takes a numerical sequence of stock prices as input and generates a corresponding market comment. Contrary to our expectations, the results show that prompts resembling programming languages yield better outcomes, whereas prompts resembling natural language and longer formats, such as HTML and LaTeX, are less effective. Our findings offer insights into creating effective prompts for tasks that generate text from numerical sequences.
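To make the compared input representations concrete, the following is a minimal sketch of how one price sequence might be serialized as a plain token sequence, a Python-style list, an HTML table, and a LaTeX tabular. The function names, the two-decimal formatting, and the sample prices are assumptions made for illustration; they are not the exact templates used in the experiments.

```python
from typing import Sequence


def _fmt(price: float) -> str:
    # Assumption: two decimal places; the paper's number formatting is not specified here.
    return f"{price:.2f}"


def as_token_sequence(prices: Sequence[float]) -> str:
    """Plain space-separated numbers (closest to a natural-language-like input)."""
    return " ".join(_fmt(p) for p in prices)


def as_python_list(prices: Sequence[float], name: str = "prices") -> str:
    """Python-style assignment of a list literal; `name` is a hypothetical variable name."""
    return f"{name} = [" + ", ".join(_fmt(p) for p in prices) + "]"


def as_html_table(prices: Sequence[float]) -> str:
    """Single-row HTML table, one cell per price."""
    cells = "".join(f"<td>{_fmt(p)}</td>" for p in prices)
    return f"<table><tr>{cells}</tr></table>"


def as_latex_tabular(prices: Sequence[float]) -> str:
    """Single-row LaTeX tabular with one right-aligned column per price."""
    cols = "r" * len(prices)
    row = " & ".join(_fmt(p) for p in prices)
    return f"\\begin{{tabular}}{{{cols}}}\n{row} \\\\\n\\end{{tabular}}"


if __name__ == "__main__":
    sample = [100.5, 101.0, 99.75]  # made-up prices, for illustration only
    for serialize in (as_token_sequence, as_python_list, as_html_table, as_latex_tabular):
        print(f"--- {serialize.__name__} ---")
        print(serialize(sample))
```

Even for three values, the HTML and LaTeX forms are noticeably longer than the Python-style list, which shows the length difference between formats that the abstract refers to.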