Large language models (LLMs) have demonstrated remarkable potential across a broad range of applications. However, producing reliable text that faithfully represents data remains a challenge. While prior work has shown that task-specific conditioning through in-context learning and knowledge augmentation can improve performance, LLMs continue to struggle with interpreting and reasoning about numerical data. To address this, we introduce wordalisations, a methodology for generating stylistically natural narratives from data. Much as visualisations display numerical data in a form that is easy to digest, wordalisations abstract data insights into descriptive texts. To illustrate the method's versatility, we apply it in three areas: scouting football players, personality tests, and international survey data. Because no standardised benchmarks exist for this task, we conduct LLM-as-a-judge and human-as-a-judge evaluations to assess accuracy across the three applications. We find that wordalisations produce engaging texts that accurately represent the data. We further describe best-practice methods for the open and transparent development of communication about data.