In this paper, we introduce EconLogicQA, a rigorous benchmark designed to assess the sequential reasoning capabilities of large language models (LLMs) in economics, business, and supply chain management. Unlike traditional benchmarks, which ask models to predict subsequent events one at a time, EconLogicQA poses a more challenging task: models must identify and correctly order multiple interconnected events, capturing the complexity of economic logic. EconLogicQA comprises multi-event scenarios derived from economic articles, each of which requires an understanding of both the temporal and the logical relationships between events. Through comprehensive evaluations, we show that EconLogicQA effectively gauges an LLM's proficiency in handling the sequential complexities inherent in economic contexts. We provide a detailed description of the EconLogicQA dataset and report the results of evaluating various leading-edge LLMs on the benchmark, offering a thorough perspective on their sequential reasoning potential in economic contexts. Our benchmark dataset is available at https://huggingface.co/datasets/yinzhu-quan/econ_logic_qa.