We present the setup and tasks of the FinMMEval Lab at CLEF 2026, which introduces the first multilingual and multimodal evaluation framework for financial Large Language Models (LLMs). While recent advances in financial natural language processing have enabled automated analysis of market reports, regulatory documents, and investor communications, existing benchmarks remain largely monolingual, text-only, and limited to narrow subtasks. FinMMEval 2026 addresses this gap with three interconnected tasks spanning financial understanding, reasoning, and decision-making: Financial Exam Question Answering, Multilingual Financial Question Answering (PolyFiQA), and Financial Decision Making. Together, these tasks form a comprehensive evaluation suite that measures models' ability to reason, generalize, and act across diverse languages and modalities. The lab aims to promote the development of robust, transparent, and globally inclusive financial AI systems, with datasets and evaluation resources publicly released to support reproducible research.