Japanese financial language combines an agglutinative, head-final linguistic structure, mixed writing systems, and high-context communication norms that rely on indirect expression and implicit commitment, posing a substantial challenge for large language models (LLMs). We introduce Ebisu, a benchmark for native Japanese financial language understanding, comprising two linguistically and culturally grounded, expert-annotated tasks: JF-ICR, which evaluates implicit commitment and refusal recognition in investor-facing Q&A, and JF-TE, which assesses hierarchical extraction and ranking of nested financial terminology from professional disclosures. We evaluate a diverse set of open-source and proprietary LLMs spanning general-purpose, Japanese-adapted, and financial models. Results show that even state-of-the-art systems struggle on both tasks. Increased model scale yields only limited improvements, and language- and domain-specific adaptation does not reliably improve performance, leaving substantial gaps unresolved. Ebisu provides a focused benchmark for advancing linguistically and culturally grounded financial NLP. All datasets and evaluation scripts are publicly released.