The SuperCLUE-Fin (SC-Fin) benchmark is a pioneering evaluation framework tailored to Chinese-native financial large language models (FLMs). It assesses FLMs across six financial application domains and twenty-five specialized tasks, covering both theoretical knowledge and practical applications such as compliance, risk management, and investment analysis. Using multi-turn, open-ended conversations that mimic real-life scenarios, SC-Fin scores models on a range of criteria: accurate financial understanding, logical reasoning, clarity, computational efficiency, business acumen, risk perception, and compliance with Chinese regulations. In a rigorous evaluation involving over a thousand questions, SC-Fin identifies a performance hierarchy in which domestic models such as GLM-4 and MoonShot-v1-128k earn an A grade and outperform the others, while the results also highlight room for further development in transforming theoretical knowledge into pragmatic financial solutions. The benchmark serves as a critical tool for refining FLMs in the Chinese context, directing improvements in financial knowledge databases, standardizing financial interpretations, and promoting models that prioritize compliance, risk management, and secure practices. Overall, we present a contextually relevant and comprehensive benchmark that drives the development of AI in the Chinese financial sector. SC-Fin facilitates the advancement and responsible deployment of FLMs, offering valuable insights for enhancing model performance and usability for both individual and institutional users in the Chinese market.\footnote{Our benchmark can be found at \url{https://www.CLUEbenchmarks.com}.}