This paper introduces "Shai", a 10B-level large language model designed specifically for the asset management industry and built upon an open-source foundation model. Through continuous pre-training and fine-tuning on a targeted corpus, Shai outperforms baseline models on tasks relevant to its domain. Our research also develops an innovative evaluation framework that integrates professional qualification exams, tailored tasks, open-ended question answering, and safety assessments to assess Shai's capabilities comprehensively. Furthermore, we discuss the challenges and implications of using large language models such as GPT-4 to assess performance in asset management, and suggest combining automated evaluation with human judgment. Shai's development showcases the potential and versatility of 10B-level large language models in the financial sector, pairing strong performance with modest computational requirements, and we hope it provides practical insights and methodologies to assist industry peers in similar endeavors.