Large language models (LLMs) have demonstrated exceptional performance on reasoning tasks such as mathematics and coding, matching or surpassing human capabilities. However, these impressive reasoning abilities face significant challenges in specialized domains. Taking Go as an example: although AlphaGo has established a high performance ceiling for AI systems in Go, mainstream LLMs still struggle to reach even beginner-level proficiency, let alone reason about the game in natural language. This performance gap between general-purpose LLMs and domain experts significantly limits the application of LLMs to a wider range of domain-specific tasks. In this work, we aim to bridge the divide between LLMs' general reasoning capabilities and the expert knowledge required by domain-specific tasks. As a cold start, we perform mixed fine-tuning on structured Go expertise and general long chain-of-thought (CoT) reasoning data, followed by reinforcement learning to integrate expert knowledge of Go with general reasoning capabilities. Through this methodology, we present \textbf{LoGos}, a powerful LLM that not only retains outstanding general reasoning abilities but also plays Go in natural language, demonstrating effective strategic reasoning and accurate next-move prediction. LoGos achieves performance comparable to professional human players, substantially surpassing all existing LLMs. Through this work, we aim to offer insights into applying general LLM reasoning capabilities to specialized domains. We will release the first large-scale Go dataset for LLM training, the first Go evaluation benchmark for LLMs, and the first general LLM to reach human professional-level performance in Go at: https://github.com/Entarochuan/LoGos.