Domain Large Language Models (LLMs) are developed for domain-specific tasks on top of general LLMs, yet some domain-specific tasks still require professional knowledge beyond what these models provide. In this paper, we investigate knowledge-intensive calculation problems. We find that math problems become challenging for LLMs when they involve complex domain-specific rules and knowledge documents, rather than simple formulations of terminology. We therefore propose a pipeline, named the Knowledge-Intensive Programs Generator (KIPG), that solves domain-specific calculation problems more effectively. KIPG generates knowledge-intensive programs according to domain-specific documents. For each query, key variables are extracted, and outcomes that depend on domain knowledge are then computed with the generated programs. Through iterative preference alignment, the code generator learns to improve its logical consistency with the domain knowledge. Taking the legal domain as an example, we conduct experiments to demonstrate the effectiveness of our pipeline and provide extensive analysis of its modules. We also find that the code generator adapts to other domains without training on the new knowledge.
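The per-query flow described above (extract key variables, then evaluate a generated knowledge-intensive program) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the variable extractor is a toy parser standing in for an LLM, and the example legal rule (a damages award capped at a statutory limit) is an invented placeholder for a program generated from domain documents.

```python
def extract_key_variables(query: str) -> dict:
    """Toy stand-in for the LLM-based variable extraction step.
    Assumes the query is pre-tagged as 'name=value' pairs for this sketch."""
    variables = {}
    for token in query.split(";"):
        name, _, value = token.partition("=")
        variables[name.strip()] = float(value)
    return variables


def generated_program(variables: dict) -> float:
    """Stands in for a knowledge-intensive program generated from
    domain documents. Hypothetical simplified legal rule:
    award = base_loss * liability_ratio, capped at a statutory limit."""
    award = variables["base_loss"] * variables["liability_ratio"]
    return min(award, variables["statutory_cap"])


def answer(query: str) -> float:
    """End-to-end query flow: extract variables, run the generated program."""
    return generated_program(extract_key_variables(query))
```

For example, `answer("base_loss=1000; liability_ratio=0.5; statutory_cap=400")` applies the rule (1000 × 0.5 = 500, capped at 400) and returns 400.0. In the actual pipeline, both steps would be driven by LLMs conditioned on the domain knowledge documents.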