LLM-FuncMapper: Function Identification for Interpreting Complex Clauses in Building Codes via LLM

As a vital stage of automated rule checking (ARC), rule interpretation of regulatory texts requires considerable effort. However, interpreting regulatory clauses with implicit properties or complex computational logic remains challenging because of the lack of domain knowledge and the limited expressibility of conventional logic representations. This study therefore proposes LLM-FuncMapper, an approach based on a large language model (LLM) for identifying the predefined functions needed to interpret various regulatory clauses. First, through systematic analysis of building codes, a series of atomic functions is defined to capture the shared computational logic of implicit properties and complex constraints, creating a database of common blocks for interpreting regulatory clauses. Then, a prompt template with chain of thought is developed and further enhanced with a classification-based tuning strategy, enabling common LLMs to identify functions effectively. Finally, the proposed approach is validated with statistical analysis, experiments, and a proof of concept. The statistical analysis reveals a long-tail distribution and high expressibility of the developed function database, with which almost 100% of computer-processable clauses can be interpreted and represented as computer-executable code. Experiments show that LLM-FuncMapper achieves promising results in identifying the relevant predefined functions for rule interpretation. A further proof of concept in automated rule interpretation also demonstrates the potential of LLM-FuncMapper for interpreting complex regulatory clauses. To the best of our knowledge, this study is the first attempt to introduce LLMs for understanding and interpreting complex regulatory clauses, which may shed light on the further adoption of LLMs in the construction domain.
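The function-identification step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`getDistance`, `getFloorArea`, `countElements`), the prompt wording, and the `FUNCTIONS:` answer convention are hypothetical and are not taken from the paper's function database or prompt template. No LLM is called, and the classification-based tuning strategy is not modeled; the sketch only shows how a small function catalog might be placed into a chain-of-thought prompt and how a model's reply might be filtered against that catalog.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicFunction:
    name: str
    description: str


# Hypothetical entries; the paper's actual function database is not reproduced here.
FUNCTION_DB = [
    AtomicFunction("getDistance", "shortest distance between two building elements"),
    AtomicFunction("getFloorArea", "floor area of a space or storey"),
    AtomicFunction("countElements", "number of elements matching a condition"),
]


def build_prompt(clause: str, functions: list[AtomicFunction]) -> str:
    """Assemble a chain-of-thought style prompt listing the candidate functions."""
    catalog = "\n".join(f"- {f.name}: {f.description}" for f in functions)
    return (
        "You are interpreting a building-code clause.\n"
        f"Available functions:\n{catalog}\n\n"
        f"Clause: {clause}\n\n"
        "Think step by step about which properties and computations the clause "
        "requires, then finish with a line of the form\n"
        "FUNCTIONS: name1, name2"
    )


def parse_functions(response: str, functions: list[AtomicFunction]) -> list[str]:
    """Return the names from the last 'FUNCTIONS:' line that exist in the catalog."""
    known = {f.name for f in functions}
    for line in reversed(response.splitlines()):
        if line.strip().upper().startswith("FUNCTIONS:"):
            names = [n.strip() for n in line.split(":", 1)[1].split(",")]
            # Drop anything the model invented that is not a predefined function.
            return [n for n in names if n in known]
    return []
```

In use, the string returned by `build_prompt` would be sent to whichever LLM is available, and the reply passed to `parse_functions`. Filtering against the catalog keeps the result restricted to predefined functions even if the model proposes a name that does not exist.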