CIR+CVN: Bridging LLM Semantic Understanding and Petri-Net Verification for Concurrent Programs

Recovering concurrency structure directly from source code is difficult because shared-resource identity and protection relations are often obscured by aliasing, ownership, and API-specific idioms. We therefore study a specification-driven, model-first verification architecture for LLM-assisted concurrent program construction. Instead of verifying arbitrary source code, a large language model first synthesizes a verification-oriented concurrency artifact from a natural-language requirement or system specification. The first formalism, the Concurrency Intermediate Representation (Cir), is a statement-level, alias-free model in which shared resources are globally named, protection relations are explicit, and each statement carries a stable identifier. The second formalism, the Concurrency Verification Net (Cvn), is a weighted place/transition Petri net with a finite global store and three-valued guards for data-dependent branching. A validated Cir artifact is translated mechanically to Cvn, explored exhaustively, and any counterexample is mapped back to statement identifiers to guide targeted repair. To reduce the risk of bug-free but behavior-dropping repairs, acceptance additionally applies a lightweight goal-reachability check over designated critical outcomes. We formalize both representations, prove translation-correspondence results for deadlock and signal-loss analysis, define a two-layer checking architecture with 61 static rules and 5 analysis predicates, and evaluate the pipeline on 9 representative bounded-concurrency patterns. The results show that the method supports iterative bug detection and repair on Cir artifacts and that goal reachability helps filter semantically incomplete repairs. The trust boundary of the present work is the generated Cir artifact rather than arbitrary source code.

翻译：直接从源代码恢复并发结构非常困难，因为共享资源的标识与保护关系常被别名机制、所有权语义及特定API惯用法所遮蔽。因此，我们研究了一种面向规范的、模型优先的验证架构，用于大语言模型辅助的并发程序构建。该方法并非直接验证任意源代码，而是让大语言模型首先根据自然语言需求或系统规范，合成一种面向验证的并发工件。第一种形式化模型为并发中间表示（Cir），这是一种无别名的语句级模型：共享资源采用全局命名、保护关系显式表达、每条语句携带稳定标识符。第二种形式化模型为并发验证网（Cvn），这是一种带有有限全局存储和三值守卫（用于数据依赖分支）的加权库所/变迁Petri网。经验证的Cir工件可机械转换为Cvn，经穷举探索后，任何反例都将映射回语句标识符以指导定向修复。为降低修复中遗漏正确行为（虽无显式错误）的风险，验收环节还采用轻量级目标可达性检查，验证所指定的关键结果。我们形式化了两种表示，证明了死锁与信号丢失分析中的翻译对应定理，定义了包含61条静态规则与5个分析谓词的两层检查架构，并在9个典型有界并发模式上评估了该流水线。结果表明，该方法支持对Cir工件的迭代式缺陷检测与修复，且目标可达性检查有助于过滤语义不完整的修复。当前工作的信任边界限定为生成的Cir工件，而非任意源代码。