Multi-agent AI systems are increasingly used to automate software engineering tasks, including requirements analysis, architecture design, test generation, and traceability linking. When these agents operate as a sequential pipeline over shared software artifacts, errors and low-confidence decisions made by upstream agents propagate to downstream stages, producing orphaned requirements, contradictory links, and compliance gaps that pose significant risks in safety-critical domains. We propose a trust-aware coordination framework in which a shared knowledge graph serves as both centralized semantic memory and a coordination surface through which agents assess and build upon each other's contributions using calibrated confidence scores. Our approach introduces three components: a two-stage traceability link prediction pipeline that combines embedding-based retrieval with LLM-based multi-criteria analysis; a traceability seeding mechanism that enables comparison between derivation-time and validation-time confidence; and a consistency protocol that governs pipeline interactions through confidence threshold gating, confidence divergence detection, and conflict resolution. We evaluate the framework on an automotive software engineering case study, measuring link prediction calibration, protocol effectiveness, threshold sensitivity, and the impact of traceability seeding. Ablation studies confirm that confidence calibration is essential for effective pipeline coordination.
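As a rough illustration of how confidence threshold gating and confidence divergence detection might interact, the following minimal sketch routes a single traceability link to one of three outcomes. It is not the framework's implementation: the names `TraceLink`, `Decision`, and `gate`, the three-way outcome, and all threshold values (0.8, 0.4, 0.3) are assumptions chosen for the example, and conflict resolution is reduced here to flagging the link for review.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"   # downstream agents may build on the link
    REVIEW = "review"   # link is held for conflict resolution
    REJECT = "reject"   # link is not propagated downstream


@dataclass(frozen=True)
class TraceLink:
    """Hypothetical record for one traceability link in the shared graph."""
    source_id: str
    target_id: str
    derivation_conf: float   # confidence recorded when the link was seeded
    validation_conf: float   # confidence assigned later by link prediction


def gate(link: TraceLink,
         accept_at: float = 0.8,        # assumed acceptance threshold
         reject_below: float = 0.4,     # assumed rejection threshold
         max_divergence: float = 0.3    # assumed divergence tolerance
         ) -> Decision:
    """Apply divergence detection first, then threshold gating."""
    # Divergence detection: the two confidence scores disagree too strongly.
    if abs(link.derivation_conf - link.validation_conf) > max_divergence:
        return Decision.REVIEW
    # Threshold gating on the validation-time confidence.
    if link.validation_conf >= accept_at:
        return Decision.ACCEPT
    if link.validation_conf < reject_below:
        return Decision.REJECT
    return Decision.REVIEW


if __name__ == "__main__":
    for link in (
        TraceLink("REQ-1", "ARCH-1", 0.90, 0.85),
        TraceLink("REQ-2", "ARCH-2", 0.90, 0.50),
        TraceLink("REQ-3", "ARCH-3", 0.30, 0.20),
    ):
        print(link.source_id, "->", link.target_id, gate(link).value)
```

Checking divergence before the thresholds means that a link with high validation-time confidence is still held back when it contradicts the confidence recorded at derivation time. This ordering is a design choice of the sketch, not something the abstract specifies.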