Audit risk assessment increasingly benefits from combining heterogeneous evidence sources, yet existing approaches typically produce point predictions without quantifying how well different evidence streams agree. We propose UMAR (Uncertainty-Aware Multi-Agent Risk Assessment), a framework that employs three specialized agents: an MD&A (Management's Discussion and Analysis) Text Agent, a Financial Ratio Agent, and a CAM (Critical Audit Matter) Agent, each producing an independent risk score with a calibrated uncertainty estimate. An Uncertainty Aggregator based on Dempster-Shafer evidence theory fuses these scores while explicitly measuring inter-agent conflict. We evaluate UMAR on a U.S. dataset of 3,200 firm-year observations from SEC 10-K filings (2019-2023), with financial restatement as the target label. UMAR achieves an AUROC of 0.782 and a PR-AUC of 0.341, outperforming logistic regression, XGBoost, FinBERT, and single-agent and dual-agent LLM baselines. It also attains the lowest expected calibration error (ECE = 0.052) among all methods and identifies evidence-conflict patterns that correlate with actual restatement risk, offering auditors potentially actionable and interpretable risk signals.
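To make the fusion step concrete, the following is a minimal sketch of Dempster's rule of combination on a two-hypothesis frame (restatement vs. no restatement), not the implementation used in UMAR. The mapping from an agent's risk score and uncertainty to a mass function (`to_mass`), the `Mass` type, the helper names, and the example numbers are all illustrative assumptions; the abstract does not specify how the Uncertainty Aggregator constructs its mass functions or reports conflict.

```python
from itertools import combinations
from typing import Dict, NamedTuple, Tuple


class Mass(NamedTuple):
    """Mass function on the frame {restatement, no restatement}."""
    risk: float     # mass on {restatement}
    safe: float     # mass on {no restatement}
    unknown: float  # mass on the whole frame (uncommitted belief)


def to_mass(score: float, uncertainty: float) -> Mass:
    # ASSUMPTION: the uncertainty estimate is treated as mass left on the
    # whole frame, and the remainder is split according to the risk score.
    committed = 1.0 - uncertainty
    return Mass(score * committed, (1.0 - score) * committed, uncertainty)


def combine(a: Mass, b: Mass) -> Tuple[Mass, float]:
    """Dempster's rule for two sources; returns (fused mass, conflict K)."""
    # K is the mass assigned to contradictory pairs of hypotheses.
    conflict = a.risk * b.safe + a.safe * b.risk
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    norm = 1.0 - conflict
    risk = (a.risk * b.risk + a.risk * b.unknown + a.unknown * b.risk) / norm
    safe = (a.safe * b.safe + a.safe * b.unknown + a.unknown * b.safe) / norm
    unknown = (a.unknown * b.unknown) / norm
    return Mass(risk, safe, unknown), conflict


def fuse(agents: Dict[str, Mass]) -> Tuple[Mass, Dict[Tuple[str, str], float]]:
    """Fuse all agents sequentially and report pairwise conflict."""
    names = list(agents)
    fused = agents[names[0]]
    for name in names[1:]:
        fused, _ = combine(fused, agents[name])
    pairwise = {
        (x, y): combine(agents[x], agents[y])[1]
        for x, y in combinations(names, 2)
    }
    return fused, pairwise


if __name__ == "__main__":
    # Hypothetical (score, uncertainty) outputs for a single firm-year.
    agents = {
        "mdna_text": to_mass(0.80, 0.20),
        "fin_ratio": to_mass(0.20, 0.20),
        "cam": to_mass(0.70, 0.40),
    }
    fused, pairwise = fuse(agents)
    print("fused:", fused)
    for pair, k in pairwise.items():
        print(pair, "conflict =", round(k, 4))
```

Because Dempster's rule is associative and commutative, the order in which the three agents are fused does not change the result. Reporting the conflict mass K per agent pair, rather than only the fused score, is one plausible way to surface cases where, for example, the text evidence and the ratio evidence point in opposite directions.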