This study investigated LLM-based automation for analyzing non-financial data in corporate credit evaluation. Two systems were developed and compared: a Single-Agent System (SAS), in which one LLM agent infers both favorable and adverse repayment signals, and a Popperian Multi-agent Debate System (PMADS), which structures the same dual-perspective analysis as adversarial argumentation under the Karl Popper Debate protocol. The evaluation covered three aspects: (i) work productivity compared with human experts; (ii) perceived quality and usability of the system-generated reports, as rated by credit risk professionals; and (iii) reasoning characteristics, quantified via reasoning-tree analysis. Both systems drastically reduced task completion time relative to human experts. Professionals rated SAS reports as adequate, while PMADS reports exceeded neutral benchmarks and scored significantly higher in explanatory adequacy, practical applicability, and usability. Reasoning-tree analysis showed that PMADS produced deeper, more elaborated structures, whereas SAS yielded single-layered trees. These findings suggest that structured multi-agent debate enhances analytical rigor and perceived usefulness, though at the cost of longer computation time. Overall, the results indicate that reasoning-centered automation is a promising approach for developing useful AI systems in decision-critical financial contexts.