MADP: A Multi-Agent Pipeline for Sustainable Document Processing with Human-in-the-Loop

Document processing automation remains a critical challenge in enterprise environments, where traditional manual approaches are labor-intensive and error-prone. We present MADP, a multi-agent architecture that combines deep learning-based classification and parsing with large language model (LLM) extraction, and maintains accuracy through selective human validation. The system integrates five specialized agents (Classificator, Splitter, Parser, Extraction, and Validator) with a Human-in-the-Loop (HITL) mechanism and a novel Prompt Fine Tuning with Feedback Inheritance (PFTFI) approach. An operational analysis of a production use case of 100,000 invoices per year indicates a potential reduction in Full-Time Equivalent (FTE) requirements of approximately 70%. In production deployment, 955 real-world documents processed through January 2026 yield a 97.0% full-pipeline automation rate, with only 3% requiring non-AI fallback. An ablation evaluation on a stratified 100-document subset (5 documents from each of 20 supplier/document-type categories) shows that the full MADP configuration with HITL supervision attains 98.5% document-level accuracy. We also present a sustainability analysis showing that the hybrid AI+HITL approach reduces CO2 emissions by 69%, energy consumption by 69%, and water usage by 63% compared with traditional manual processing. Benchmark comparisons of multiple LLM backends (Granite-Docling, Mistral-Small, DeepSeek-OCR) provide practical insights for production deployment.
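The stratified evaluation subset described above (5 documents from each of 20 categories, 100 in total) can be illustrated with a minimal sketch. This is not the authors' sampling code: the function name `stratified_subset`, the `(doc_id, category)` input shape, the fixed seed, and the synthetic corpus are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subset(documents, per_category=5, seed=0):
    """Pick `per_category` documents at random from each category.

    `documents` is an iterable of (doc_id, category) pairs. The input
    shape and the fixed seed are illustrative assumptions, not details
    taken from MADP.
    """
    # Group document ids by their category label.
    by_category = defaultdict(list)
    for doc_id, category in documents:
        by_category[category].append(doc_id)

    rng = random.Random(seed)
    subset = {}
    for category in sorted(by_category):
        doc_ids = by_category[category]
        if len(doc_ids) < per_category:
            raise ValueError(
                f"category {category!r} has only {len(doc_ids)} documents"
            )
        # Sample without replacement inside each category.
        subset[category] = rng.sample(doc_ids, per_category)
    return subset


# Hypothetical corpus: 20 categories with 12 documents each.
corpus = [
    (f"doc-{c:02d}-{i:02d}", f"category-{c:02d}")
    for c in range(20)
    for i in range(12)
]
subset = stratified_subset(corpus)
total = sum(len(ids) for ids in subset.values())
```

Sampling a fixed count per category, rather than proportionally to category size, gives every supplier/document-type pair equal weight in the evaluation, which is consistent with the 5-per-category design stated in the abstract.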
