Many tool-based Retrieval Augmented Generation (RAG) systems lack precise mechanisms for tracing final responses back to specific tool components -- a critical gap as systems scale to complex multi-agent architectures. We present \textbf{Atomic Information Flow (AIF)}, a graph-based network flow model that decomposes tool outputs and LLM calls into atoms: indivisible, self-contained units of information. By modeling LLM orchestration as a directed flow of atoms from tool and LLM nodes to a response super-sink, AIF enables granular attribution metrics for AI explainability. Motivated by the max-flow min-cut theorem in network flow theory, we train a lightweight Gemma3 (4B-parameter) language model as a context compressor that approximates the minimum cut over tool atoms using flow signals computed offline by AIF. The base Gemma3-4B model struggles to identify critical information, reaching only \textbf{54.7\%} accuracy on HotpotQA and barely outperforming lexical baselines (BM25). However, post-training on AIF signals boosts accuracy to \textbf{82.71\%} (+28.01 points) while achieving \textbf{87.52\%} (+1.85\%) context token compression -- bridging the gap with the Gemma3-27B variant, a model nearly $7\times$ larger.
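The flow formulation above can be illustrated with a toy sketch (our own, not the paper's implementation): a super-source feeds hypothetical tool nodes, tools emit atoms, and atoms flow into the response super-sink. By the max-flow min-cut theorem, the maximum flow value equals the capacity of a minimum cut separating the tools' atoms from the response. All node names and capacities below are illustrative assumptions, not values from the paper.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow on a dict-of-dicts capacity graph.

    Returns (flow_value, source_side), where source_side is the set of
    nodes reachable from `source` in the final residual graph -- i.e.
    the source side of a minimum cut.
    """
    # Residual capacities: forward edges from the input, reverse edges at 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path: parent's keys are the min-cut source side.
            return flow, set(parent)
        # Push the bottleneck capacity along the found path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# Hypothetical AIF-style graph: source -> tools -> atoms -> response sink.
# Edge weights stand in for offline-computed flow signals.
atom_graph = {
    "source":      {"tool_search": 3.0, "tool_wiki": 2.0},
    "tool_search": {"atom_1": 2.0, "atom_2": 1.0},
    "tool_wiki":   {"atom_3": 2.0},
    "atom_1":      {"response": 2.0},
    "atom_2":      {"response": 1.0},
    "atom_3":      {"response": 1.0},
}
value, source_side = max_flow(atom_graph, "source", "response")
```

The atoms whose edges cross the cut (here, the atom-to-response edges saturated at termination) are the "critical" information units a compressor would be trained to keep; everything on the source side of the cut is compressible context.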