Large Language Models (LLMs) have shown great promise in automating data analytics tasks by interpreting natural language queries and generating multi-operation execution plans. However, existing LLM-agent-based analytics frameworks operate under the assumption of centralized data access, offering little to no privacy protection. In contrast, federated analytics (FA) enables privacy-preserving computation across distributed data sources, but it lacks support for natural language input and requires structured, machine-readable queries. In this work, we present LAFA, the first system that integrates LLM-agent-based data analytics with FA. LAFA introduces a hierarchical multi-agent architecture that accepts natural language queries and transforms them into optimized, executable FA workflows. A coarse-grained planner first decomposes complex queries into sub-queries, and a fine-grained planner then maps each sub-query to a Directed Acyclic Graph (DAG) of FA operations using prior structural knowledge. To improve execution efficiency, an optimizer agent rewrites and merges the resulting DAGs, eliminating redundant operations and minimizing computation and communication overhead. Our experiments demonstrate that LAFA consistently outperforms baseline prompting strategies, achieving higher execution-plan success rates and reducing resource-intensive FA operations by a substantial margin. This work establishes a practical foundation for privacy-preserving, LLM-driven analytics that supports natural language input in the FA setting.