Table processing (including cleaning, transformation, augmentation, and matching) is a foundational yet error-prone stage in real-world data pipelines. While recent LLM-based approaches show promise for automating such tasks, they often struggle in practice because of ambiguous instructions, complex task structures, and a lack of structured feedback, which results in code that is syntactically correct but semantically flawed. To address these challenges, we propose ProfiliTable, an autonomous multi-agent framework centered on dynamic profiling: it constructs and iteratively refines a unified execution context through interactive exploration, knowledge-augmented synthesis, and feedback-driven refinement. ProfiliTable integrates (i) a Profiler that performs ReAct-style data exploration to build a semantic understanding of the data, (ii) a Generator that retrieves curated operators to synthesize task-aware code, and (iii) an Evaluator-Summarizer loop that injects execution scores and diagnostic insights to enable closed-loop refinement. Extensive experiments on a diverse benchmark covering 18 tabular task types demonstrate that ProfiliTable consistently outperforms strong baselines, particularly in complex multi-step scenarios. These results highlight the critical role of dynamic profiling in reliably translating ambiguous user intents into robust, governance-compliant table transformations.
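The profile, generate, evaluate, and summarize cycle described above can be illustrated with a toy closed loop in plain Python. This is a hypothetical sketch, not ProfiliTable's implementation. The `Context` fields, the two-entry `OPERATORS` library, the sample-based `profile` function (a simple stand-in for ReAct-style exploration), the "fraction of values that parse as floats" execution score, and all function names are assumptions chosen only to show how diagnostic feedback from a failed round can steer the next round of code synthesis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]

# Hypothetical stand-in for a library of curated operators the Generator retrieves from.
OPERATORS: Dict[str, Callable[[str], str]] = {
    "strip_whitespace": lambda v: v.strip(),
    "strip_currency": lambda v: v.replace("$", ""),
}


@dataclass
class Context:
    """Hypothetical unified execution context shared across agents."""
    column: str
    has_whitespace: bool
    has_currency: bool
    feedback: List[str] = field(default_factory=list)  # appended by the Summarizer


def profile(rows: List[Row], column: str, sample_size: int = 2) -> Context:
    """Profiler stand-in: inspects only a small sample, so it can miss issues."""
    sample = [r[column] for r in rows[:sample_size]]
    return Context(
        column=column,
        has_whitespace=any(v != v.strip() for v in sample),
        has_currency=any("$" in v for v in sample),
    )


def generate(ctx: Context) -> List[str]:
    """Generator stand-in: picks operators from profile hints plus accumulated feedback."""
    ops = []
    if ctx.has_whitespace:
        ops.append("strip_whitespace")
    if ctx.has_currency or "currency symbol" in ctx.feedback:
        ops.append("strip_currency")
    return ops


def execute(rows: List[Row], column: str, ops: List[str]) -> List[str]:
    """Apply the chosen operators, in order, to every value in the column."""
    out = []
    for r in rows:
        v = r[column]
        for name in ops:
            v = OPERATORS[name](v)
        out.append(v)
    return out


def evaluate(values: List[str]) -> Tuple[float, List[str]]:
    """Evaluator stand-in: score = fraction of values that parse as floats."""
    failures = []
    for v in values:
        try:
            float(v)
        except ValueError:
            failures.append(v)
    return 1 - len(failures) / len(values), failures


def summarize(failures: List[str]) -> List[str]:
    """Summarizer stand-in: turns failing values into diagnostic notes."""
    return ["currency symbol"] if any("$" in v for v in failures) else []


def run(rows: List[Row], column: str, max_rounds: int = 3,
        threshold: float = 1.0) -> List[Tuple[List[str], float]]:
    """Closed loop: generate, execute, score, and feed diagnostics back until the score suffices."""
    ctx = profile(rows, column)
    history = []
    for _ in range(max_rounds):
        ops = generate(ctx)
        score, failures = evaluate(execute(rows, column, ops))
        history.append((ops, score))
        if score >= threshold:
            break
        ctx.feedback.extend(summarize(failures))
    return history


rows = [{"price": " 12.50 "}, {"price": "3.00 "}, {"price": "$7"}, {"price": " $4.25"}]
history = run(rows, "price")
print(history)
```

Because the sample-based profile sees no currency symbols, the first round only strips whitespace and scores 0.5; the Summarizer's note then leads the Generator to add `strip_currency`, and the second round scores 1.0.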