Language models such as GPT and Llama have shown remarkable ability on diverse natural language tasks, yet their performance on complex table tasks (e.g., NL-to-Code and data cleaning) remains suboptimal. Improving performance typically requires task-specific fine-tuning, which depends on expensive human labeling and is prone to overfitting. In this work, we propose Table-LLM-Specialist, a self-trained fine-tuning paradigm designed for table tasks. Our key insight is that many table tasks admit two dual formulations: a generative version and a classification version. Leveraging this duality, we introduce a Generator-Validator paradigm that iteratively generates and validates training data using language models, enabling effective fine-tuning without manually labeled data. Extensive evaluations on Llama, GPT-3.5, and GPT-4 show that Table-LLM-Specialist achieves (1) strong performance across diverse tasks compared to base models (for example, GPT-3.5 models fine-tuned with Table-LLM-Specialist often surpass GPT-4-level quality); (2) lower deployment cost, as smaller models reach high quality at reduced latency and expense; and (3) better generalization across multiple benchmarks, owing to training on diverse, systematically generated data from real-world tables. Our code is available at https://github.com/microsoft/Table-Specialist. Models fine-tuned with Table-LLM-Specialist have been integrated into Microsoft Excel and are deployed in production for automated table data cleaning.
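The Generator-Validator idea described above can be illustrated with a minimal sketch of a single data-collection round. This is not the released implementation: the names `self_train_round`, `toy_generate`, and `toy_validate` are hypothetical, and the two toy functions are simple stand-ins for what would be language-model calls. The fine-tuning step that would follow each round is omitted.

```python
from typing import Callable, Iterable, List, Tuple

Column = List[str]            # a single table column, kept simple for illustration
Example = Tuple[Column, str]  # (input column, generated output)


def self_train_round(
    columns: Iterable[Column],
    generate: Callable[[Column], str],
    validate: Callable[[Column, str], bool],
) -> List[Example]:
    """Keep only the generated outputs that the validator accepts."""
    accepted: List[Example] = []
    for column in columns:
        candidate = generate(column)       # generative formulation of the task
        if validate(column, candidate):    # classification formulation of the same task
            accepted.append((column, candidate))
    return accepted


def toy_generate(column: Column) -> str:
    """Hypothetical generator: propose the cell suspected to be an error.

    Returns the first non-numeric cell, or (deliberately wrongly) the first
    cell when every cell is numeric, so the validator has something to reject.
    """
    for cell in column:
        if not cell.isdigit():
            return cell
    return column[0]


def toy_validate(column: Column, candidate: str) -> bool:
    """Hypothetical validator: is the proposed cell really an error?"""
    return candidate in column and not candidate.isdigit()


# One round over two columns: only the first contains a genuine error.
training_data = self_train_round(
    [["1", "2", "x"], ["3", "4", "5"]], toy_generate, toy_validate
)
print(training_data)
```

In an iterative setup, the accepted pairs would serve as training data for the next round, with the generator and validator replaced by the fine-tuned models; how that loop is scheduled is left open here.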