Table modeling has progressed for decades. In this work, we revisit this trajectory and highlight emerging challenges in the LLM era, particularly the paradox of choice: in table instruction tuning, the diversity of base models and training sets makes it difficult to attribute performance gains. We replicate four table LLMs by instruction-tuning three foundation models on four existing datasets, yielding 12 models, which we then evaluate on 16 table benchmarks. Our study is the first to quantitatively disentangle the effects of training data and base model selection, and it reveals that the choice of base model matters more than the training data itself. Generalization and reasoning remain challenging, calling for further work on table modeling. Based on our findings, we outline future directions for the field.