Tabular data are crucial in many fields, and enabling large language models (LLMs) to understand them under a high-parameter-efficiency paradigm is important. However, directly applying parameter-efficient fine-tuning (PEFT) techniques to tabular tasks presents significant challenges, particularly in serializing tables effectively and representing two-dimensional structured information within a one-dimensional sequence. To address this, we propose TableLoRA, a module designed to improve LLMs' understanding of table structure during PEFT. It introduces special tokens for serializing tables via a special-token encoder and uses 2D LoRA to encode low-rank information on cell positions. Experiments on four table-related datasets demonstrate that TableLoRA consistently outperforms vanilla LoRA and surpasses various table-encoding methods tested in control experiments. These findings show that TableLoRA, as a table-specific LoRA, enhances LLMs' ability to process tabular data effectively, especially in low-parameter settings, demonstrating its potential as a robust solution for table-related tasks.
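The core idea of combining a standard low-rank update with cell-position information can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the class name `TableLoRALinear`, the embedding-based row/column path, and all hyperparameters are assumptions for exposition.

```python
import torch
import torch.nn as nn

class TableLoRALinear(nn.Module):
    """Hypothetical sketch of a table-aware LoRA layer: the vanilla LoRA
    update (delta_W = B @ A) is augmented with learned row/column embeddings
    injected into the low-rank bottleneck, so tokens belonging to table
    cells carry 2D position information. Illustrative only."""

    def __init__(self, d_in, d_out, rank=8, max_rows=64, max_cols=64):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        # standard LoRA factors: B initialized to zero so training starts
        # from the pretrained behavior
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))
        # 2D position path: row/col embeddings share the low-rank bottleneck
        self.row_emb = nn.Embedding(max_rows, rank)
        self.col_emb = nn.Embedding(max_cols, rank)

    def forward(self, x, rows, cols):
        # x: (batch, seq, d_in); rows/cols: (batch, seq) integer cell indices
        # (e.g. index 0 reserved for non-cell tokens)
        h = x @ self.A.T                                  # (batch, seq, rank)
        h = h + self.row_emb(rows) + self.col_emb(cols)   # inject 2D position
        return self.base(x) + h @ self.B.T                # frozen path + update
```

Because `B` starts at zero, the layer initially reproduces the frozen linear map exactly, matching the usual LoRA initialization; only the low-rank and position parameters are trained.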