Table understanding (TU) has achieved promising advancements, but it faces the challenges of the scarcity of manually labeled tables and the presence of complex table structures. To address these challenges, we propose HGT, a framework with a heterogeneous graph (HG)-enhanced large language model (LLM) to tackle few-shot TU tasks. It leverages the LLM by aligning the table semantics with the LLM's parametric knowledge through soft prompts and instruction tuning, and deals with complex tables by a multi-task pre-training scheme involving three novel multi-granularity self-supervised HG pre-training objectives. We empirically demonstrate the effectiveness of HGT, showing that it outperforms the SOTA for few-shot complex TU on several benchmarks.