Table understanding (TU) has achieved promising advancements, but it still faces two challenges: the scarcity of manually labeled tables and the presence of complex table structures. To address these challenges, we propose HGT, a framework with a heterogeneous graph (HG)-enhanced large language model (LLM) for few-shot TU tasks. It leverages the LLM by aligning table semantics with the LLM's parametric knowledge through soft prompts and instruction tuning, and it handles complex tables via a multi-task pre-training scheme involving three novel multi-granularity self-supervised HG pre-training objectives. We empirically demonstrate the effectiveness of HGT, showing that it outperforms the SOTA for few-shot complex TU on several benchmarks.