Tabular foundation models aim to learn universal representations of tabular data that transfer across tasks and domains, enabling applications such as table retrieval, semantic search, and table-based prediction. Despite the growing number of such models, it remains unclear which approach works best in practice, as existing methods are often evaluated under task-specific settings that make direct comparison difficult. To address this, we introduce TEmBed, the Tabular Embedding Test Bed, a comprehensive benchmark for systematically evaluating tabular embeddings across four representation levels: cell, row, column, and table. Evaluating a diverse set of tabular representation learning models, we show that the best choice of model depends on both the task and the representation level. Our results offer practical guidance for selecting tabular embeddings in real-world applications and lay the groundwork for developing more general-purpose tabular representation models.