Recent foundation models for tabular data achieve strong task-specific performance via in-context learning. However, they focus on direct prediction, encapsulating both representation learning and task-specific inference inside a single, resource-intensive network. This work instead focuses on representation learning, i.e., on transferable, task-agnostic embeddings. We systematically evaluate task-agnostic representations extracted from tabular foundation models (TabPFN, TabICL, and TabSTAR) alongside classical feature engineering (TableVectorizer and a sphere model) across a variety of application tasks, such as outlier detection (ADBench) and supervised learning (TabArena Lite). We find that simple feature engineering methods achieve comparable or superior performance while requiring significantly fewer computational resources than tabular foundation models.
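To make the feature-engineering baseline concrete, below is a minimal sketch of how a task-agnostic tabular embedding can be produced with skrub's TableVectorizer and then reused by an arbitrary downstream estimator. The toy DataFrame columns and the choice of LogisticRegression as the downstream model are illustrative assumptions, not the paper's benchmark setup; only `TableVectorizer` itself is taken from the text.

```python
# Sketch: classical feature engineering as a cheap, task-agnostic embedding.
# Assumes skrub, scikit-learn, and pandas are installed; the data is a toy
# placeholder, not one of the ADBench / TabArena Lite datasets.
import pandas as pd
from skrub import TableVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy table mixing numeric and categorical/string columns.
df = pd.DataFrame({
    "age": [34, 51, 27, 42],
    "city": ["Berlin", "Paris", "Berlin", "Lyon"],
    "job_title": ["data engineer", "nurse", "teacher", "data scientist"],
})
y = [0, 1, 0, 1]

# TableVectorizer chooses a per-column encoding (numeric passthrough,
# one-hot or string encoders for categories) and emits one numeric matrix,
# i.e., a task-agnostic representation any estimator can consume.
embedding = TableVectorizer().fit_transform(df)
print(embedding.shape)

# The same representation plugged into a supervised downstream task.
pipeline = make_pipeline(TableVectorizer(), LogisticRegression())
pipeline.fit(df, y)
print(pipeline.predict(df))
```

The same two-stage pattern (fit an encoder once, reuse its output for several tasks) is what the paper contrasts with foundation models that fuse representation and inference in one forward pass.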