This position paper argues that foundation models for tabular data face inherent limitations when isolated from their operational context: the procedural logic, declarative rules, and domain knowledge that define how data is created and governed. Current approaches focus on single-table generalization or schema-level relationships, and thus miss the operational knowledge that gives data its meaning. We introduce Semantically Linked Tables (SLT) and Foundation Models for SLT (FMSLT), a new model class that grounds tabular data in its operational context. We propose a dual-phase training scheme: pre-training on open-source code-data pairs and synthetic systems to learn the mechanics of business logic, followed by zero-shot inference on proprietary data. We further introduce the ``Operational Turing Test'' benchmark and argue that operational grounding is essential for autonomous agents operating in complex data environments.