Semantic data and knowledge infrastructures must reconcile two fundamentally different forms of representation: natural language, in which most knowledge is created and communicated, and formal semantic models, which enable machine-actionable integration, interoperability, and reasoning. Bridging this gap remains a central challenge, particularly when full semantic formalization is required at the point of data entry. Here, we introduce the Semantic Ladder, an architectural framework that enables the progressive formalization of data and knowledge. Building on the concept of modular semantic units as identifiable carriers of meaning, the framework organizes representations across levels of increasing semantic explicitness, ranging from natural language text snippets to ontology-based and higher-order logical models. Transformations between levels support semantic enrichment, statement structuring, and logical modelling while preserving semantic continuity and traceability. This approach enables the incremental construction of semantic knowledge spaces, reduces the semantic parsing burden, and supports the integration of heterogeneous representations, including natural language, structured semantic models, and vector-based embeddings. The Semantic Ladder thereby provides a foundation for scalable, interoperable, and AI-ready data and knowledge infrastructures.
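The idea of progressive formalization with traceability can be illustrated with a minimal sketch. It is not the framework's actual data model: the class names (`Level`, `SemanticUnit`, `KnowledgeSpace`), the two intermediate rung names, the `su:` identifier scheme, and the example sentence and predicate are all assumptions made for illustration. The abstract itself fixes only the endpoints (natural language text snippets and logical models) and the requirement that transformations preserve traceability.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Dict, List, Optional
import itertools


class Level(IntEnum):
    """Hypothetical rungs, ordered by increasing semantic explicitness."""
    TEXT_SNIPPET = 0          # natural language text snippet
    ENRICHED_TEXT = 1         # assumed: text with annotated terms
    STRUCTURED_STATEMENT = 2  # assumed: e.g. a subject-predicate-object statement
    LOGICAL_MODEL = 3         # ontology-based or higher-order logical model


@dataclass(frozen=True)
class SemanticUnit:
    """An identifiable carrier of meaning at one level of the ladder."""
    unit_id: str
    level: Level
    content: Any
    derived_from: Optional[str] = None  # id of the less formal source unit


class KnowledgeSpace:
    """Stores units and the links that record how they were derived."""

    def __init__(self) -> None:
        self._units: Dict[str, SemanticUnit] = {}
        self._counter = itertools.count(1)

    def add(self, level: Level, content: Any,
            derived_from: Optional[str] = None) -> SemanticUnit:
        # A derived unit must sit on a higher rung than its source.
        if derived_from is not None:
            parent = self._units[derived_from]
            if level <= parent.level:
                raise ValueError("a derived unit must be more formal than its source")
        unit = SemanticUnit(f"su:{next(self._counter)}", level, content, derived_from)
        self._units[unit.unit_id] = unit
        return unit

    def trace(self, unit_id: str) -> List[SemanticUnit]:
        """Walk back from a unit to the original text it was derived from."""
        chain: List[SemanticUnit] = []
        current: Optional[str] = unit_id
        while current is not None:
            unit = self._units[current]
            chain.append(unit)
            current = unit.derived_from
        return chain


# Data entry starts as plain text; formalization is added later.
space = KnowledgeSpace()
text = space.add(Level.TEXT_SNIPPET,
                 "Water boils at 100 degrees Celsius at sea level.")
stmt = space.add(Level.STRUCTURED_STATEMENT,
                 ("water", "hasBoilingPoint", "100 Cel"),  # illustrative predicate
                 derived_from=text.unit_id)

print([u.level.name for u in space.trace(stmt.unit_id)])
# prints ['STRUCTURED_STATEMENT', 'TEXT_SNIPPET']
```

In this sketch a unit may skip rungs (here from text directly to a structured statement), and the less formal unit is kept rather than overwritten, so every formal representation remains linked to the text it came from.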