Towards a Rosetta Stone for (meta)data: Learning from natural language to improve semantic and cognitive interoperability

To manage the overwhelming influx of data effectively, it is crucial to ensure that data is findable, accessible, interoperable, and reusable (FAIR). While ontologies and knowledge graphs have been employed to enhance FAIRness, challenges remain regarding semantic and cognitive interoperability. We explore how English facilitates reliable communication of terms and statements, and transfer our findings to a framework of ontologies and knowledge graphs, treating terms and statements as minimal information units. We categorize statement types based on their predicates and recognize the limitations of modeling non-binary predicates with multiple triples, a practice that negatively impacts interoperability. Terms are associated with different frames of reference, and different operations require different schemata. Term mappings and schema crosswalks are therefore vital for semantic interoperability. We propose a machine-actionable Rosetta Stone Framework for (meta)data, which uses reference terms and schemata as an interlingua to minimize mappings and crosswalks. Modeling statements rather than a human-independent reality ensures cognitive familiarity and thus better interoperability of data structures. We extend this Rosetta modeling paradigm to reference schemata, resulting in simple schemata with a consistent structure across statement types. This empowers domain experts to create their own schemata using the Rosetta Editor, without requiring knowledge of semantics. The Editor also allows specifying textual and graphical display templates for each schema, delivering human-readable data representations alongside machine-actionable data structures. The Rosetta Query Builder derives queries based on completed input forms and the information from corresponding reference schemata. This work sets the conceptual ground for the Rosetta Stone Framework that we plan to develop in the future.
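As a rough illustration of two ideas in the abstract, the sketch below keeps a statement with a non-binary predicate as one unit, renders it with a textual display template, and contrasts this with a decomposition into several binary triples. It also counts how many directed crosswalks are needed with and without a reference schema acting as an interlingua. The class `MeasurementStatement`, its slot names, the template string, the predicate labels, and the function `crosswalk_counts` are all hypothetical and chosen only for this example; they are not taken from the Rosetta Stone Framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementStatement:
    """One statement with a four-place predicate, kept as a single unit.

    Hypothetical schema: the slot names are assumptions for illustration.
    """
    subject: str
    quality: str
    value: float
    unit: str

    # Hypothetical textual display template for this statement type.
    TEMPLATE = "{subject} has a {quality} of {value} {unit}"

    def to_text(self) -> str:
        """Render the statement as a human-readable sentence."""
        return self.TEMPLATE.format(
            subject=self.subject,
            quality=self.quality,
            value=self.value,
            unit=self.unit,
        )

    def to_triples(self, node_id: str) -> list[tuple[str, str, object]]:
        """Decompose the statement into binary triples around an auxiliary node.

        The predicate labels are made up; the point is only that one
        statement turns into several triples that must be kept together.
        """
        return [
            (self.subject, "hasQuality", node_id),
            (node_id, "type", self.quality),
            (node_id, "hasValue", self.value),
            (node_id, "hasUnit", self.unit),
        ]


def crosswalk_counts(n_schemata: int) -> tuple[int, int]:
    """Return (pairwise, via_reference) counts of directed crosswalks.

    Pairwise: every schema maps to every other schema.
    Via reference: every schema maps only to and from one reference schema.
    """
    pairwise = n_schemata * (n_schemata - 1)
    via_reference = 2 * n_schemata
    return pairwise, via_reference


statement = MeasurementStatement("apple X", "weight", 212.45, "gram")
print(statement.to_text())
print(len(statement.to_triples("_:m1")), "triples for one statement")
print(crosswalk_counts(10))
```

Under these assumptions, a single four-slot statement becomes four triples, and ten schemata need 90 directed crosswalks when mapped pairwise but only 20 when each one maps to and from a shared reference schema.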

