The rapid advancement of artificial intelligence in materials science requires data standards and data management practices that can capture the complexity of real-world structures, including surfaces, interfaces, defects, and reduced dimensionality. We present M-CODE (Materials Categorization via Ontology, Dimensionality and Evolution), a compact categorization system that links materials-science-specific terminology to a set of reusable concepts, which serve as building blocks, and to provenance-aware transformations. M-CODE classifies structures by dimensionality, by structural complexity (pristine, compound pristine, defective, and processed), and by variants that capture common approaches to structure creation and evolution. A practical implementation of the categorization is provided in an open-source codebase that includes JSON schemas, examples, and Python and TypeScript types and interfaces, designed to support reproducible dataset generation, validation, and community contributions.
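As an illustration of the three classification axes, the following minimal Python sketch models a category record and validates it on construction. It is not the M-CODE schema: the names `Dimensionality`, `Complexity`, `Category`, and `from_dict`, the 0D to 3D labels, and the free-text `variant` field are all assumptions introduced here; only the four complexity levels are taken from the text.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Dimensionality(Enum):
    # Assumed labels; the text only states that dimensionality is an axis.
    D0 = "0D"
    D1 = "1D"
    D2 = "2D"
    D3 = "3D"


class Complexity(Enum):
    # The four levels named in the text.
    PRISTINE = "pristine"
    COMPOUND_PRISTINE = "compound pristine"
    DEFECTIVE = "defective"
    PROCESSED = "processed"


@dataclass(frozen=True)
class Category:
    """Hypothetical record combining the three classification axes."""

    dimensionality: Dimensionality
    complexity: Complexity
    variant: str  # hypothetical free-text label, e.g. "vacancy"

    @classmethod
    def from_dict(cls, data: dict) -> "Category":
        # Enum lookup by value raises ValueError for unknown labels,
        # which serves as a simple validation step.
        return cls(
            dimensionality=Dimensionality(data["dimensionality"]),
            complexity=Complexity(data["complexity"]),
            variant=str(data["variant"]),
        )

    def to_dict(self) -> dict:
        return {
            "dimensionality": self.dimensionality.value,
            "complexity": self.complexity.value,
            "variant": self.variant,
        }

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for reproducible datasets.
        return json.dumps(self.to_dict(), sort_keys=True)


if __name__ == "__main__":
    record = Category.from_dict(
        {"dimensionality": "2D", "complexity": "defective", "variant": "vacancy"}
    )
    print(record.to_json())
```

Using enumerations for the closed axes and a plain string for the variant is one possible design choice; a real implementation would more likely validate records against the published JSON schemas.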