This study investigates the simultaneous use of multiple metadata schemas at research data repositories. The analysis covers how eight disciplinary research data repositories from the geosciences and social sciences use disciplinary metadata schemas and the DataCite Metadata Schema, and how two metadata records describing the same dataset compare. The results show that DataCite metadata records could be improved considerably by optimizing schema crosswalks. However, the parallel use of disciplinary and multidisciplinary metadata records is complex. For example, discipline has a significant effect on the completeness of DataCite metadata. A temporal analysis also highlights that metadata workflows are diverse, and in some cases, suboptimal crosswalks are likely not the sole cause of incomplete DataCite metadata. Comparing the disciplinary metadata schemas and the DataCite Metadata Schema on a structural level reveals that most differences between schemas are the result of different approaches to modelling statements about datasets, not the lack of opportunity to express them. The element sets of both disciplinary metadata schemas and the DataCite Metadata Schema could be extended to describe datasets in more detail. These observations demonstrate that disciplinary and multidisciplinary metadata schemas serve distinct purposes. Disciplinary repositories should take full advantage of the opportunities both options provide.