Scientists increasingly recognize the importance of providing rich, standards-adherent metadata to describe their experimental results. Although sophisticated tools are available to assist in data annotation, investigators generally seem to prefer spreadsheets when supplying metadata, despite the limitations of spreadsheets in ensuring metadata consistency and compliance with formal specifications. In this paper, we describe an end-to-end approach that supports spreadsheet-based entry of metadata while ensuring rigorous adherence to community-based metadata standards and providing quality control. Our methods employ several key components, including customizable templates that represent metadata standards and inform the spreadsheets that investigators use to author metadata; controlled terminologies and ontologies, accessible directly from a spreadsheet, for defining metadata values; and an interactive Web-based tool that allows users to rapidly identify and fix errors in their spreadsheet-based metadata. We demonstrate how this approach is being deployed in a biomedical consortium known as HuBMAP to define and collect metadata about a wide range of biological assays.
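To make the interplay of templates, controlled terms, and error reporting concrete, the following is a minimal sketch of template-driven validation of spreadsheet rows. It is not the tooling described in this paper: the names `FieldSpec`, `Issue`, and `validate`, the column names, and the permitted assay values are all hypothetical and chosen only for illustration. The spreadsheet is stood in for by CSV text, and repair suggestions come from simple string similarity rather than an ontology lookup.

```python
import csv
import io
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """One column of a hypothetical metadata template."""
    name: str
    required: bool = False
    allowed: Tuple[str, ...] = ()   # controlled terms; empty means free text
    numeric: bool = False


@dataclass(frozen=True)
class Issue:
    """One problem found in the sheet, with an optional repair suggestion."""
    row: int                        # 1-based data row, header excluded
    column: str
    message: str
    suggestion: Optional[str] = None


def validate(rows, template):
    """Check every row against every field of the template."""
    issues = []
    for i, row in enumerate(rows, start=1):
        for spec in template:
            value = (row.get(spec.name) or "").strip()
            if not value:
                if spec.required:
                    issues.append(Issue(i, spec.name, "required value is missing"))
                continue
            if spec.numeric:
                try:
                    float(value)
                except ValueError:
                    issues.append(Issue(i, spec.name, f"'{value}' is not a number"))
            elif spec.allowed and value not in spec.allowed:
                # Suggest the closest permitted term, if any is similar enough.
                close = get_close_matches(value, spec.allowed, n=1)
                issues.append(Issue(i, spec.name,
                                    f"'{value}' is not a permitted term",
                                    close[0] if close else None))
    return issues


# Hypothetical template and sheet; the values are illustrative only.
TEMPLATE = [
    FieldSpec("sample_id", required=True),
    FieldSpec("assay_type", required=True, allowed=("RNA-seq", "ATAC-seq", "CODEX")),
    FieldSpec("section_thickness_um", numeric=True),
]

SHEET = """sample_id,assay_type,section_thickness_um
S1,RNA-seq,10
S2,RNAseq,ten
,CODEX,5
"""

rows = list(csv.DictReader(io.StringIO(SHEET)))
issues = validate(rows, TEMPLATE)
for issue in issues:
    print(issue)
```

In this sketch the template is the single source of truth: the same field definitions could in principle be used both to generate the spreadsheet that investigators fill in and to check what they submit, so that each reported issue points to a specific row and column and, where possible, proposes a fix.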