Sharing research data is complex, labor-intensive, and expensive, and it requires infrastructure investments by multiple stakeholders. Open science policies focus on data release rather than on data reuse, yet reuse is also difficult and expensive, and may never occur. Investments in data management could be made more wisely by considering who might reuse data, how, why, for what purposes, and when. Data creators cannot anticipate all possible reuses or reusers; our goal is to identify factors that may aid stakeholders in deciding how to invest in research data, how to identify potential reuses and reusers, and how to improve data exchange processes. Drawing upon empirical studies of data sharing and reuse, we develop the theoretical construct of distance between data creator and data reuser, identifying six distance dimensions that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality. These dimensions are primarily social in character, with associated technical aspects that can decrease, or increase, distances between creators and reusers. We identify the order of expected influence on data reuse and the ways in which the six dimensions are interdependent. Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations for four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies.