Data sharing is fundamental to scientific progress, enhancing transparency, reproducibility, and innovation across disciplines. Despite its growing significance, how data-sharing practices vary across research fields remains poorly understood, which limits the development of effective policies and infrastructure. This study investigates the evolving landscape of data-sharing practices, focusing on the intentions behind data release, reuse, and referencing. Leveraging the PubMed open dataset, we developed a model to identify mentions of datasets in the full text of publications. Our analysis reveals that data release is the most prevalent sharing mode, particularly in fields such as Commerce, Management, and the Creative Arts. In contrast, STEM fields, especially the Biological and Agricultural Sciences, show significantly higher rates of data reuse. The humanities and social sciences, however, are slower to adopt these practices. Notably, dataset referencing remains low across most disciplines, suggesting that datasets are not yet fully recognized as research outputs. A temporal analysis highlights an acceleration in data releases after 2012, yet obstacles such as data discoverability and compatibility for reuse persist. Our findings can inform institutional and policy-level efforts to improve data-sharing practices, enhance dataset accessibility, and promote broader adoption of open science principles across research domains.