This review paper traces how discussions of "long-tail" scientific data have evolved in the scholarly literature. The "long-tail" concept, originally used to explain trends in digital consumer goods, was first applied to scientific data in 2007 to describe the vast array of smaller, heterogeneous data collections that cumulatively represent a substantial portion of scientific knowledge. These datasets, often called "long-tail data," are frequently mismanaged or overlooked because of inadequate data management practices and insufficient institutional support. The paper examines how discussions of long-tail data have changed over time, situating them within broader ecosystems of research data management and the interplay between "big" and "small" data. It also bridges discussions of data curation in Library & Information Science (LIS) and in domain-specific contexts. In doing so, the review aims to provide a more comprehensive understanding of the long-tail concept, its terminological diversity in the literature, and its utility for guiding effective data management, informing current and future information science research and practice.