Data science is not a science. It is a research paradigm. Its power, scope, and scale will surpass those of science, our most powerful research paradigm, to enable knowledge discovery and change our world. We have yet to understand and define it, which is vital to realizing its potential and managing its risks. Modern data science is in its infancy. Emerging slowly since 1962 and rapidly since 2000, it is a fundamentally new field of inquiry and one of the most active, powerful, and rapidly evolving 21st-century innovations. Due to its value, power, and applicability, it is emerging in 40+ disciplines, hundreds of research areas, and thousands of applications. Millions of data science publications contain myriad definitions of data science and data science problem solving. Because the field is in its infancy, many of these definitions are independent, application-specific, mutually incomplete, redundant, or inconsistent; hence, so is data science. This research addresses the challenge of multiple data science definitions by proposing that the data science community develop a coherent, unified definition based on a data science reference framework, using a data science journal as the means to achieve it. This paper provides candidate definitions for the essential data science artifacts required to discuss such a definition. They are based on the classical research paradigm concept, consisting of a philosophy of data science, the data science problem-solving paradigm, and the six-component data science reference framework (axiology, ontology, epistemology, methodology, methods, technology), a frequently called-for unifying framework with which to define, unify, and evolve data science. The paper also presents the challenges of defining data science, solution approaches (i.e., means for defining data science), and their requirements and benefits as the basis of a comprehensive solution.