Data Science is a complex and evolving field, but most agree that it can be defined as a combination of expertise drawn from three broad areas -- computer science and technology, math and statistics, and domain knowledge -- with the purpose of extracting knowledge and value from data. Beyond this, the field is often defined as a series of practical activities ranging from the cleaning and wrangling of data, to its analysis and use to infer models, to the visual and rhetorical representation of results to stakeholders and decision-makers. This essay proposes a model of data science that goes beyond laundry-list definitions to get at the specific nature of data science and help distinguish it from adjacent fields such as computer science and statistics. We define data science as an interdisciplinary field comprising four broad areas of expertise: value, design, systems, and analytics. A fifth area, practice, integrates the other four in specific contexts of domain knowledge. We call this the 4+1 model of data science. Together, these areas belong to every data science project, even if they are often unconnected and siloed in the academy.