Data collection is pervasive in our digital lifestyle. A recent IDC study reports that, owing to pandemic-related confinements, the amount of data created and replicated in 2020 grew even faster than in previous years, reaching an astonishing 64.2 zettabytes worldwide. While not all of this data is meant to be analyzed, numerous companies offer services and products that rely heavily on data analysis; mining the data has already proven valuable for businesses in different sectors. To fully realize this value, however, companies need to hire professionals capable of gleaning insights and extracting value from the available data. We hypothesize that people who currently carry out data-science-related tasks in practice may not have adequate training or education. To support them in their duties, e.g., by building appropriate tools that increase their productivity, we first need to characterize the current generation of data scientists. To contribute towards this characterization, we conducted a public survey to understand who is doing data science, how they work, which skills they hold and lack, and which tools they use and need.