Machine learning and deep learning have reached an impressive standard, enabling us to answer questions that were inconceivable a few years ago. Alongside these successes, it has become clear that beyond pure prediction, which is the primary strength of most supervised machine learning algorithms, the quantification of uncertainty is also relevant and necessary. While first concepts and ideas in this direction have emerged in recent years, this paper adopts a conceptual perspective and examines possible sources of uncertainty. Adopting the viewpoint of a statistician, we discuss the concepts of aleatoric and epistemic uncertainty, which are more commonly associated with machine learning. The paper aims to formalize the two types of uncertainty and demonstrates that sources of uncertainty are diverse and cannot always be decomposed into aleatoric and epistemic components. Drawing parallels between statistical concepts and uncertainty in machine learning, we also demonstrate the role of data and their influence on uncertainty.