The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real-life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, remains a major challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches. Knowledge-augmented machine learning approaches can compensate for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Moreover, predictions that conform with knowledge are crucial for making trustworthy and safe decisions, even in underrepresented scenarios. This work provides an overview of techniques and methods from the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to three categories: knowledge integration, knowledge extraction, and knowledge conformity. In particular, we address the application of the presented methods in the field of autonomous driving.