Software that contains machine learning algorithms is an integral part of automotive perception, for example, in driving automation systems. The development of such software, specifically the training and validation of the machine learning components, requires large annotated datasets. An industry of data and annotation services has emerged to serve the development of such data-intensive automotive software components. Widespread difficulties in specifying data and annotation needs challenge collaborations between OEMs (Original Equipment Manufacturers) and their suppliers of software components, data, and annotations. This paper investigates why practitioners in the Swedish automotive industry find it difficult to arrive at clear specifications for data and annotations. The results from an interview study show that a lack of effective metrics for data quality aspects, ambiguities in the way of working, unclear definitions of annotation quality, and deficits in the business ecosystems cause the difficulty in deriving the specifications. We provide a list of recommendations that can mitigate challenges when deriving specifications, and we propose future research opportunities to overcome these challenges. Our work contributes to the ongoing research on accountability of machine learning as applied to complex software systems, especially for high-stakes applications such as automated driving.