One prerequisite for supervised machine learning is high-quality labelled data. Acquiring such data is costly, particularly if expert knowledge is required, and may even be infeasible if the task needs to be performed by a single expert. In this paper, we illustrate the tool support that we adopted and extended to source domain-specific knowledge from experts. We provide insight into the design decisions that aim at motivating experts to dedicate their time to the labelling task. We are currently using the approach to identify true synonyms in a list of candidate synonyms. Identifying synonyms is important in scenarios where stakeholders from different companies and backgrounds need to collaborate, for example when defining and negotiating requirements. We foresee that the expert-sourcing approach is applicable to any data labelling task in software engineering. The discussed design decisions and implementation are an initial draft that can be extended, refined and validated through further application.