Assigning labels to instances is crucial for supervised machine learning. In this paper, we propose a novel annotation method called Q&A labeling, which involves a question generator that asks questions about the labels to be assigned to the instances, and an annotator who answers the questions and assigns the corresponding labels to the instances. We derive a generative model of the labels assigned under two Q&A labeling procedures that differ in how questions are asked and answered. We show that, in both procedures, the derived model is partially consistent with the models assumed in previous studies. The main distinction between this study and previous ones is that the label generative model is not assumed but derived from the definition of a specific annotation method, Q&A labeling. We also derive a loss function for evaluating the classification risk of ordinary supervised machine learning from instances with Q&A labels, and we evaluate an upper bound on the classification error. The results indicate that learning with Q&A labels is statistically consistent.