Interactive machine learning (IML) allows users to build custom machine learning models without expert knowledge. Most existing IML systems are built around classification algorithms, which can oversimplify the capabilities of machine learning and restrict how users define their tasks. In contrast, as recent large-scale language models have shown, natural language representations can enable more flexible and generic task descriptions. Models that take images as input and produce text as output can represent a wide variety of tasks, provided they are trained with appropriate text labels. However, the effect of introducing text labels into IML system design has not yet been investigated. In this work, we investigate how image-to-text translation and image classification differ when used in IML systems. Using our prototype systems, we conducted a comparative user study in which non-expert participants solved a variety of tasks. Our results reveal an underlying difficulty users face in properly defining image recognition tasks, and they highlight both the potential and the challenges of interactive image-to-text translation systems.