Crowdsourcing makes it possible to run simple human intelligence tasks on a large crowd of workers, and thus to solve problems for which it is difficult to formulate an algorithm or train a machine learning model in reasonable time. One such problem is data clustering by an under-specified criterion that is simple for humans but difficult for machines. In this demonstration paper, we build a crowdsourced system for image clustering and release its code under a free license at https://github.com/Toloka/crowdclustering. Our experiments on two different image datasets, dresses from Zalando's FEIDEGGER and shoes from the Toloka Shoes Dataset, confirm that one can obtain meaningful clusters purely through crowdsourcing, without any machine learning algorithms.