TCuPGAN: A novel framework developed for optimizing human-machine interactions in citizen science

Ramanakumar Sankar, Kameswara Mantha, Lucy Fortson, Helen Spiers, Thomas Pengo, Douglas Mashek, Myat Mo, Mark Sanders, Trace Christensen, Jeffrey Salisbury, Laura Trouille

From arXiv: 5 pages, 1 figure, accepted for publication at HLDM '23 (ECML PKDD 2023 workshop)

In the era of big data in scientific research, there is a need for techniques that reduce the human effort spent labeling and categorizing large datasets by bringing in sophisticated machine tools. To address this need, we present a novel, general-purpose model for 3D segmentation that leverages patch-wise adversariality and Long Short-Term Memory to encode sequential information. Using this model alongside citizen science projects on the Zooniverse platform that use 3D datasets (image cubes), we propose an iterative human-machine optimization framework in which volunteers see only a fraction of the 2D slices from these cubes. We use the patch-wise discriminator in our model to estimate which slices within these image cubes have poorly generalized feature representations and, correspondingly, poor machine performance. These images, together with the corresponding machine proposals, would be presented to volunteers on Zooniverse for correction, leading to a drastic reduction in volunteer effort on citizen science projects. We trained our model on ~2300 liver tissue 3D electron micrographs. Lipid droplets in these images were segmented through human annotation via the 'Etch A Cell - Fat Checker' citizen science project, hosted on the Zooniverse platform. In this work, we demonstrate this framework and the selection methodology, which resulted in a measured reduction in volunteer effort of more than 60%. We envision that this type of joint human-machine partnership will be of great use on future Zooniverse projects.
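
The slice-selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the patch-wise discriminator yields one score in [0, 1] per patch for each 2D slice, that higher scores mean the machine proposal looks more plausible, and that slices are ranked by their mean patch score. The function name `select_slices_for_review`, the `review_fraction` parameter, and the toy scores are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def select_slices_for_review(patch_scores, review_fraction=0.4):
    """Return indices of the slices with the lowest mean patch score.

    patch_scores: one list per 2D slice, holding the (assumed) patch-wise
    discriminator outputs for that slice, each in [0, 1].
    review_fraction: hypothetical share of slices to send to volunteers.
    """
    if not 0.0 < review_fraction <= 1.0:
        raise ValueError("review_fraction must be in (0, 1]")

    # Collapse each slice's patch scores into a single per-slice score.
    slice_scores = [mean(scores) for scores in patch_scores]

    # Always send at least one slice for review.
    n_review = max(1, round(review_fraction * len(slice_scores)))

    # Rank slices from lowest to highest score and keep the worst ones.
    ranked = sorted(range(len(slice_scores)), key=lambda i: slice_scores[i])
    return sorted(ranked[:n_review])


# Toy image cube with five slices and four patches per slice (made-up values).
cube_scores = [
    [0.90, 0.80, 0.95, 0.85],
    [0.20, 0.30, 0.40, 0.10],
    [0.70, 0.75, 0.80, 0.90],
    [0.50, 0.10, 0.60, 0.20],
    [0.95, 0.90, 0.90, 0.99],
]

to_review = select_slices_for_review(cube_scores, review_fraction=0.4)
print(to_review)  # [1, 3]
```

In this toy cube, only slices 1 and 3 would be shown to volunteers together with their machine proposals; the remaining slices would keep the machine segmentation. Averaging patch scores is just one possible aggregation; a minimum or a low percentile over patches would flag slices with a single badly segmented region more aggressively.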
