L-WISE: Boosting Human Image Category Learning Through Model-Based Image Selection And Enhancement

The currently leading artificial neural network (ANN) models of the visual ventral stream -- which are derived from a combination of performance optimization and robustification methods -- have demonstrated a remarkable degree of behavioral alignment with humans on visual categorization tasks. Building on previous work, we show that these models can not only guide image perturbations that change the category percepts induced in humans, but can also guide perturbations that enhance the human ability to accurately report the original ground truth. Furthermore, we find that the same models can be used out-of-the-box to predict the proportion of correct human responses to individual images, providing a simple, human-aligned estimator of the relative difficulty of each image. Motivated by these observations, we propose to augment visual learning in humans in a way that improves human categorization accuracy at test time. Our learning augmentation approach consists of (i) selecting images based on their model-estimated recognition difficulty, and (ii) using image perturbations that aid recognition for novice learners. We find that combining these model-based strategies yields test-time categorization accuracy gains of 33-72% relative to control subjects without these interventions, despite using the same number of training feedback trials. Surprisingly, beyond the accuracy gain, the training time for the augmented learning group was also shorter by 20-23%. We demonstrate the efficacy of our approach in a fine-grained categorization task with natural images, as well as in tasks in two clinically relevant image domains -- histology and dermoscopy -- where visual learning is notoriously challenging. To the best of our knowledge, this is the first application of ANNs to increase visual learning performance in humans by enhancing category-specific features.
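As a rough illustration of strategy (i), the sketch below ranks images by a model-derived difficulty proxy and keeps the easiest ones. The abstract does not say which model output serves as the difficulty estimate, so the softmax probability of the ground-truth class is an assumption here, as are the function names `true_class_confidence`, `select_easiest`, and `relative_gain` and the `keep_fraction` parameter. The last helper only spells out what an accuracy gain "relative to control" means arithmetically.

```python
import numpy as np

def true_class_confidence(logits, labels):
    """Softmax probability the model assigns to each image's true class.

    Assumption: this serves as the difficulty proxy (higher = easier).
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs[np.arange(len(labels)), labels]

def select_easiest(logits, labels, keep_fraction=0.5):
    """Return indices of the easiest images, ordered easiest first."""
    conf = true_class_confidence(logits, labels)
    n_keep = max(1, int(round(keep_fraction * len(conf))))
    order = np.argsort(-conf, kind="stable")  # descending confidence
    return order[:n_keep]

def relative_gain(acc_augmented, acc_control):
    """Accuracy gain expressed as a fraction of the control accuracy."""
    return (acc_augmented - acc_control) / acc_control
```

With hypothetical accuracies of 0.6 for the augmented group and 0.4 for controls (illustrative numbers, not results from the paper), `relative_gain(0.6, 0.4)` is about 0.5, i.e. a 50% relative gain rather than a 50 percentage-point increase.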
