Performing classification on noisy, crowdsourced image datasets can prove challenging even for the best neural networks. Two issues which complicate the problem on such datasets are class imbalance and ground-truth uncertainty in labeling. The AL-ALL and AL-PUB datasets - consisting of tightly cropped, individual characters from images of ancient Greek papyri - are strongly affected by both issues. The application of ensemble modeling to such datasets can help identify images where the ground-truth is questionable and quantify the trustworthiness of those samples. As such, we apply stacked generalization consisting of nearly identical ResNets with different loss functions: one utilizing sparse cross-entropy (CXE) and the other Kullback-Liebler Divergence (KLD). Both networks use labels drawn from a crowd-sourced consensus. This consensus is derived from a Normalized Distribution of Annotations (NDA) based on all annotations for a given character in the dataset. For the second network, the KLD is calculated with respect to the NDA. For our ensemble model, we apply a k-nearest neighbors model to the outputs of the CXE and KLD networks. Individually, the ResNet models have approximately 93% accuracy, while the ensemble model achieves an accuracy of > 95%, increasing the classification trustworthiness. We also perform an analysis of the Shannon entropy of the various models' output distributions to measure classification uncertainty. Our results suggest that entropy is useful for predicting model misclassifications.
翻译:即使在最优秀的神经网络中,对存在噪声的众包图像数据集进行分类也极具挑战性。此类数据集面临的两个复杂问题是类别不平衡以及标注中的真实标签不确定性。AL-ALL和AL-PUB数据集(包含来自古希腊纸莎草文献图像的紧密裁剪的单个字符)受到这两个问题的严重影响。将集成建模应用于此类数据集有助于识别真实标签存疑的图像,并量化这些样本的可信度。为此,我们采用堆叠泛化方法,结合了几乎相同的ResNet模型,但使用不同的损失函数:一个采用稀疏交叉熵(CXE),另一个采用Kullback-Leibler散度(KLD)。两个网络均使用基于众包共识的标签。该共识来源于基于数据集中给定字符所有注释的注释归一化分布(NDA)。对于第二个网络,KLD是相对于NDA计算的。在我们的集成模型中,我们将k近邻模型应用于CXE和KLD网络的输出。单独来看,ResNet模型的准确率约为93%,而集成模型的准确率超过95%,从而提高了分类可信度。我们还对不同模型输出分布的香农熵进行了分析,以衡量分类不确定性。结果表明,熵对于预测模型误分类具有实用价值。