Class imbalance is a common and pernicious issue in the training of neural networks: an imbalanced majority class often dominates training and skews classifier performance towards the majority outcome. To address this problem we introduce cardinality augmented loss functions, derived from cardinality-like invariants in the modern mathematics literature such as magnitude and the spread. These invariants enrich the notion of cardinality by measuring the `effective diversity' of a metric space, and as such offer a natural remedy for overly homogeneous training data. In this work, we establish a methodology for applying cardinality augmented loss functions to the training of neural networks and report results on both artificially imbalanced datasets and a real-world imbalanced materials science dataset. We observe significant performance improvements on minority classes, as well as improvement in overall performance metrics.
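For reference, a sketch of the two invariants named above, following the standard definitions from the literature (Leinster's magnitude and Willerton's spread); the precise form of the cardinality augmented loss built from them is not specified in this abstract.

% Standard definitions from the literature; the loss construction
% used in the paper itself is not given here.
Let $(X, d)$ be a finite metric space with points $x_1, \dots, x_n$, and let
$Z \in \mathbb{R}^{n \times n}$ be the similarity matrix with entries
$Z_{ij} = e^{-d(x_i, x_j)}$. When $Z$ is invertible, the \emph{magnitude} of
$X$ is
\[
  |X| \;=\; \sum_{i,j=1}^{n} \bigl(Z^{-1}\bigr)_{ij},
\]
and the \emph{spread} of $X$ (which requires no invertibility assumption) is
\[
  E(X) \;=\; \sum_{i=1}^{n} \frac{1}{\sum_{j=1}^{n} e^{-d(x_i, x_j)}}.
\]
% Both quantities tend to n as all pairwise distances grow and to 1 as
% the points coalesce, so each behaves as an `effective number of
% distinct points' of the space.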