This paper investigates video game identification from single screenshots using five convolutional neural network (CNN) architectures (MobileNet, DenseNet169, EfficientNetB0, EfficientNetB2, and EfficientNetB3) across 22 home console systems, ranging from the Atari 2600 to the PlayStation 5 and totalling 8,796 games and 170,881 screenshots. The results confirm the hypothesis that CNNs can autonomously extract image features sufficient to identify game titles from screenshots alone, without additional features. With ImageNet pre-trained weights as initial weights, EfficientNetB3 achieves the highest average accuracy (74.51%), while DenseNet169 performs best in 14 of the 22 systems. Using alternative initial weights trained on an arcade screenshot dataset improves the accuracy of EfficientNetB2 and EfficientNetB3, with the latter reaching a peak accuracy of 76.36% and converging faster, in 20.5 epochs on average instead of 23.7. Overall, combining the best architecture and initial weights attains 77.67% accuracy, with EfficientNetB3 leading in 19 systems. These findings underscore the efficacy of CNNs for identifying video games from screenshots.
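To make the "best combination" figure concrete, the following is a minimal Python sketch. It assumes that the best architecture and initial-weight pair is chosen independently for each console system and that the overall figure is the unweighted mean of the per-system best accuracies; neither assumption is confirmed by the abstract. The function names `best_per_system` and `combined_accuracy`, the system labels, and all accuracy values are hypothetical placeholders, not results from the paper.

```python
# Hypothetical sketch: pick the best (architecture, initial weights) pair
# for each console system, then average the per-system best accuracies.
# System names and accuracy values are illustrative placeholders only.
from statistics import mean


def best_per_system(results):
    """Return {system: (architecture, weights, accuracy)} holding the
    highest-accuracy configuration found for each system."""
    best = {}
    for system, configs in results.items():
        # configs maps (architecture, weights) -> accuracy in [0, 1]
        (arch, weights), acc = max(configs.items(), key=lambda kv: kv[1])
        best[system] = (arch, weights, acc)
    return best


def combined_accuracy(best):
    """Unweighted mean of per-system best accuracies (an assumption)."""
    return mean(acc for _, _, acc in best.values())


if __name__ == "__main__":
    toy_results = {
        "system_a": {("EfficientNetB3", "imagenet"): 0.70,
                     ("EfficientNetB3", "arcade"): 0.74,
                     ("DenseNet169", "imagenet"): 0.72},
        "system_b": {("EfficientNetB3", "imagenet"): 0.80,
                     ("DenseNet169", "imagenet"): 0.83},
    }
    best = best_per_system(toy_results)
    for system, (arch, weights, acc) in best.items():
        print(f"{system}: {arch} + {weights} weights -> {acc:.2%}")
    print(f"combined: {combined_accuracy(best):.2%}")
```

An unweighted mean treats every system equally regardless of how many games it contains; weighting by the number of test screenshots per system would be an equally plausible alternative.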