This paper investigates video game identification from single screenshots using five convolutional neural network (CNN) architectures (MobileNet, DenseNet169, EfficientNetB0, EfficientNetB2, and EfficientNetB3) on screenshots from 22 home console systems, ranging from the Atari 2600 to the PlayStation 5. The results confirm the hypothesis that CNNs extract image features autonomously, so game titles can be identified from screenshots without additional features. With ImageNet pre-trained weights, EfficientNetB3 achieves the highest average accuracy (74.51%), while DenseNet169 is the best-performing architecture in 14 of the 22 systems. Initializing instead with weights pre-trained on another screenshot dataset improves accuracy for EfficientNetB2 and EfficientNetB3; the latter reaches a peak accuracy of 76.36% and converges in fewer epochs (20.5 on average, down from 23.7). Overall, selecting the best combination of architecture and initial weights yields 77.67% accuracy, with EfficientNetB3 performing best in 19 of the 22 systems. These findings underscore the efficacy of CNNs for identifying video games from screenshots.
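One plausible reading of the "best combination of architecture and weights" figure is that, for each system, the best-performing (architecture, initial weights) pair is selected and the resulting per-system accuracies are averaged. The abstract does not spell out this procedure, so the following Python sketch is an illustration under that assumption only. The system names, the weight labels, and all accuracy values are made-up placeholders, not results from the paper, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical per-system accuracies (%), keyed by (architecture, initial weights).
# Placeholder values for illustration only; NOT results from the paper.
accuracies = {
    "System A": {
        ("EfficientNetB3", "imagenet"): 70.0,
        ("EfficientNetB3", "screenshots"): 74.0,
        ("DenseNet169", "imagenet"): 72.0,
    },
    "System B": {
        ("EfficientNetB3", "imagenet"): 80.0,
        ("EfficientNetB3", "screenshots"): 78.0,
        ("DenseNet169", "imagenet"): 82.0,
    },
}


def best_per_system(acc):
    """Map each system to its top ((architecture, weights), accuracy) entry."""
    return {
        system: max(configs.items(), key=lambda item: item[1])
        for system, configs in acc.items()
    }


def mean_best_accuracy(acc):
    """Average the per-system best accuracies (assumed aggregation rule)."""
    return mean(accuracy for _, accuracy in best_per_system(acc).values())


def wins_by_architecture(acc):
    """Count in how many systems each architecture gives the best result."""
    counts = {}
    for (architecture, _weights), _accuracy in best_per_system(acc).values():
        counts[architecture] = counts.get(architecture, 0) + 1
    return counts


if __name__ == "__main__":
    print(best_per_system(accuracies))
    print(mean_best_accuracy(accuracies))
    print(wins_by_architecture(accuracies))
```

Under this reading, the combined accuracy can never fall below the best single configuration's average, because each system contributes its own maximum. That would be consistent with the combined figure (77.67%) exceeding both single-configuration figures reported above (74.51% and 76.36%).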