We introduce the Burmese Handwritten Digit Dataset (BHDD), a collection of 87,561 grayscale images of handwritten Burmese digits in ten classes. Each image is 28×28 pixels, following the MNIST format. The training set has 60,000 samples split evenly across the ten classes (6,000 per class). The test set has 27,561 samples whose class frequencies reflect the natural distribution during collection. More than 150 contributors of different ages and backgrounds provided samples. We analyze the dataset's class distribution, pixel statistics, and morphological variation, and we identify digit pairs that are easily confused because of the rounded shapes of the Myanmar script. Simple baselines (an MLP, a two-layer CNN, and an improved CNN with batch normalization and data augmentation) reach 99.40%, 99.75%, and 99.83% test accuracy, respectively. BHDD is available under CC BY-SA 4.0 at https://github.com/baseresearch/BHDD