We present an approach to layout segmentation of Bangla documents. Our method uses an ensemble of YOLOv8 models trained for the DL Sprint 2.0 - BUET CSE Fest 2023 competition on Bangla document layout segmentation. We focus on three aspects of the task: image augmentation, model architecture, and model ensembling. We deliberately degrade the quality of a subset of the document images to make training more robust, which improves our cross-validation score. We use Bayesian optimization to select the confidence and Intersection over Union (IoU) thresholds for the model ensemble. Our results demonstrate the effectiveness of anchor-free models for robust layout segmentation of Bangla documents.
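To illustrate what the two tuned thresholds control, the following is a minimal sketch, not our competition pipeline. It assumes the ensemble's predictions are pooled as single-class bounding boxes with scores and merged by greedy non-maximum suppression. The names `iou`, `merge_detections`, `conf_thr`, and `iou_thr` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_detections(
    dets: List[Detection], conf_thr: float, iou_thr: float
) -> List[Detection]:
    """Drop detections scoring below conf_thr, then apply greedy NMS.

    Assumption: all detections belong to one class; a multi-class
    version would run this once per class.
    """
    candidates = sorted(
        (d for d in dets if d[1] >= conf_thr),
        key=lambda d: d[1],
        reverse=True,
    )
    kept: List[Detection] = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a higher-scoring kept box
        # by more than iou_thr.
        if all(iou(box, k[0]) <= iou_thr for k in kept):
            kept.append((box, score))
    return kept


# Toy pooled predictions from several models (made-up values).
pooled = [
    ((0.0, 0.0, 10.0, 10.0), 0.9),
    ((1.0, 1.0, 11.0, 11.0), 0.8),   # overlaps the first box, IoU = 81/119
    ((20.0, 20.0, 30.0, 30.0), 0.3),
]
merged = merge_detections(pooled, conf_thr=0.25, iou_thr=0.5)
```

With these toy values, a lower `iou_thr` suppresses the second box as a duplicate of the first, while a higher `conf_thr` drops the low-scoring third box. A threshold search, Bayesian or otherwise, would score the merged output of such a function on validation data for each candidate pair of thresholds.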