Personalized book recommendation for Bangla literature has been constrained by the lack of structured, large-scale, publicly available datasets. This work introduces RokomariBG, a large-scale heterogeneous book graph dataset designed to support research on personalized recommendation in a low-resource language setting. The dataset comprises 127,302 books, 63,723 users, 16,601 authors, 1,515 categories, 2,757 publishers, and 209,602 reviews, connected through several relation types and organized as a knowledge graph. To demonstrate its utility, we benchmark a diverse set of representative recommendation models on two tasks: top-N recommendation and sequential recommendation. The results show that recommendation performance in this domain is strongly influenced by both heterogeneous relational information and code-mixed textual metadata. These findings point to challenges specific to the Bangladeshi e-commerce ecosystem that are largely absent from existing recommendation benchmarks. Overall, this work establishes a foundational benchmark and a publicly available resource for Bangla book recommendation research, enabling reproducible evaluation and supporting future studies on recommendation in low-resource cultural domains. The dataset and code are publicly available at https://github.com/backlashblitz/Bangla-Book-Recommendation-Dataset
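As an illustration of what a top-N benchmark measures, the sketch below computes two widely used ranking metrics, Recall@K and binary-relevance NDCG@K, for a single user. The abstract does not state which metrics the benchmark reports, so the choice of metrics, the function names, and the toy book identifiers are all assumptions made for illustration and are not taken from the RokomariBG code.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    # Each hit at 0-based position pos contributes 1 / log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy data: a model's ranked list and the user's held-out books.
ranked = ["book_3", "book_7", "book_1", "book_9"]
relevant = {"book_7", "book_9", "book_5"}

r = recall_at_k(ranked, relevant, 3)
n = ndcg_at_k(ranked, relevant, 3)
print(f"Recall@3 = {r:.4f}, NDCG@3 = {n:.4f}")
```

In a full evaluation these per-user scores would typically be averaged over all test users; how the benchmark splits interactions into training and test sets is not described in the abstract.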