Personalized book recommendation in Bangla literature has been constrained by the lack of structured, large-scale, and publicly available datasets. This work introduces RokomariBG, a large-scale, multi-entity heterogeneous book graph dataset designed to support research on personalized recommendation in a low-resource language setting. The dataset comprises 127,302 books, 63,723 users, 16,601 authors, 1,515 categories, 2,757 publishers, and 209,602 reviews, connected through eight relation types and organized as a comprehensive knowledge graph. To demonstrate the utility of the dataset, we provide a systematic benchmarking study on the Top-N recommendation task, evaluating a diverse set of representative recommendation models, including classical collaborative filtering methods, matrix factorization models, content-based approaches, graph neural networks, a hybrid matrix factorization model with side information, and a neural two-tower retrieval architecture. The benchmarking results highlight the importance of leveraging multi-relational structure and textual side information, with neural retrieval models achieving the strongest performance (NDCG@10 = 0.204). Overall, this work establishes a foundational benchmark and a publicly available resource for Bangla book recommendation research, enabling reproducible evaluation and future studies on recommendation in low-resource cultural domains. The dataset and code are publicly available at https://github.com/backlashblitz/Bangla-Book-Recommendation-Dataset
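For readers unfamiliar with the reported metric, the following is a minimal sketch of how NDCG@10 is commonly computed for Top-N recommendation. It assumes binary relevance (a held-out book is either relevant or not) and a ranked list without duplicates. The abstract does not specify the exact evaluation protocol, so the function names `ndcg_at_k` and `mean_ndcg_at_k` and the input layout are illustrative assumptions, not the authors' implementation.

```python
import math


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """Binary-relevance NDCG@k for a single user (illustrative sketch)."""
    relevant = set(relevant_items)
    # DCG: each hit at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (at most k) placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    if ideal_hits == 0:
        return 0.0
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg


def mean_ndcg_at_k(recommendations, ground_truth, k=10):
    """Average NDCG@k over users that have at least one held-out item.

    `recommendations` maps user id -> ranked list of book ids;
    `ground_truth` maps user id -> collection of held-out book ids.
    Both layouts are assumptions made for this example.
    """
    scores = [
        ndcg_at_k(recommendations.get(user, []), items, k)
        for user, items in ground_truth.items()
        if items
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Under this definition, a score of 1.0 means every held-out book appears at the top of the ranked list, and a relevant book ranked second instead of first scores 1 / log2(3), roughly 0.63, for a user with a single held-out item.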