The scarcity of realistic datasets poses a significant challenge in benchmarking recommender systems and social network analysis techniques. A common and effective solution is to generate synthetic data that simulates realistic interactions. However, although various methods have been proposed, the existing literature still lacks generators that are fully adaptable and allow easy manipulation of the underlying data distributions and structural properties. To address this issue, this work introduces GenRec, a novel framework for generating synthetic user-item interactions that exhibit realistic, well-known properties observed in recommendation scenarios. The framework relies on a stochastic generative process grounded in latent factor modeling. The latent factors can be exploited to yield long-tailed preference distributions and, at the same time, to characterize subpopulations of users and topic-based item clusters. Notably, the framework is highly flexible and offers a wide range of hyper-parameters for customizing the generation of user-item interactions. The code used to perform the experiments is publicly available at https://anonymous.4open.science/r/GenRec-DED3.
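The abstract does not spell out GenRec's exact generative process, so the following is only a minimal, standard-library sketch of a latent-factor generator of the kind described, not the authors' implementation. It assumes that user communities are prototype topic mixtures drawn from a Dirichlet distribution, that each user is a noisy Dirichlet copy of its community prototype, that each item belongs to a single topic, and that heavy-tailed item popularity comes from a Pareto distribution. The function `generate_interactions` and every parameter name (`alpha`, `pop_shape`, `mean_len`, and so on) are hypothetical illustrations, not GenRec's API.

```python
import random


def dirichlet(rng, alphas):
    """Sample from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_interactions(n_users=200, n_items=500, n_topics=5,
                          n_communities=3, alpha=0.1, pop_shape=1.2,
                          mean_len=20, seed=0):
    """Hypothetical latent-factor generator of implicit user-item pairs.

    Returns (sorted list of (user, item) pairs, user community labels,
    item topic labels).
    """
    rng = random.Random(seed)

    # Community prototypes: sparse topic mixtures (small alpha -> few topics).
    prototypes = [dirichlet(rng, [alpha] * n_topics)
                  for _ in range(n_communities)]

    # Each user joins a community and perturbs its prototype mixture.
    user_community = [rng.randrange(n_communities) for _ in range(n_users)]
    user_topics = [dirichlet(rng, [10.0 * p + 1e-3 for p in prototypes[c]])
                   for c in user_community]

    # Each item belongs to one topic and gets a heavy-tailed popularity.
    item_topic = [rng.randrange(n_topics) for _ in range(n_items)]
    item_pop = [rng.paretovariate(pop_shape) for _ in range(n_items)]
    items_by_topic = [[i for i in range(n_items) if item_topic[i] == k]
                      for k in range(n_topics)]

    interactions = set()
    for u in range(n_users):
        # Number of draws per user: exponential around mean_len, at least 1.
        n_draws = max(1, int(rng.expovariate(1.0 / mean_len)))
        for _ in range(n_draws):
            # Pick a topic from the user's latent preferences...
            k = rng.choices(range(n_topics), weights=user_topics[u])[0]
            pool = items_by_topic[k]
            if not pool:
                continue
            # ...then an item within that topic, biased toward popular items.
            i = rng.choices(pool, weights=[item_pop[j] for j in pool])[0]
            interactions.add((u, i))

    return sorted(interactions), user_community, item_topic
```

For example, `pairs, communities, topics = generate_interactions(seed=42)` yields a deduplicated interaction list along with the ground-truth community and topic labels, which can then be used to check whether a recommender or community-detection method recovers the planted structure. Separating the community prototypes from per-user noise is one simple way to expose "subpopulations of users" as a tunable knob; lowering `alpha` makes communities more topic-specific, and lowering `pop_shape` makes the popularity tail heavier.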