Probabilistic relational models provide a well-established formalism for combining first-order logic with probabilistic models, making it possible to represent relationships between objects in a relational domain. At the same time, the field of artificial intelligence requires increasingly large amounts of relational training data for various machine learning tasks. Collecting real-world data, however, is often challenging due to privacy concerns, data protection regulations, and high costs, among other obstacles. The generation of synthetic data is a promising approach to mitigating these challenges. In this paper, we solve the problem of generating synthetic relational data via probabilistic relational models. In particular, we propose a fully-fledged pipeline that goes from a relational database to a probabilistic relational model, which can then be used to sample new synthetic relational data points from its underlying probability distribution. As part of the proposed pipeline, we introduce a learning algorithm that constructs a probabilistic relational model from a given relational database.