Efficient Cloud-edge Collaborative Approaches to SPARQL Queries over Large RDF graphs

With the increasing use of RDF graphs, storing and querying such data using SPARQL remains a critical problem. Current mainstream solutions rely on cloud-based data management architectures, but they often suffer from performance bottlenecks in environments with limited bandwidth or high system load. To address this issue, this paper explores, for the first time, the integration of edge computing to move graph data storage and processing to edge environments, thereby improving query performance. This approach requires offloading query processing to edge servers, which raises two challenges: data localization and network scheduling. First, the data localization challenge lies in computing the subgraphs maintained on edge servers so that the servers able to handle a specific query can be identified quickly. To address this challenge, we introduce a new concept of pattern-induced subgraphs. Second, the network scheduling challenge involves efficiently assigning queries to edge and cloud servers to optimize overall system performance. We tackle this by constructing an overall system model that jointly captures data distribution, query characteristics, network communication, and computational resources. On this basis, we propose a joint formulation of query assignment and computational resource allocation, model it as a Mixed Integer Nonlinear Programming (MINLP) problem, and solve it using a modified branch-and-bound algorithm. Experimental results on real datasets on a real cloud platform demonstrate that our proposed method outperforms state-of-the-art baseline methods in terms of efficiency. The code is available on GitHub.
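To make the two challenges concrete, the following toy sketch illustrates the general idea of matching queries to edge servers and then choosing an assignment that minimizes total latency. It is not the method of this paper: the pattern labels (`P1`, `P2`, ...) are a hypothetical stand-in for pattern-induced subgraphs, the latency constants and the linear slowdown under load are invented for illustration, and exhaustive enumeration replaces the modified branch-and-bound algorithm. All names are assumptions.

```python
from itertools import product

# Hypothetical toy data: the query patterns each edge server can answer locally.
# This is only a stand-in for the paper's pattern-induced subgraphs.
EDGE_PATTERNS = {
    "edge1": {"P1", "P2"},
    "edge2": {"P2", "P3"},
}

# Assumed latencies in milliseconds; these numbers are invented.
EDGE_LATENCY = 10.0   # one query running alone on an edge server
CLOUD_LATENCY = 50.0  # one query on the cloud, including network transfer


def candidate_servers(pattern):
    """Return the edge servers that hold the pattern, plus the cloud fallback."""
    servers = [name for name, pats in sorted(EDGE_PATTERNS.items())
               if pattern in pats]
    return servers + ["cloud"]


def total_latency(assignment):
    """Sum the latency of all queries under an assumed load model."""
    load = {}
    for server in assignment:
        load[server] = load.get(server, 0) + 1
    total = 0.0
    for server, n in load.items():
        if server == "cloud":
            # Assumption: the cloud has ample compute, so no slowdown.
            total += n * CLOUD_LATENCY
        else:
            # Assumption: n queries sharing an edge server each take n times longer.
            total += n * (n * EDGE_LATENCY)
    return total


def best_assignment(query_patterns):
    """Enumerate every feasible assignment and keep the cheapest one."""
    options = [candidate_servers(p) for p in query_patterns]
    return min(product(*options), key=total_latency)


if __name__ == "__main__":
    queries = ["P1", "P2", "P3"]
    plan = best_assignment(queries)
    print(dict(zip(queries, plan)), total_latency(plan))
```

Exhaustive enumeration grows exponentially with the number of queries, which is why a realistic formulation would need a pruning strategy such as branch-and-bound rather than this brute-force search.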
