With the increasing use of RDF graphs, storing and querying such data with SPARQL remains a critical problem. Current mainstream solutions rely on cloud-based data management architectures, but they often suffer from performance bottlenecks in environments with limited bandwidth or high system load. To address this issue, this paper explores, for the first time, the integration of edge computing to move graph data storage and processing to edge environments, thereby improving query performance. This approach requires offloading query processing to edge servers, which raises two challenges: data localization and network scheduling. First, the data localization challenge lies in computing the subgraphs maintained on edge servers so that the servers able to handle a specific query can be identified quickly. To address this challenge, we introduce a new concept of pattern-induced subgraphs. Second, the network scheduling challenge involves efficiently assigning queries to edge and cloud servers to optimize overall system performance. We tackle this by constructing an overall system model that jointly captures data distribution, query characteristics, network communication, and computational resources. Accordingly, we propose a joint formulation of query assignment and computational resource allocation, model it as a Mixed Integer Nonlinear Programming (MINLP) problem, and solve it with a modified branch-and-bound algorithm. Experimental results on real datasets on a real cloud platform demonstrate that our proposed method outperforms state-of-the-art baseline methods in terms of efficiency. The code is available on GitHub.
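To make the scheduling idea concrete, the following is a minimal sketch of joint query assignment and resource allocation solved by branch and bound. It is not the paper's formulation, which the abstract does not spell out. Everything here is an illustrative assumption: the cost model (total compute time plus transfer time), the names `work`, `data`, `capacity`, `bandwidth`, and `can_serve`, and the toy numbers. The list `can_serve` stands in for the outcome of data localization, that is, which servers hold the data a query needs. For the assumed objective, splitting a server's capacity F among its queries in proportion to sqrt(w_i) minimizes the sum of w_i / f_i and gives (sum of sqrt(w_i))^2 / F, which is what makes the toy problem nonlinear.

```python
from itertools import product
from math import inf, sqrt


def server_cost(loads, capacity):
    # Minimum of sum(w_i / f_i) subject to sum(f_i) = capacity is attained
    # at f_i proportional to sqrt(w_i), giving (sum sqrt(w_i))^2 / capacity.
    return sum(sqrt(w) for w in loads) ** 2 / capacity


def total_cost(assign, work, data, capacity, bandwidth):
    # Compute time on each server plus transfer time of each query.
    cost = 0.0
    for s in range(len(capacity)):
        loads = [work[q] for q, a in enumerate(assign) if a == s]
        cost += server_cost(loads, capacity[s])
    return cost + sum(data[q] / bandwidth[s] for q, s in enumerate(assign))


def branch_and_bound(work, data, capacity, bandwidth, can_serve):
    n = len(work)
    # Optimistic cost of each query: it runs alone on its best server.
    solo = [min(work[q] / capacity[s] + data[q] / bandwidth[s]
                for s in can_serve[q]) for q in range(n)]
    suffix = [0.0] * (n + 1)
    for q in range(n - 1, -1, -1):
        suffix[q] = suffix[q + 1] + solo[q]

    best = {"cost": inf, "assign": None}
    roots = [0.0] * len(capacity)  # running sum of sqrt(w) per server
    assign = []

    def dfs(q, transfer):
        current = transfer + sum(r * r / c for r, c in zip(roots, capacity))
        # Valid lower bound because (a + b)^2 >= a^2 + b^2 for a, b >= 0.
        if current + suffix[q] >= best["cost"]:
            return  # prune: this branch cannot beat the incumbent
        if q == n:
            best["cost"], best["assign"] = current, assign.copy()
            return
        for s in can_serve[q]:
            roots[s] += sqrt(work[q])
            assign.append(s)
            dfs(q + 1, transfer + data[q] / bandwidth[s])
            assign.pop()
            roots[s] -= sqrt(work[q])

    dfs(0, 0.0)
    return best["assign"], best["cost"]


def brute_force(work, data, capacity, bandwidth, can_serve):
    # Exhaustive check, only practical for tiny instances.
    return min(total_cost(combo, work, data, capacity, bandwidth)
               for combo in product(*can_serve))


# Hypothetical instance: server 0 is the cloud, servers 1 and 2 are edges.
work = [4.0, 9.0, 1.0, 16.0]       # compute demand per query
data = [2.0, 1.0, 3.0, 0.5]        # data to transfer per query
capacity = [10.0, 4.0, 4.0]        # compute capacity per server
bandwidth = [1.0, 5.0, 5.0]        # link bandwidth to each server
can_serve = [[0, 1], [0, 2], [0, 1, 2], [0]]  # servers able to answer each query

assign, cost = branch_and_bound(work, data, capacity, bandwidth, can_serve)
```

Here `assign[q]` is the index of the server chosen for query `q`. The `brute_force` helper is included only to check the pruned search against exhaustive enumeration on small inputs; a realistic model would also need constraints this sketch omits, such as deadlines or per-link contention.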