Skyline queries are among the most widely adopted tools for Multi-Criteria Analysis, with applications in diverse domains, including Database Systems, Data Mining, and Decision Making. A skyline offers a useful overview of the most suitable alternatives in a dataset by discarding every option that is dominated by (i.e., worse than) another. The intrinsically quadratic complexity of skyline computation has pushed researchers to seek strategies for parallelizing the task, particularly by partitioning the dataset at hand. In this paper, after reviewing the main partitioning approaches available in the relevant literature, we propose two orthogonal optimization strategies for reducing the computational overhead, and compare them experimentally in a multi-core environment equipped with PySpark.
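To make the notions of dominance and partition-based skyline computation concrete, the following is a minimal sketch in plain Python. It is not the optimization strategies proposed in the paper, and it does not use PySpark. It assumes that lower values are better on every attribute and uses a simple round-robin split. The names `dominates`, `skyline`, and `partitioned_skyline` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if `a` dominates `b`.

    Assumption: lower is better on every attribute. `a` dominates `b` when it
    is no worse on all attributes and strictly better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive skyline: compare every point against every other (quadratic)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def partitioned_skyline(points: Sequence[Point], num_partitions: int) -> List[Point]:
    """Two-phase skyline over a round-robin partitioning (illustrative only).

    Phase 1 computes a local skyline per partition; these computations are
    independent, so they could run in parallel. Phase 2 merges the local
    skylines and removes points dominated across partitions.
    """
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    local_skylines = [skyline(part) for part in partitions]
    candidates = [p for local in local_skylines for p in local]
    return skyline(candidates)


if __name__ == "__main__":
    # Hypothetical data: (price, distance) pairs, both to be minimized.
    hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 9), (55, 8)]
    print(sorted(skyline(hotels)))
    print(sorted(partitioned_skyline(hotels, 2)))
```

The two-phase structure is correct because a point dominated within its own partition is also dominated in the full dataset, so it can be discarded early. The final merge is still required because a local skyline point may be dominated by a point from a different partition.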