Skyline queries are popular and effective tools in multi-criteria decision support, as they extract interesting (Pareto-optimal) points that help summarize the available data with respect to a given set of preference attributes. Unfortunately, the efficiency of skyline algorithms depends heavily on the underlying data statistics. In this paper, we argue that the efficiency of skyline algorithms could be significantly boosted if one could erase any attribute correlations that do not agree with the preference criteria, while preserving (or even boosting) correlations that agree with the user-provided criteria. We therefore propose a causally-informed selective de-correlation mechanism that enables skyline algorithms to better leverage the pruning opportunities provided by positively-aligned data distributions, without suffering from the misalignments. In particular, we show that, given a causal graph that describes the underlying causal structure of the data, one can identify a subset of the attributes that can be used to selectively de-correlate the preference attributes. Importantly, the proposed causal search for skylines (CSS) approach is agnostic to the underlying candidate enumeration and pruning strategies and can, therefore, be leveraged to improve any popular skyline discovery algorithm. Experiments on multiple real and synthetic data sets and for different skyline discovery algorithms show that the proposed causally-informed selective de-correlation technique significantly reduces both the number of dominance checks and the overall time needed to locate skyline points.
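To make the terms "dominance check" and "skyline" concrete, the following is a minimal sketch of a generic window-based skyline scan that counts dominance checks. It is not the CSS method and is not taken from the paper: the names `dominates` and `skyline_scan`, the toy data, and the convention that smaller values are preferred on every attribute are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p is no worse than q on every attribute and
    strictly better on at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_scan(points: List[Point]) -> Tuple[List[Point], int]:
    """Hypothetical helper: compute the skyline with a simple window scan
    and return it together with the number of dominance checks performed."""
    window: List[Point] = []  # current skyline candidates
    checks = 0
    for p in points:
        # Phase 1: discard p as soon as some candidate dominates it.
        dominated = False
        for w in window:
            checks += 1
            if dominates(w, p):
                dominated = True
                break
        if dominated:
            continue
        # Phase 2: p survives, so evict every candidate that p dominates.
        kept = []
        for w in window:
            checks += 1
            if not dominates(p, w):
                kept.append(w)
        kept.append(p)
        window = kept
    return window, checks


# Toy data (assumed): attributes that move together vs. against each other.
aligned = [(1, 1), (2, 2), (3, 3), (4, 4)]
opposed = [(1, 4), (2, 3), (3, 2), (4, 1)]

aligned_result = skyline_scan(aligned)  # ([(1, 1)], 3)
opposed_result = skyline_scan(opposed)  # all four points survive, 12 checks
```

On this toy input, the positively aligned points are pruned after 3 checks, while the opposed points all remain in the skyline and cost 12 checks. This illustrates, at a very small scale, the kind of dependence on data statistics that the abstract describes; it says nothing about the magnitude of the gains reported in the experiments.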