Hypothesis testing is a statistical method for drawing conclusions about populations from sample data, which is typically represented in tables. With the prevalence of graph representations in real-life applications, hypothesis testing on graphs is gaining importance. In this work, we formalize node, edge, and path hypotheses in attributed graphs. We develop a sampling-based hypothesis testing framework that can accommodate existing hypothesis-agnostic graph sampling methods. To achieve accurate and efficient sampling, we then propose a Path-Hypothesis-Aware SamplEr, PHASE, an m-dimensional random walk that accounts for the paths specified in a hypothesis. We further optimize its time efficiency and propose PHASEopt. Experiments on real datasets demonstrate the ability of our framework to leverage common graph sampling methods for hypothesis testing, and the superiority of hypothesis-aware sampling in terms of accuracy and time efficiency.
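To make the idea of sampling-based hypothesis testing on an attributed graph concrete, the following is a minimal sketch and not the PHASE algorithm or the framework itself. It tests a node hypothesis of the form "the mean of a node attribute exceeds a threshold" using a simple random walk as a hypothesis-agnostic sampler. The toy ring graph, the `age` attribute, the function names, and the use of a one-sided z-test with a normal approximation are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def random_walk_sample(adj, start, n_steps, rng):
    """Hypothesis-agnostic sampler: return the nodes visited by a simple random walk."""
    node, visited = start, []
    for _ in range(n_steps):
        visited.append(node)
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbor
    return visited


def one_sided_z_test(values, mu0):
    """Test H0: mean <= mu0 against H1: mean > mu0 using a normal approximation."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    z = (mean(values) - mu0) / standard_error
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical toy graph: a ring of 50 nodes, each with an "age" attribute in 30..34.
rng = random.Random(0)
n_nodes = 50
adj = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}
age = {i: 30 + (i % 5) for i in range(n_nodes)}

# Node hypothesis (assumed for illustration): "the mean age of nodes is greater than 25".
sample = random_walk_sample(adj, start=0, n_steps=200, rng=rng)
z, p = one_sided_z_test([age[v] for v in sample], mu0=25)
reject_null = p < 0.05
```

Two simplifications are worth keeping in mind. Consecutive nodes of a random walk are correlated, and on graphs with non-uniform degrees the walk over-samples high-degree nodes, so a real pipeline would need to correct for both. Every node of the ring used here has degree 2, which sidesteps the degree bias in this toy case.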