Variance is a basic measure of data dispersion and is widely used in statistics. Computing it over a large dataset is time-consuming, however, which motivates accelerating the computation. This paper proposes a method for the scenario in which the variance of an original dataset is already known and additional data is merged into it: the variance of the merged dataset can be expressed as the sum of the original variance and a remainder term. To obtain the variance of the merged dataset, the method computes only the remainder term instead of recomputing the variance from scratch. We call this type of method PKA (Prior Knowledge Acceleration). The paper mathematically proves the effectiveness of PKA for variance calculation and derives the conditions under which the method actually yields a speedup.
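As an illustration of the "original variance plus remainder" idea, the following is a minimal Python sketch. It is not the paper's formula, which the abstract does not state; it uses the standard identity for combining the population variances of two groups, rearranged so that the known variance appears as a separate term. It also assumes that the size and mean of the original dataset are stored alongside its variance, which the abstract does not say. The function name merged_variance and its parameters are hypothetical.

```python
from statistics import fmean, pvariance


def merged_variance(n_a, mean_a, var_a, new_values):
    """Population variance of an original dataset merged with new_values.

    Assumption (not taken from the paper): the original dataset's size n_a,
    mean mean_a, and population variance var_a are all known in advance.
    Only new_values is scanned; the original data is never touched.
    """
    m = len(new_values)
    if m == 0:
        # Nothing was merged, so the variance is unchanged.
        return var_a

    mean_b = fmean(new_values)
    var_b = pvariance(new_values, mu=mean_b)
    total = n_a + m

    # Standard pooled-variance identity, rearranged as var_a + remainder:
    #   var = (n_a*var_a + m*var_b)/total + n_a*m*(mean_a - mean_b)^2/total^2
    remainder = (m / total) * (var_b - var_a) \
        + n_a * m * (mean_a - mean_b) ** 2 / total ** 2
    return var_a + remainder


if __name__ == "__main__":
    original = [1.0, 2.0, 3.0, 4.0]
    added = [10.0, 12.0]
    # Prior knowledge about the original dataset, computed once up front.
    n_a, mean_a, var_a = len(original), fmean(original), pvariance(original)
    print(merged_variance(n_a, mean_a, var_a, added))
    print(pvariance(original + added))  # full recomputation, for comparison
```

In this sketch the cost of the update depends only on the number of added values, so any saving over a full recomputation would come from the original dataset being large relative to the added data.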