This work introduces ExaLogLog, a new data structure for approximate distinct counting, which has the same practical properties as the popular HyperLogLog algorithm. It is commutative, idempotent, mergeable, reducible, has a constant-time insert operation, and supports distinct counts up to the exa-scale. At the same time, as theoretically derived and experimentally verified, it requires 43% less space to achieve the same estimation error.
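To make the listed properties concrete, the following is a minimal sketch of a HyperLogLog-style register array in Python. It is not ExaLogLog and not the authors' implementation; the class name `TinyHLL`, the precision parameter `p`, the SHA-256-based hash, and the raw estimator without small-range or large-range corrections are all assumptions made for illustration. It shows why inserts are idempotent and commutative (each register only keeps a maximum) and why two sketches can be merged (element-wise maximum). Reducibility is not shown.

```python
import hashlib


class TinyHLL:
    """Illustrative HyperLogLog-style sketch (hypothetical, not ExaLogLog)."""

    def __init__(self, p=8):
        self.p = p                         # number of bits used as register index
        self.registers = [0] * (1 << p)    # m = 2^p registers

    def insert(self, item):
        # Assumption: take the first 64 bits of SHA-256 as the hash value.
        digest = hashlib.sha256(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits pick the register
        rest_bits = 64 - self.p
        rest = h & ((1 << rest_bits) - 1)          # remaining bits
        # 1-based position of the leftmost 1-bit among the remaining bits.
        rank = rest_bits - rest.bit_length() + 1
        # Keeping only the maximum makes inserts idempotent and commutative.
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Element-wise maximum yields the sketch of the union of both inputs.
        assert self.p == other.p, "sketches must use the same precision"
        out = TinyHLL(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self):
        # Raw HyperLogLog estimator; the bias constant below holds for m >= 128.
        m = len(self.registers)
        alpha = 0.7213 / (1 + 1.079 / m)
        return alpha * m * m / sum(2.0 ** -r for r in self.registers)


# Two sketches over overlapping ranges; their union covers 0..9999.
a, b = TinyHLL(), TinyHLL()
for i in range(6000):
    a.insert(i)
for i in range(4000, 10000):
    b.insert(i)
a.insert(0)            # duplicate insert leaves the state unchanged
union = a.merge(b)
```

With these assumptions, `union.registers` equals the state obtained by inserting all 10,000 distinct values into a single sketch in any order, and `union.estimate()` approximates that count with a relative standard error of roughly 1.04/sqrt(256), about 6.5%.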