Count-Min Sketch with Conservative Updates: Worst-Case Analysis


May 20, 2024


Younes Ben Mazziane, Othmane Marfoq

Count-Min Sketch with Conservative Updates (\texttt{CMS-CU}) is a memory-efficient hash-based data structure used to estimate the number of occurrences of items within a data stream. \texttt{CMS-CU} stores~$m$ counters and employs~$d$ hash functions to map items to these counters. We first argue that the estimation error in \texttt{CMS-CU} is maximal when each item appears at most once in the stream. Next, we study \texttt{CMS-CU} in this setting. Specifically:
\begin{enumerate}
\item In the case where~$d=m-1$, we prove that the average estimation error and the average counter rate converge almost surely to~$\frac{1}{2}$, in contrast with the vanilla Count-Min Sketch, where the average counter rate is equal to~$\frac{m-1}{m}$.
\item For any given~$m$ and~$d$, we prove novel lower and upper bounds on the average estimation error, incorporating a positive integer parameter~$g$. Larger values of this parameter improve the accuracy of the bounds. Moreover, the computation of each bound involves examining an ergodic Markov process with a state space of size~$\binom{m+g-d}{g}$ and a sparse transition probability matrix containing~$\mathcal{O}(m\binom{m+g-d}{g})$ non-zero entries.
\item For~$d=m-1$, $g=1$, and as~$m\to \infty$, we show that the lower and upper bounds coincide. In general, our bounds exhibit high accuracy for small values of~$g$, as shown by numerical computation. For example, for~$m=50$, $d=4$, and~$g=5$, the difference between the lower and upper bounds is smaller than~$10^{-4}$.
\end{enumerate}
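To make the update rule concrete, the following is a minimal Python sketch of a conservative-update counter array, written under assumptions that the abstract does not spell out. It assumes a single array of $m$ counters shared by $d$ salted hash functions (so an item's $d$ positions may repeat), and the class name `CountMinCU`, the BLAKE2b-based hashing, and the helper `state_space_size` are all illustrative choices rather than the authors' implementation. On each arrival, a conservative update raises only the counters that sit below the item's current minimum plus one, whereas the vanilla rule increments every counter the item maps to. The helper simply evaluates the state-space size $\binom{m+g-d}{g}$ quoted in the abstract.

```python
import hashlib
import math


class CountMinCU:
    """Illustrative sketch: m shared counters, d salted hash functions (assumed layout)."""

    def __init__(self, m, d, conservative=True):
        if m <= 0 or d <= 0:
            raise ValueError("m and d must be positive")
        self.m = m
        self.d = d
        self.conservative = conservative
        self.counters = [0] * m

    def _indices(self, item):
        # Hypothetical hashing scheme: salt the item with the hash index j.
        # The d positions of one item are not forced to be distinct.
        positions = []
        for j in range(self.d):
            digest = hashlib.blake2b(f"{j}:{item}".encode(), digest_size=8).digest()
            positions.append(int.from_bytes(digest, "big") % self.m)
        return positions

    def update(self, item):
        idx = set(self._indices(item))
        if self.conservative:
            # Conservative update: lift only counters below (current minimum + 1).
            target = min(self.counters[i] for i in idx) + 1
            for i in idx:
                if self.counters[i] < target:
                    self.counters[i] = target
        else:
            # Vanilla update: increment every counter the item maps to.
            for i in idx:
                self.counters[i] += 1

    def estimate(self, item):
        # The estimate is the minimum over the item's counters.
        return min(self.counters[i] for i in self._indices(item))


def state_space_size(m, d, g):
    """Size of the Markov state space quoted in the abstract: C(m + g - d, g)."""
    return math.comb(m + g - d, g)


if __name__ == "__main__":
    m, d, n = 50, 4, 1000
    cu = CountMinCU(m, d, conservative=True)
    cms = CountMinCU(m, d, conservative=False)
    # Setting studied in the abstract: every item appears exactly once.
    items = [f"item-{k}" for k in range(n)]
    for it in items:
        cu.update(it)
        cms.update(it)
    # True count of each item is 1, so the error is estimate - 1.
    cu_err = sum(cu.estimate(it) - 1 for it in items) / n
    cms_err = sum(cms.estimate(it) - 1 for it in items) / n
    print("mean error, conservative:", cu_err)
    print("mean error, vanilla:     ", cms_err)
    print("states for m=50, d=4, g=5:", state_space_size(50, 4, 5))
```

Because both variants use the same hash positions here, the conservative counters never exceed the vanilla ones, so the two mean errors can be compared directly on the same stream. The numbers this script prints depend on the assumed hashing scheme and are not the quantities proved in the paper.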

