Duet: efficient and scalable hybriD neUral rElation undersTanding

Cardinality estimation methods based on probability distribution estimation achieve higher accuracy than traditional methods. However, the most advanced of these methods incur high estimation costs because they rely on sampling to handle range queries. Sampling also makes the estimation process hard to differentiate, so the supervision signal from the query workload cannot easily be used to train the model and improve estimation accuracy. In this paper, we propose Duet, a new hybrid and deterministic modeling approach to cardinality estimation that is more efficient and scalable than previous approaches. Duet estimates the cardinality of range queries directly, at significantly lower time and memory cost, and in a differentiable form. Because the prediction process is differentiable, queries with large estimation errors can be incorporated into training to address the long-tail distribution of estimation errors on high-dimensional tables. We evaluate Duet on classical datasets and benchmarks, and the results demonstrate its effectiveness.
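To make the contrast between sampling-based and deterministic estimation concrete, the following is a minimal sketch on a toy two-column distribution. It is not Duet's model: the probability tables, the function names `exact_selectivity` and `sampled_selectivity`, and the query ranges are all assumptions made up for illustration. It only shows that a sampling-style estimator for a range query needs many draws and returns a noisy value, whereas a deterministic computation returns the answer in a single pass.

```python
import random

# Made-up toy model of a two-column table: P(A) and P(B | A), values 0..3.
# These numbers are illustrative assumptions, not taken from the paper.
P_A = [0.1, 0.2, 0.3, 0.4]
P_B_GIVEN_A = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]


def exact_selectivity(a_lo, a_hi, b_lo, b_hi):
    """Deterministic estimate: sum the joint probability over the query region."""
    total = 0.0
    for a in range(a_lo, a_hi + 1):
        total += P_A[a] * sum(P_B_GIVEN_A[a][b_lo:b_hi + 1])
    return total


def sampled_selectivity(a_lo, a_hi, b_lo, b_hi, num_samples, rng):
    """Sampling-style estimate: draw A inside its range, then weight each
    draw by the in-range probability mass of A and of B given that draw."""
    a_values = list(range(a_lo, a_hi + 1))
    a_weights = [P_A[a] for a in a_values]
    mass_a = sum(a_weights)  # P(a_lo <= A <= a_hi)
    total = 0.0
    for _ in range(num_samples):
        a = rng.choices(a_values, weights=a_weights, k=1)[0]
        total += mass_a * sum(P_B_GIVEN_A[a][b_lo:b_hi + 1])
    return total / num_samples


# Range query: 1 <= A <= 2 AND 2 <= B <= 3
exact = exact_selectivity(1, 2, 2, 3)
estimate = sampled_selectivity(1, 2, 2, 3, num_samples=2000, rng=random.Random(0))
print(f"exact={exact:.4f} sampled={estimate:.4f}")
```

In this toy setting the exact sum is cheap because the domain is tiny. With a learned model over many columns, each sample would require additional model evaluations, which is the estimation cost the abstract refers to.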

