Duet: efficient and scalable hybriD neUral rElation undersTanding

Learned cardinality estimation methods have achieved higher precision than traditional methods. Among learned methods, query-driven approaches have long suffered from the data and workload drift problem. Although data-driven and hybrid methods have been proposed to avoid this problem, even the state-of-the-art ones suffer from high training and estimation costs, limited scalability, instability, and a long-tailed distribution problem on high-cardinality, high-dimensional tables, which seriously hinders the practical application of learned cardinality estimators. In this paper, we prove that most of these problems are directly caused by the widely used progressive sampling. We solve this problem by introducing predicates into the autoregressive model and propose Duet, a stable, efficient, and scalable hybrid method that estimates cardinality directly, without sampling or any non-differentiable process. Compared to Naru and UAE, Duet not only reduces the inference complexity from $O(n)$ to $O(1)$ but also achieves higher accuracy on high-cardinality, high-dimensional tables. Experimental results show that Duet achieves all of the design goals above, is much more practical, and even has a lower inference cost on CPU than most learned methods have on GPU.
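
The contrast between progressive sampling and predicate-conditioned estimation can be illustrated with a toy Python sketch. It is not Duet's implementation, and several points are assumptions. We read $n$ as the number of columns. We assume the predicate-conditioned model outputs, for every column $i$, the mass $P(X_i \in R_i \mid X_{<i} \in R_{<i})$, so that the chain rule gives the selectivity as $\prod_{i=1}^{n} P(X_i \in R_i \mid X_{<i} \in R_{<i})$. The names `PredicateConditionedOracle`, `PointConditionedOracle`, `ROWS`, and `QUERY` are hypothetical. Both "models" are exact frequency oracles over a tiny made-up table that stand in for trained networks, so the call counters, not the running time, are what the sketch illustrates: one model call per query versus one call per column.

```python
import math
import random
from fractions import Fraction

# Hypothetical toy table: n = 3 columns with small discrete domains.
ROWS = [
    (0, 1, 2), (0, 1, 3), (1, 1, 2), (1, 2, 0),
    (2, 2, 1), (2, 3, 3), (3, 0, 2), (3, 3, 1),
]
N_COLS = 3
# Conjunctive range query: one inclusive (lo, hi) range per column.
QUERY = [(0, 2), (1, 2), (1, 3)]


def satisfies(values, preds):
    """True if each value lies inside its (lo, hi) range (vacuously True if empty)."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(values, preds))


def exact_selectivity(rows, query):
    """Ground truth: fraction of rows matching every predicate."""
    return Fraction(sum(satisfies(r, query) for r in rows), len(rows))


class PredicateConditionedOracle:
    """Stand-in (assumption) for a predicate-conditioned model: in ONE call it
    returns P(X_i in R_i | X_<i in R_<i) for every column i. Masses are exact
    frequencies from the toy table; a learned network would approximate them."""

    def __init__(self, rows):
        self.rows, self.calls = rows, 0

    def __call__(self, query):
        self.calls += 1
        masses = []
        for i in range(len(query)):
            prefix = [r for r in self.rows if satisfies(r[:i], query[:i])]
            hits = [r for r in prefix if satisfies(r[i:i + 1], query[i:i + 1])]
            masses.append(Fraction(len(hits), len(prefix)) if prefix else Fraction(0))
        return masses


class PointConditionedOracle:
    """Stand-in for a Naru-style model: for a batch of sampled prefixes x_<i it
    returns P(X_i = v | X_<i = x_<i). Progressive sampling calls it once per column."""

    def __init__(self, rows):
        self.rows, self.calls = rows, 0

    def __call__(self, i, prefixes):
        self.calls += 1
        dists = []
        for p in prefixes:
            match = [r[i] for r in self.rows if r[:i] == p]
            dists.append({v: match.count(v) / len(match) for v in set(match)})
        return dists


def progressive_sampling(model, query, num_samples, rng):
    """Monte Carlo estimate: sample each column inside its range and multiply the
    in-range probability mass into a per-sample weight (one model call per column)."""
    prefixes, weights = [()] * num_samples, [1.0] * num_samples
    for i, (lo, hi) in enumerate(query):
        dists = model(i, prefixes)
        next_prefixes = []
        for k, dist in enumerate(dists):
            in_range = {v: p for v, p in dist.items() if lo <= v <= hi}
            mass = sum(in_range.values())
            weights[k] *= mass
            if mass > 0:
                vals = sorted(in_range)
                v = rng.choices(vals, weights=[in_range[x] for x in vals])[0]
            else:
                v = lo  # weight is already 0, so the value no longer matters
            next_prefixes.append(prefixes[k] + (v,))
        prefixes = next_prefixes
    return sum(weights) / num_samples


if __name__ == "__main__":
    predicate_model = PredicateConditionedOracle(ROWS)
    one_pass = math.prod(predicate_model(QUERY))
    point_model = PointConditionedOracle(ROWS)
    sampled = progressive_sampling(point_model, QUERY, 2000, random.Random(0))
    print(exact_selectivity(ROWS, QUERY), one_pass)  # 1/2 1/2
    print(predicate_model.calls, point_model.calls)  # 1 3
    print(round(sampled, 3))  # a noisy estimate near 0.5; varies with the seed
```

In this toy, the one-pass product telescopes exactly to the true selectivity (3/4 · 5/6 · 4/5 = 1/2), while progressive sampling needs one model call per column and returns only a Monte Carlo estimate whose variance depends on the sample count. The sampling step (`rng.choices`) is also where a non-differentiable operation enters; the one-pass product avoids it.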