SQL2Circuits: Estimating Cardinalities, Execution Times, and Costs for SQL Queries with Quantum Natural Language Processing

Recent advances in quantum computing have spurred the exploration of quantum applications across diverse fields, including databases and data management. This work presents a quantum machine learning model for estimating metrics of SQL queries in relational databases, such as cardinalities, execution times, and costs. Precise estimates are crucial for the query optimizer to process queries efficiently. The proposed model is built on a novel query encoding mechanism that maps SQL queries into high-dimensional Hilbert spaces using grammatical representations of the queries. The encoding translates SQL queries into parameterized quantum circuits, which form the core of the model, and the circuit parameters are tuned using standard quantum machine learning techniques. The encoding was first developed in quantum natural language processing (QNLP), and this work demonstrates its natural application to database optimization. Because the encoding mechanism is mathematically robust, the model is also explainable: we can draw a one-to-one correspondence between the elements of an SQL query and the model's parameters. The method is also scalable because it consists of multiple circuits, and we train and evaluate the model with hundreds of queries. Compared to previous research, our model achieves high accuracy, supporting the results obtained in the original QNLP research. We extend the previous QNLP work by adding 4-class and 8-class classification tasks and by comparing the cardinality estimation results with those from state-of-the-art databases. Finally, we analyze the model theoretically by calculating its expressibility and entangling capability.
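To make the idea of a parameterized quantum circuit used as a multi-class classifier more concrete, the following is a minimal sketch in plain NumPy. It is not the encoding proposed in this work: the circuit layout (two layers of RY rotations around a CNOT), the function names `ry`, `class_probabilities`, and `predict_class`, and the reading of the four measurement outcomes as four cardinality classes are all assumptions chosen for illustration. In the actual model, the circuit structure would be derived from the grammatical representation of each SQL query.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# CNOT with qubit 0 as control and qubit 1 as target (basis order |q0 q1>).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def class_probabilities(params):
    """Simulate a toy 2-qubit circuit and return its 4 outcome probabilities.

    Hypothetical layout: RY layer, CNOT, RY layer. The four basis-state
    probabilities are interpreted here as scores for four classes.
    """
    t0, t1, t2, t3 = params
    state = np.zeros(4)
    state[0] = 1.0                               # start in |00>
    state = np.kron(ry(t0), ry(t1)) @ state      # first rotation layer
    state = CNOT @ state                         # entangling gate
    state = np.kron(ry(t2), ry(t3)) @ state      # second rotation layer
    return np.abs(state) ** 2

def predict_class(params):
    """Return the index of the most probable measurement outcome."""
    return int(np.argmax(class_probabilities(params)))

if __name__ == "__main__":
    params = [0.3, 1.1, -0.7, 2.0]               # arbitrary example parameters
    print(class_probabilities(params), predict_class(params))
```

In a training loop, the entries of `params` would be adjusted so that the most probable outcome matches the labeled class of each query. An 8-class task would need at least three measured qubits under this kind of readout.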
