The design of SQL is based on a three-valued logic (3VL), rather than the familiar Boolean logic. 3VL adds a truth value unknown to true and false to handle nulls. Viewed as indispensable for SQL expressiveness, it is at the same time much criticized for unintuitive behavior of queries and being a source of programmer mistakes. We show that, contrary to the widely held view, SQL could have been designed based on the standard Boolean logic, without any loss of expressiveness and without giving up nulls. The approach itself follows SQL's evaluation, which only retains tuples for which conditions in WHERE evaluate to true. We show that conflating unknown with false leads to an equally expressive version of SQL that does not use the third truth value. Queries written under the two-valued semantics can be efficiently translated into the standard SQL and thus executed on any existing RDBMS. These results cover the core of the SQL 1999 Standard: SELECT-FROM-WHERE-GROUP BY-HAVING queries extended with subqueries and IN/EXISTS/ANY/ALL conditions, and recursive queries. We also investigate new optimization rules enabled by the two-valued SQL, and show that for many queries, including most of those found in benchmarks such as TPC-H and TPC-DS, there is no difference between three- and two-valued versions.
翻译:SQL的设计基于三值逻辑(3VL),而非人们熟知的布尔逻辑。三值逻辑在真与假之外增加了未知真值以处理空值。尽管三值逻辑被视为SQL表达能力的必要组成部分,但它同时因查询行为的反直觉性和成为程序员错误根源而备受批评。本文证明,与普遍观点相反,SQL本可以基于标准布尔逻辑进行设计,既不会损失表达能力,也无需放弃空值处理。该方法遵循SQL的求值机制,即仅保留WHERE子句中条件求值为真的元组。我们证明,将未知值与假值合并会产生一个具有同等表达能力且不使用第三真值的SQL版本。基于二值语义编写的查询可高效转换为标准SQL,从而在任何现有关系数据库管理系统上执行。这些结果涵盖了SQL 1999标准的核心:支持子查询及IN/EXISTS/ANY/ALL条件的SELECT-FROM-WHERE-GROUP BY-HAVING查询,以及递归查询。我们还研究了二值SQL启用的新优化规则,并证明对于包括TPC-H和TPC-DS等基准测试中大多数查询在内的众多查询,三值版本与二值版本之间不存在差异。