Query optimization is crucial for any database management system (DBMS) that must execute declarative queries quickly. Most DBMS designs include cost-based query optimization. MongoDB, however, chooses an execution plan with a different approach, which we call "first past the post" (FPTP) query optimization. FPTP does not estimate a cost for each execution plan; instead, it partially executes the alternative plans in a round-robin race and observes the work done by each relative to the number of records returned. In this paper, we analyze the effectiveness of MongoDB's FPTP query optimizer. We examine whether the optimizer chooses the best execution plan among the alternatives and measure how the chosen plan compares with the optimal plan. We also show how to visualize this effectiveness and identify situations in which the MongoDB 7.0.1 query optimizer chooses suboptimal query plans. Through experiments, we conclude that FPTP has a preference bias: it chooses index scans even in many cases where a collection scan would run faster. We identify the reasons for this preference bias, which can lead MongoDB to choose a plan whose runtime is more than twice that of the optimal plan for the query.
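To make the round-robin race concrete, the following is a minimal Python sketch of the idea, not MongoDB's implementation. It assumes a toy model in which each plan is an iterator whose every step counts as one unit of work and yields either a record or nothing. The names `collection_scan`, `index_scan`, and `race`, the parameters `target_results` and `max_works`, and the score (records returned divided by work done) are all illustrative assumptions, not MongoDB's actual scoring formula or limits.

```python
import bisect
from typing import Callable, Dict, Iterator, List, Optional

# Toy model (assumption): one iterator step = one unit of work.
# A step yields a matching record, or None if the work produced nothing.
Plan = Iterator[Optional[dict]]

_EXHAUSTED = object()  # sentinel: the plan has no more work to do


def collection_scan(docs: List[dict], pred: Callable[[dict], bool]) -> Plan:
    """Examine every document in storage order, one document per work unit."""
    for doc in docs:
        yield doc if pred(doc) else None


def index_scan(docs: List[dict], key: str, lo: int, hi: int) -> Plan:
    """Hypothetical index scan: seek to lo, then visit entries up to hi."""
    entries = sorted((d[key], i) for i, d in enumerate(docs))
    start = bisect.bisect_left(entries, (lo, -1))  # the seek is free in this model
    for value, pos in entries[start:]:
        if value > hi:
            break
        yield docs[pos]


def race(plans: Dict[str, Plan], target_results: int = 3,
         max_works: int = 100) -> Dict[str, float]:
    """Give each plan one work unit per round; stop once any plan returns
    target_results records, any plan is exhausted, or max_works rounds pass.
    Returns records-per-work-unit for each plan (illustrative score)."""
    works = dict.fromkeys(plans, 0)
    results = dict.fromkeys(plans, 0)
    finished = False
    for _ in range(max_works):
        for name, plan in plans.items():
            step = next(plan, _EXHAUSTED)
            if step is _EXHAUSTED:
                finished = True
                continue
            works[name] += 1
            if step is not None:
                results[name] += 1
                if results[name] >= target_results:
                    finished = True
        if finished:  # end the trial only after every plan had its turn
            break
    return {name: results[name] / max(works[name], 1) for name in plans}


# Usage: 100 documents, query a == 7 (10% selectivity).
docs = [{"a": i % 10} for i in range(100)]
scores = race({
    "collection_scan": collection_scan(docs, lambda d: d["a"] == 7),
    "index_scan": index_scan(docs, "a", 7, 7),
})
winner = max(scores, key=scores.get)
```

In this toy model every step of the index scan returns a record, while the collection scan returns one only as often as the predicate matches, so the race is scored on records per unit of work rather than on the real cost of each unit. The sketch only illustrates how such a race can be set up; it does not reproduce the causes of the bias that the paper identifies.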