The scale and complexity of workloads in modern cloud services have brought into sharper focus a critical challenge in automated index tuning -- the need to recommend high-quality indexes while keeping index tuning scalable. This challenge is compounded by the requirement that automatically implemented indexes introduce minimal query performance regressions in production deployments, which remains a significant barrier to achieving scalability and full automation. This paper directs attention to these challenges within automated index tuning and explores how machine learning (ML) techniques provide new opportunities for mitigating them. In particular, we reflect on recent efforts in developing ML techniques for workload selection, candidate index filtering, speeding up index configuration search, reducing the number of query optimizer calls, and lowering the chances of performance regressions. We highlight the key takeaways from these efforts and underline the gaps that must be closed for these techniques to function effectively within the traditional index tuning framework. Additionally, we present a preliminary cross-platform design aimed at democratizing index tuning across multiple SQL-like systems -- an imperative in today's continuously expanding data system landscape. We believe our findings will help provide context and impetus to research and development efforts in automated index tuning.