AI-driven analytics are increasingly crucial to data-centric decision-making. Exporting data to machine learning runtimes incurs high overhead, limits robustness to data drift, and expands the attack surface, especially in multi-tenant, heterogeneous data systems. Integrating AI directly into database engines offers clear benefits but introduces challenges in managing joint query processing and model execution, optimizing end-to-end performance, coordinating execution under resource contention, and enforcing strong security and access-control guarantees. This paper discusses the challenges of joint DB-AI, or AIxDB, data management and query processing within AI-powered data systems, including query optimization, execution scheduling, and distributed execution over heterogeneous hardware. Database components such as transaction management and access control need to be re-examined to support AI lifecycle management, mitigate data drift, and protect sensitive data from unauthorized AI operations. We present a design and preliminary results that indicate which factors may be key to performance when serving AIxDB queries.