Data science applications increasingly rely on heterogeneous data sources and analytics. This has led to growing interest in polystore systems, especially analytical polystores. In this work, we focus on a class of emerging multi-data-model analytics workloads that fluidly straddle relational, graph, and text analytics. Instead of a generic polystore, we build a ``tri-store'' system that is more aware of the underlying data models, which allows it to better optimize execution for scalability and runtime efficiency. We name our system AWESOME (Analytics WorkbEnch for SOcial MEdia). It features a domain-specific language named ADIL. ADIL builds on the query languages of the underlying engines (e.g., SQL and Cypher) and provides native data types for succinctly specifying cross-engine queries and NLP operations, as well as automatic in-memory and query optimizations. Using real-world tri-model analytical workloads and datasets, we empirically demonstrate the functionality of AWESOME for scalable data science applications and evaluate its efficiency.