In this paper, we present a static code analysis strategy to extract logical schemas from NoSQL applications. Our solution is based on a model-driven reverse engineering process composed of a chain of platform-independent model transformations. The extracted schema conforms to the U-Schema unified metamodel, which can represent both NoSQL and relational schemas. To support this process, we define a metamodel capable of representing the core elements of object-oriented languages. Application code is first injected into a code model, from which a control flow model is derived. This, in turn, enables the generation of a model representing both the data access operations and the structure of the stored data. The U-Schema logical schema is then inferred from these models. Additionally, the extracted information can be used to identify refactoring opportunities. We illustrate this capability by detecting join-like query patterns and automatically applying field duplication strategies to eliminate expensive joins. All stages of the process are described in detail. The approach is validated through a round-trip experiment in which an application using a MongoDB store is automatically generated from a predefined schema; the schema inferred from that application is then compared with the original to assess the accuracy of the extraction process.
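To make the refactoring step more concrete, here is a minimal sketch of how a join-like pattern might be detected in a model of data access operations and then resolved by field duplication. This is not the authors' implementation. It assumes a drastically simplified representation: the `FindOp` and `JoinLike` classes, the `detect_join_like` and `duplicate_fields` functions, and the dictionary-based schema are hypothetical stand-ins for the paper's data access model and the U-Schema metamodel. A join-like pattern is approximated here as a query on one collection whose filter value was read from the result of a query on a different collection.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy schema: entity type name -> {field name -> type name}.
# This is a stand-in for a U-Schema logical schema, not the real metamodel.
Schema = Dict[str, Dict[str, str]]


@dataclass(frozen=True)
class FindOp:
    """Hypothetical data access operation: a query on a single collection."""
    collection: str
    filter_field: str
    # (collection, field) whose value feeds this query's filter, if known
    value_from: Optional[Tuple[str, str]] = None
    # fields the application reads from the query result
    projected: Tuple[str, ...] = ()


@dataclass(frozen=True)
class JoinLike:
    """Join condition source.source_field == target.target_key."""
    source: str
    source_field: str
    target: str
    target_key: str
    target_fields: Tuple[str, ...]


def detect_join_like(ops: List[FindOp]) -> List[JoinLike]:
    """Flag queries whose filter value comes from another collection's data."""
    joins = []
    for op in ops:
        if op.value_from is None:
            continue
        src_coll, src_field = op.value_from
        if src_coll != op.collection:
            joins.append(
                JoinLike(src_coll, src_field, op.collection,
                         op.filter_field, op.projected)
            )
    return joins


def duplicate_fields(schema: Schema, join: JoinLike) -> Schema:
    """Return a new schema with the joined fields copied into the source type."""
    new_schema = {entity: dict(fields) for entity, fields in schema.items()}
    for f in join.target_fields:
        # Prefix the copy with the target type name to avoid name clashes.
        new_schema[join.source][f"{join.target}_{f}"] = schema[join.target][f]
    return new_schema


# Usage: an order lookup followed by a customer lookup keyed on order.customerId.
schema: Schema = {
    "Order": {"_id": "ObjectId", "customerId": "ObjectId", "total": "double"},
    "Customer": {"_id": "ObjectId", "name": "string", "email": "string"},
}
ops = [
    FindOp("Order", "_id", projected=("customerId",)),
    FindOp("Customer", "_id", value_from=("Order", "customerId"),
           projected=("name",)),
]
joins = detect_join_like(ops)
new_schema = duplicate_fields(schema, joins[0])
```

Copying `Customer.name` into `Order` lets the second query be dropped, at the cost of keeping the copy consistent when a customer's name changes. A real transformation would also have to rewrite the affected application code, which this sketch does not attempt.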