Database migration is a key task in software modernization, increasingly involving transformations across heterogeneous data models such as relational and NoSQL systems. Existing approaches are typically designed for specific source-target combinations, which limits their applicability in multi-model environments. This paper proposes a generic database migration approach based on the U-Schema unified data model, which acts as a pivot representation. By defining mappings between each data model and U-Schema, the approach reduces the number of required transformations and enables schema conversion across heterogeneous paradigms. Trace information is generated during schema transformation to capture correspondences between source and target elements, and is subsequently used to guide data migration in a decoupled manner. The approach has been implemented and evaluated through experiments covering schema-level validation, data-level semantic preservation, and performance analysis. The results show that the migration pipeline achieves high structural preservation under round-trip reconstruction, produces document schemas consistent with the intended design decisions, and preserves query behavior across a variety of access patterns, including joins, aggregations, and nested structures. Performance results demonstrate the feasibility of the approach for datasets of increasing size. The evaluation focuses on relational-to-document migration using both synthetic datasets and the Northwind benchmark. While this scenario provides a concrete instantiation, the approach is designed to support multiple data models within a unified framework.
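As an illustration of the pivot idea and of trace-guided data migration described above, the following is a minimal sketch and not the paper's implementation. All names in it (`EntityType`, `Trace`, `relational_to_pivot`, `pivot_to_document`, `migrate_rows`, and the camelCase renaming rule) are hypothetical, and the pivot structure is a drastic simplification that does not reflect the actual U-Schema metamodel. The helper `transformations_needed` only encodes the standard counting argument for pivot architectures: with n data models, direct pairwise migration needs n(n-1) directed transformations, whereas a pivot needs 2n mappings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EntityType:
    """Hypothetical, simplified pivot element (NOT the real U-Schema metamodel)."""
    name: str
    attributes: list[str]


@dataclass
class Trace:
    """Records which source element produced which target element."""
    source_container: str
    source_field: str
    target_container: str
    target_field: str


def transformations_needed(n_models: int) -> tuple[int, int]:
    """Return (pairwise, via_pivot) counts of directed transformations."""
    return n_models * (n_models - 1), 2 * n_models


def relational_to_pivot(tables: dict[str, list[str]]) -> list[EntityType]:
    """Source-side mapping: each table becomes one pivot entity type."""
    return [EntityType(name, list(columns)) for name, columns in tables.items()]


def to_camel(name: str) -> str:
    """Assumed naming convention for the target: snake_case -> camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def pivot_to_document(
    entities: list[EntityType],
) -> tuple[dict[str, list[str]], list[Trace]]:
    """Target-side mapping: build a document schema and emit trace records."""
    schema: dict[str, list[str]] = {}
    traces: list[Trace] = []
    for entity in entities:
        fields = []
        for attribute in entity.attributes:
            target_field = to_camel(attribute)
            fields.append(target_field)
            traces.append(Trace(entity.name, attribute, entity.name, target_field))
        schema[entity.name] = fields
    return schema, traces


def migrate_rows(table: str, rows: list[dict], traces: list[Trace]) -> list[dict]:
    """Data migration driven only by traces, decoupled from schema conversion."""
    rename = {
        t.source_field: t.target_field
        for t in traces
        if t.source_container == table
    }
    return [{rename[column]: value for column, value in row.items()} for row in rows]
```

In this sketch, `migrate_rows` never inspects the schema transformation logic; it consults only the trace records, which is the sense in which data migration is decoupled from schema conversion. A real pipeline would also need traces for relationships, keys, and nesting decisions, which are omitted here.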