Token Optimization Strategies for LLM-Based Oracle-to-PostgreSQL Migration

Large language models (LLMs) are increasingly used for software modernization, code translation, and database migration. However, LLM-based Oracle-to-PostgreSQL migration remains constrained by high token consumption, long-context degradation, dialect-specific semantic differences, and the risk of semantic drift during query transformation. Including large Oracle SQL and PL/SQL artefacts, schema definitions, procedural logic, and migration instructions directly in the model context increases cost and may reduce generation quality. This paper frames token optimization in LLM-based Oracle-to-PostgreSQL migration as a constrained transformation problem. The study formalizes and evaluates twelve token optimization strategies: baseline representation, context pruning, minification, DSL-based semantic compression, metadata augmentation, context refactoring, schema distillation, adaptive routing, AST-based minification, identifier masking, output constraint enforcement, and hybrid optimization. The strategies are evaluated on samples of 10 and 100 Oracle SQL queries using Valid Syntax Rate, Exact Match, Semantic Match, CodeBLEU, and Token Efficiency. The results show that mild context pruning preserves semantic quality at nearly the baseline level, achieving 89.75% Semantic Match on the 100-query sample compared with 89.80% for the unoptimized baseline. Adaptive routing provides the best practical trade-off, reducing input tokens by 8.72% and output tokens by 5.49% while maintaining 88.40% Semantic Match and increasing Token Efficiency by 6.67%. Aggressive schema distillation increases Token Efficiency by 132.22% but reduces Semantic Match by 44.50 percentage points. The findings demonstrate that token optimization cannot be treated as simple prompt shortening; it must be evaluated as a multi-objective migration problem that balances cost, syntactic validity, semantic preservation, and structural fidelity.
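To make one of the listed strategies concrete, the following is a minimal sketch of what minification could look like for an Oracle query before it is placed in the model context. It is an illustration under stated assumptions, not the implementation evaluated in the study: the function names `minify_sql` and `approx_tokens`, the regular expressions, and the sample query are all hypothetical, and the token count is a crude lexical proxy rather than a real model tokenizer or the paper's Token Efficiency metric.

```python
import re

# Either a single-quoted SQL literal (kept verbatim; '' is an escaped quote)
# or a "gap": any run of whitespace, -- line comments and /* */ block comments.
# Assumption: double-quoted identifiers and Oracle q'[...]' literals are not handled.
_PATTERN = re.compile(
    r"(?P<string>'(?:[^']|'')*')"
    r"|(?P<gap>(?:\s+|--[^\n]*|/\*.*?\*/)+)",
    re.DOTALL,
)


def minify_sql(sql: str) -> str:
    """Drop comments and collapse whitespace, leaving string literals untouched."""
    # A matched literal is returned as-is; a matched gap becomes one space.
    return _PATTERN.sub(lambda m: m.group("string") or " ", sql).strip()


def approx_tokens(text: str) -> int:
    """Rough token proxy: count word runs and individual punctuation marks.

    Hypothetical stand-in for a model tokenizer, for illustration only.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


# Hypothetical Oracle query used only to exercise the sketch.
oracle_sql = """
SELECT e.ename,   -- employee name
       NVL(e.comm, 0) AS comm  /* default commission */
FROM   emp e
WHERE  e.job = 'SALES  REP';
"""

minified = minify_sql(oracle_sql)
before, after = approx_tokens(oracle_sql), approx_tokens(minified)
reduction_pct = 100.0 * (before - after) / before

print(minified)
# SELECT e.ename, NVL(e.comm, 0) AS comm FROM emp e WHERE e.job = 'SALES  REP';
print(f"approximate token reduction: {reduction_pct:.1f}%")
```

The double space inside `'SALES  REP'` survives because literals are matched before gaps. This reflects the constraint the abstract emphasizes: a reduction step is only acceptable if it leaves the meaning of the query unchanged.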
