We study how to allocate a fixed supervised fine-tuning budget when three objectives must be balanced at once: multi-turn safety alignment, low over-refusal on benign boundary queries, and instruction following under verifiable constraints. We propose MOSAIC (Multi-Objective Slice-Aware Iterative Curation for Alignment), a framework for closed-loop data-mixture search built on a unified L1-L3 evaluation interface. MOSAIC turns slice-level failure profiles into executable data actions: dataset-level mixture ratios, bucket-level weights, and focus criteria. Under a fixed 1M-token budget and five rounds of independent fine-tuning from the same base model, MOSAIC improves internal XGuard from 2.76 to 4.67 while keeping OrBench at 4.41 and IFEval at 3.65. The final Pareto solution also generalizes better than a random static LoRA baseline on independent attack, over-refusal, and capability tests, suggesting that structured failure diagnosis can serve as a practical control signal for budgeted data construction. Code is available at https://github.com/douyipu/mosaic.
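As a rough illustration of how per-objective scores could be turned into dataset-level mixture ratios under a fixed token budget, consider the Python sketch below. It is not MOSAIC's actual search procedure, which the abstract does not specify. The function name `mixture_from_scores`, the 5-point score ceiling, the per-dataset floor, and the proportional-to-deficit rule are all assumptions made for this example. The input scores are the final values reported above, used only as sample inputs.

```python
from typing import Dict


def mixture_from_scores(
    scores: Dict[str, float],
    budget_tokens: int = 1_000_000,
    max_score: float = 5.0,   # assumed score ceiling; not stated in the abstract
    floor: float = 0.1,       # assumed minimum share per dataset
) -> Dict[str, int]:
    """Split a token budget across objectives in proportion to their deficits.

    Hypothetical heuristic: an objective that scores further below
    `max_score` receives a larger share of the budget, and every
    objective keeps at least `floor` of the budget.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    if floor * n > 1.0:
        raise ValueError("floor is too large for the number of objectives")

    # Deficit = distance from the assumed ceiling, clipped at zero.
    deficits = {k: max(max_score - s, 0.0) for k, s in scores.items()}
    total = sum(deficits.values())

    # Share left over after every objective receives its floor.
    free = 1.0 - floor * n
    if total == 0.0:
        ratios = {k: 1.0 / n for k in scores}
    else:
        ratios = {k: floor + free * d / total for k, d in deficits.items()}

    # Convert ratios to whole tokens, then hand any rounding remainder
    # to the largest share so the budget is spent exactly.
    tokens = {k: int(r * budget_tokens) for k, r in ratios.items()}
    remainder = budget_tokens - sum(tokens.values())
    tokens[max(ratios, key=ratios.get)] += remainder
    return tokens


# Sample inputs: the final scores reported in the abstract.
allocation = mixture_from_scores({"xguard": 4.67, "orbench": 4.41, "ifeval": 3.65})
print(allocation)
```

In a closed-loop setting, such an allocation would be recomputed after each evaluation round. Bucket-level weights and focus criteria would need finer-grained failure profiles than the three aggregate scores used here.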