Editing a 3D asset locally, that is, modifying a target region while preserving the rest, is a fundamental requirement of native 3D editing. Existing methods enforce locality through mechanisms external to the generator, such as manual 3D masks, post-hoc voxel merging, or 2D multi-view lifting. None of them intervenes where the corruption actually originates: inside the ODE sampler. For a rectified-flow generator to achieve faithful local editing, its velocity field should be strong over the target editing region while vanishing on preserved content. Yet a single velocity field can hardly satisfy both requirements simultaneously, which leads to three problems: (i) identity leakage, where the edit signal remains non-zero on preserved regions; (ii) the lack of a dedicated edit-amplification channel, so strengthening the edit inevitably perturbs identity; and (iii) identity drag at the geometry and material stages, where a global condition pulls every token toward the target. We propose VS3D (Velocity-Space 3D Asset editing), an inversion-free, training-free, and mask-free framework that addresses each problem with a targeted intervention inside the sampler. VS3D integrates three complementary modules, each corresponding to a specific stage of the editing pipeline. Reconstruction-Anchored Source Injection (RASI) absorbs identity leakage by turning the unconditional embedding into a per-step, asset-specific anchor calibrated through source reconstruction. Partial-Mean Guidance (PMG) amplifies the edit signal by contrasting high- and low-quality subsample estimates of the velocity difference, and is active only where a consistent edit exists. Twin-Agreement Residual injection (TAR) lets the sampler decide, token by token, what to preserve at the geometry and material stages.