Velocity-Space 3D Asset Editing

Editing a 3D asset locally, modifying a target region while preserving the rest, is a fundamental requirement of native 3D editing. Existing methods enforce locality through mechanisms external to the generator, such as manual 3D masks, post-hoc voxel merging, or 2D multi-view lifting. None of them intervenes where the corruption of preserved content actually originates: inside the ODE sampler. For a rectified-flow generator to achieve faithful local editing, its velocity field should be strong over the target region and vanish on preserved content. Yet a single velocity field rarely satisfies both requirements at once, which leads to three problems: (i) identity leakage, which keeps the edit signal non-zero on preserved regions; (ii) the lack of a dedicated edit-amplification channel, so strengthening the edit inevitably perturbs identity; and (iii) identity drag at the geometry and material stages, where a global condition pulls every token toward the target. We propose VS3D (Velocity-Space 3D Asset Editing), an inversion-free, training-free, and mask-free framework that addresses each problem with a targeted intervention inside the sampler. VS3D integrates three complementary modules, each corresponding to a specific stage of the editing pipeline. Reconstruction-Anchored Source Injection (RASI) absorbs identity leakage by turning the unconditional embedding into a per-step, asset-specific anchor calibrated through source reconstruction. Partial-Mean Guidance (PMG) amplifies the edit signal by contrasting high- and low-quality subsample estimates of the velocity difference, and is active only where a consistent edit exists. Twin-Agreement Residual injection (TAR) lets the sampler decide, token by token, what to preserve at the geometry and material stages.
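The locality requirement above (edit velocity strong on target tokens, zero on preserved ones) and the identity-leakage failure can be illustrated with a toy rectified-flow sampler. This is a minimal sketch under stated assumptions, not VS3D and not an implementation of RASI, PMG, or TAR. The functions `toy_velocity` and `edit_sample`, the parameter `gate_tau`, and the norm-threshold gate are all hypothetical. The sketch assumes the convention x_t = (1 - t) * noise + t * data, integrated by Euler steps from t = 0 to t = 1, and uses a hand-built velocity field that transports each token in a straight line to its condition. The boolean `edit_mask` only constructs and scores the toy; the sampler itself never sees it.

```python
import numpy as np


def toy_velocity(x, t, cond):
    """Hypothetical rectified-flow velocity for a toy 'generator'.

    With x_t = (1 - t) * noise + t * data, the velocity (cond - x) / (1 - t)
    moves each token in a straight line that Euler steps reach exactly at t = 1.
    """
    return (cond - x) / (1.0 - t)


def edit_sample(noise, src_cond, tgt_cond, steps=50, gate_tau=None):
    """Euler-integrate from noise (t = 0) to data (t = 1) with an edit term.

    The sampled velocity is v_src + delta, where delta = v_tgt - v_src is the
    per-token edit velocity. With gate_tau=None this equals plain sampling
    under the target condition. Otherwise, tokens whose predicted clean
    outputs under the two conditions differ by at most gate_tau receive zero
    edit velocity (a hypothetical gate, not one of the VS3D modules).
    """
    x = noise.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_src = toy_velocity(x, t, src_cond)
        delta = toy_velocity(x, t, tgt_cond) - v_src
        if gate_tau is not None:
            # The clean-output estimate is x + (1 - t) * v, so (1 - t) * delta
            # is the per-token gap between the two conditions' predictions.
            gap = np.linalg.norm((1.0 - t) * delta, axis=-1, keepdims=True)
            delta = np.where(gap > gate_tau, delta, 0.0)
        x = x + dt * (v_src + delta)
    return x


rng = np.random.default_rng(0)
n_tokens, dim = 8, 3
src = rng.normal(size=(n_tokens, dim))
edit_mask = np.zeros(n_tokens, dtype=bool)
edit_mask[:2] = True    # used only to build and score the toy
tgt = src + 0.05        # the global condition leaks a small shift onto every token
tgt[edit_mask] += 2.0   # the intended edit on tokens 0 and 1
noise = rng.normal(size=(n_tokens, dim))

plain = edit_sample(noise, src, tgt)
gated = edit_sample(noise, src, tgt, gate_tau=0.5)


def leakage(out):
    """Largest drift of preserved tokens away from the source."""
    return np.abs(out[~edit_mask] - src[~edit_mask]).max()


print(f"leakage without gate: {leakage(plain):.3f}")
print(f"leakage with gate:    {leakage(gated):.3f}")
```

The ungated run shifts every preserved token by the leaked 0.05, which is the identity-leakage problem (i) describes, while the gated run returns preserved tokens to the source and still lands the edited tokens on the target. The gate succeeds here only because the toy leak is uniform and far smaller than the edit; a real generator need not separate the two this cleanly.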
