Latent diffusion is a promising framework for scalable 3D molecular generation, but it requires a latent space that remains smooth, valid, and navigable beyond posterior samples. Existing molecular VAEs, however, are typically learned through reconstruction-based objectives, which do not guarantee such a latent space. We show that this leads to dark areas: regions of latent space that are reachable during diffusion sampling but decode to disconnected or chemically invalid molecules. Unlike in image generation, molecular decoding requires strict structural and chemical precision, so even small latent perturbations can produce catastrophic failures. We therefore propose TopVAE, a topology-optimized VAE that reduces dark areas by making the decoder internalize structural and chemical constraints during training, eliminating the need for test-time chemical correction. TopVAE greatly improves off-posterior robustness, and when paired with a standard DiT, achieves $77\%$ lower FCD-3D on QM9, the highest V&C, $52\%$ lower FCD-3D on GEOM-Drugs, and $1.29{\times}$ more stable and connected molecules on zero-shot scaffold inpainting.