Trajectories of fractional Brownian motion (fBm) combine randomness with strong scale-free correlations, which challenges generative models to reproduce the intrinsic memory of the underlying process. Here we test a diffusion probabilistic model on a dataset of corrupted images that correspond to incomplete Euclidean distance matrices of fBm at various memory exponents $H$. In the regime of low missing ratio, where the remaining partial graph is rigid, the imputation is unique, which provides a ground truth for the inpainting. We find that conditional diffusion generation stably reproduces the statistics of the missing fBm-distributed distances for different values of the exponent $H$. Furthermore, while diffusion models have recently been shown to memorize samples from the training database, we show that diffusion-based inpainting behaves qualitatively differently from a database search as the database size increases. Finally, we apply our diffusion model trained on fBm with $H=1/3$ to complete chromosome distance matrices obtained in single-cell microscopy experiments, and show that it outperforms standard bioinformatics algorithms. Our source code is available on GitHub at https://github.com/alobashev/diffusion_fbm.
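To make the data setup concrete, the following is a minimal sketch of how an incomplete Euclidean distance matrix of an fBm trajectory could be produced. It is not taken from the authors' repository: the function names (`fbm_trajectory`, `distance_matrix`, `mask_entries`), the Cholesky-based sampler, the three-dimensional embedding, and the uniformly random masking of entries are all illustrative assumptions, and the actual dataset may use a different sampler or masking scheme.

```python
import numpy as np


def fbm_trajectory(n_steps, hurst, dim=3, rng=None):
    """Sample a dim-dimensional fBm path at times 1..n_steps (hypothetical helper).

    Each coordinate is an independent 1D fBm drawn by Cholesky factorization
    of the exact fBm covariance matrix.
    """
    rng = np.random.default_rng(rng)
    t = np.arange(1, n_steps + 1, dtype=float)
    # fBm covariance: 0.5 * (s^{2H} + t^{2H} - |t - s|^{2H})
    cov = 0.5 * (
        t[:, None] ** (2 * hurst)
        + t[None, :] ** (2 * hurst)
        - np.abs(t[:, None] - t[None, :]) ** (2 * hurst)
    )
    chol = np.linalg.cholesky(cov)
    return chol @ rng.standard_normal((n_steps, dim))


def distance_matrix(path):
    """Pairwise Euclidean distances between all points of the path."""
    diff = path[:, None, :] - path[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def mask_entries(dist, missing_ratio, rng=None):
    """Hide a random fraction of off-diagonal entries, symmetrically.

    Assumption: entries are dropped independently and uniformly at random.
    Returns the corrupted matrix (NaN where missing) and the boolean mask.
    """
    rng = np.random.default_rng(rng)
    n = dist.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    missing = rng.random(rows.size) < missing_ratio
    mask = np.zeros((n, n), dtype=bool)
    mask[rows[missing], cols[missing]] = True
    mask |= mask.T  # keep the mask symmetric, like the distance matrix
    corrupted = np.where(mask, np.nan, dist)
    return corrupted, mask


# Example: one sample at H = 1/3 with roughly 20% of distances missing.
path = fbm_trajectory(64, hurst=1 / 3, rng=0)
dist = distance_matrix(path)
corrupted, mask = mask_entries(dist, missing_ratio=0.2, rng=1)
```

In this sketch the pair `(corrupted, mask)` plays the role of the conditioning input for inpainting, and `dist` is the ground truth against which a completed matrix can be compared.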