Transformations produced by image and video generation models often evolve in a highly non-linear manner: long stretches where the content barely changes are followed by abrupt semantic jumps. To analyze and correct this behavior, we introduce a Semantic Progress Function, a one-dimensional representation that captures how the meaning of a given sequence evolves over time. We compute distances between per-frame semantic embeddings and fit a smooth curve that reflects the cumulative semantic shift across the sequence. Departures of this curve from a straight line reveal uneven semantic pacing. Building on this insight, we propose a semantic linearization procedure that reparameterizes (or retimes) the sequence so that semantic change unfolds at a constant rate, yielding smoother and more coherent transitions. Beyond linearization, our framework provides a model-agnostic foundation for identifying temporal irregularities, comparing semantic pacing across different generators, and steering both generated and real-world video sequences toward arbitrary target pacing.
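To make the idea concrete, the following is a minimal sketch of a progress curve and its linearization, not the authors' implementation. It assumes per-frame embeddings are already available as a NumPy array, uses cosine distance between consecutive frames as the semantic distance, and skips the curve-fitting step by using the raw cumulative sum. The function names `semantic_progress` and `linearize` are hypothetical and chosen only for illustration.

```python
import numpy as np


def semantic_progress(embeddings, eps=1e-8):
    """Normalized cumulative semantic shift per frame (assumed: cosine distance)."""
    e = np.asarray(embeddings, dtype=float)
    # Normalize each embedding so the dot product equals cosine similarity.
    e = e / np.maximum(np.linalg.norm(e, axis=1, keepdims=True), eps)
    # Cosine distance between each frame and the one before it.
    step = 1.0 - np.sum(e[1:] * e[:-1], axis=1)
    # Clamp numerical noise and add eps so the curve is strictly increasing.
    step = np.maximum(step, 0.0) + eps
    cum = np.concatenate([[0.0], np.cumsum(step)])
    return cum / cum[-1]  # starts at 0, ends at 1


def linearize(progress, num_out=None):
    """Fractional frame indices at which semantic progress is evenly spaced."""
    progress = np.asarray(progress, dtype=float)
    n = len(progress)
    num_out = n if num_out is None else num_out
    targets = np.linspace(0.0, 1.0, num_out)
    # Invert the monotone progress curve by linear interpolation.
    return np.interp(targets, progress, np.arange(n, dtype=float))


if __name__ == "__main__":
    # Toy sequence: unit vectors that barely move for three frames, then jump.
    angles = np.array([0.0, 0.05, 0.10, 1.20, 1.50])
    emb = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    p = semantic_progress(emb)
    print("progress:", p)
    print("retimed indices:", linearize(p, num_out=9))
```

The retimed indices are fractional, so a real pipeline would still need a way to synthesize frames at those positions (for example, by querying the generator or interpolating frames), which this sketch leaves out. A non-linear `targets` array could be substituted for `np.linspace` to steer the sequence toward a different target pacing.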