In-place fast polynomial modular remainder

We consider the fast in-place computation of the Euclidean polynomial modular remainder $R(X) \equiv A(X) \bmod B(X)$, with $A$ and $B$ of respective degrees $n$ and $m \le n$. If the multiplication of two polynomials of degree $k$ can be performed with $M(k)$ operations and $O(k)$ extra space, then standard algorithms for the remainder require $O(\frac{n}{m} M(m))$ arithmetic operations and, apart from the space for $A$ and $B$, at least $O(n-m)$ extra memory. This extra space is usually used to store the whole quotient $Q(X)$ such that $A = BQ + R$ with $\deg R < \deg B$. We avoid storing the whole quotient, and propose an algorithm that still uses $O(\frac{n}{m} M(m))$ arithmetic operations but only $O(m)$ extra space. When the divisor $B$ is sparse with a constant number of non-zero terms, the arithmetic complexity bound reduces to $O(n)$. If the input space of $A$ or $B$ may be used for intermediate computations, provided that $A$ and $B$ are restored to their initial states once the remainder has been computed, we further propose an in-place algorithm (that is, with its extra required space reduced to $O(1)$) using at most $O(\frac{n}{m} M(m) \log(m))$ arithmetic operations if $M(m)$ is quasi-linear and $O(\frac{n}{m} M(m))$ otherwise. We also propose variants that compute, still in-place and with the same complexity bounds, the over-place remainder $A(X) \equiv A(X) \bmod B(X)$ and the accumulated remainder $R(X) \mathrel{+}= A(X) \bmod B(X)$. To achieve this, we develop techniques for Toeplitz matrix operations whose output is also part of the input. In-place accumulating versions are obtained for these operations and for polynomial remaindering. This is realized via further reductions to accumulated polynomial multiplication, for which in-place fast algorithms have recently been developed.
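To illustrate that the remainder does not require storing the whole quotient, the following minimal sketch computes $A \bmod B$ with a working buffer of only $m$ coefficients. It is not the fast algorithm proposed here: it is the naive quadratic method, using $O(nm)$ operations, in which each quotient coefficient is produced, used once, and discarded. The function name `remainder_small_space`, the choice of a prime field $\mathrm{GF}(p)$, and the low-degree-first coefficient lists are assumptions made for the example.

```python
def remainder_small_space(a, b, p):
    """Return A mod B over GF(p); coefficients are listed low degree first.

    Illustrative assumption: p is prime and b[-1] is non-zero mod p.
    Only a buffer of deg(B) coefficients is allocated; the quotient is
    never stored as a whole.
    """
    m = len(b) - 1                  # degree of the divisor B
    if m == 0:
        return []                   # B is a non-zero constant: remainder is 0
    inv_lead = pow(b[m], -1, p)     # inverse of B's leading coefficient
    r = [0] * m                     # running remainder, always of degree < m

    # Horner-like scan of A from its leading coefficient down:
    # at each step, R <- (X * R + coeff) mod B.
    for coeff in reversed(a):
        q = (r[m - 1] * inv_lead) % p          # one quotient coefficient
        for i in range(m - 1, 0, -1):          # shift by X and subtract q * B
            r[i] = (r[i - 1] - q * b[i]) % p
        r[0] = (coeff - q * b[0]) % p
        # q goes out of scope here: the quotient is discarded piece by piece.
    return r


# A = X^3 + 2X + 1 and B = X^2 + 1 over GF(7): A = X * B + (X + 1).
print(remainder_small_space([1, 2, 0, 1], [1, 0, 1], 7))
```

Here the buffer `r` plays the role of the $O(m)$ extra space; reaching the $O(\frac{n}{m} M(m))$ bound would require replacing the inner loop by blockwise fast multiplications, which this sketch does not attempt.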
