Despite the ubiquity of multiway data across scientific domains, there are few user-friendly tools that fit tailored nonnegative tensor factorizations. Researchers may use gradient-based automatic differentiation (which often struggles in nonnegative settings), choose from a limited set of methods with mature implementations, or implement their own model from scratch. As an alternative, we introduce NNEinFact, an einsum-based multiplicative update algorithm that fits any nonnegative tensor factorization expressible as a tensor contraction by minimizing one of many user-specified loss functions (including the $(α,β)$-divergence). To use NNEinFact, the researcher simply specifies their model with a string. NNEinFact converges to a local minimum of the loss, supports missing data, and fits tensors with hundreds of millions of entries in seconds. Empirically, NNEinFact fits custom models that outperform standard ones on heldout prediction tasks with real-world tensor data by over $37\%$, and it attains less than half the test loss of gradient-based methods while converging up to 90 times faster.
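To make the "factorization as an einsum string" idea concrete, here is an illustrative sketch in plain `numpy` (NNEinFact's actual API is not shown in the abstract, so every name below is hypothetical). It writes a CP decomposition of a 3-way tensor as the contraction `'ir,jr,kr->ijk'` and refines each factor with a multiplicative update, which preserves nonnegativity because each factor is scaled by a ratio of nonnegative contractions (here under squared/Frobenius loss, not the general $(α,β)$-divergence).

```python
import numpy as np

# Hypothetical illustration, NOT NNEinFact's API: a CP model written as
# the einsum string 'ir,jr,kr->ijk', fit by multiplicative updates.
rng = np.random.default_rng(0)
I, J, K, R = 20, 15, 10, 4

# Ground-truth nonnegative factors and the observed tensor.
A_true, B_true, C_true = (rng.random((n, R)) for n in (I, J, K))
X = np.einsum('ir,jr,kr->ijk', A_true, B_true, C_true)

# Random strictly positive initialization.
A, B, C = (rng.random((n, R)) + 0.1 for n in (I, J, K))

def reconstruct(A, B, C):
    # The model is just a tensor contraction over the shared rank index r.
    return np.einsum('ir,jr,kr->ijk', A, B, C)

eps = 1e-12  # guards against division by zero
losses = []
for _ in range(50):
    # Each factor is multiplied by (data contraction) / (model contraction),
    # so entries stay nonnegative and the Frobenius loss is non-increasing.
    Xhat = reconstruct(A, B, C)
    A *= np.einsum('ijk,jr,kr->ir', X, B, C) / \
         (np.einsum('ijk,jr,kr->ir', Xhat, B, C) + eps)
    Xhat = reconstruct(A, B, C)
    B *= np.einsum('ijk,ir,kr->jr', X, A, C) / \
         (np.einsum('ijk,ir,kr->jr', Xhat, A, C) + eps)
    Xhat = reconstruct(A, B, C)
    C *= np.einsum('ijk,ir,jr->kr', X, A, B) / \
         (np.einsum('ijk,ir,jr->kr', Xhat, A, B) + eps)
    losses.append(np.sum((X - reconstruct(A, B, C)) ** 2))
```

Swapping the einsum string (e.g. to a Tucker-style contraction with a core tensor) changes the model without changing the update pattern, which is the convenience the abstract describes.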