We address prediction problems on tabular categorical data, where each instance is described by multiple categorical attributes, each taking values from a finite set. These attributes are often referred to as fields, and their categorical values as features. Such problems arise frequently in practice, for example in click-through rate prediction and in the social sciences. We introduce and analyze tensorFM, a new model that efficiently captures high-order interactions between attributes by using a low-rank tensor approximation to represent the strength of these interactions. Our model generalizes field-weighted factorization machines. Empirically, tensorFM performs competitively with state-of-the-art methods. Additionally, its low latency makes it well suited to time-sensitive applications such as online advertising.
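For orientation, the following is a minimal sketch of the second-order score used by a field-weighted factorization machine, the model that tensorFM is stated to generalize. It is not the tensorFM model itself, whose exact form is not given in this abstract. The function name `fwfm_score`, the dictionary-based feature lookup, and the example feature names are illustrative assumptions. Each instance has one active feature per field; every pair of fields contributes the inner product of the two feature embeddings, scaled by a weight attached to that pair of fields.

```python
from itertools import combinations


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fwfm_score(active, embeddings, field_weights, linear, bias=0.0):
    """Field-weighted FM score for a single instance (illustrative sketch).

    active[f]           -- the feature that is active in field f
    embeddings[feat]    -- embedding vector of a feature
    field_weights[i][j] -- interaction strength between fields i and j (i < j)
    linear[feat]        -- first-order weight of a feature
    """
    # First-order part: bias plus the linear weight of each active feature.
    score = bias + sum(linear[feat] for feat in active)
    # Second-order part: one term per unordered pair of fields.
    for i, j in combinations(range(len(active)), 2):
        pair_sim = dot(embeddings[active[i]], embeddings[active[j]])
        score += field_weights[i][j] * pair_sim
    return score


# Hypothetical toy instance with three fields: site, device, ad.
embeddings = {
    "site=news": [1.0, 0.0],
    "device=mobile": [0.5, 0.5],
    "ad=shoes": [0.0, 2.0],
}
linear = {"site=news": 0.1, "device=mobile": 0.1, "ad=shoes": 0.1}
field_weights = [
    [0.0, 1.0, 2.0],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]
active = ["site=news", "device=mobile", "ad=shoes"]
score = fwfm_score(active, embeddings, field_weights, linear)
```

In this pairwise setting the interaction strengths form a matrix indexed by pairs of fields. The abstract's reference to a low-rank tensor suggests that higher-order interactions are weighted by an analogous object indexed by more than two fields, but that reading is an inference from the abstract and not a specification of the model.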