Using data from professional bouldering competitions from 2008 to 2022, we train a logistic regression model to predict climber results and measure climber skill. This approach is limited, however, because a single numeric coefficient per climber cannot capture how climbers' strengths and weaknesses vary across boulder problems. For example, some climbers may prefer static, technical problems, while others specialize in powerful, dynamic ones. To address this, we apply Probabilistic Matrix Factorization (PMF), a framework commonly used in recommender systems, to represent the unique characteristics of climbers and problems with latent, multi-dimensional vectors. In this framework, a climber's performance on a given problem is predicted by taking the dot product of the corresponding climber and problem vectors. PMF handles sparse datasets effectively, such as ours, in which only a subset of climbers attempts each problem, by extrapolating patterns from similar climbers. We compare the empirical performance of PMF with that of the logistic regression approach and investigate the multivariate representations produced by PMF to gain insights into climber characteristics. Our results show that the multivariate PMF representations improve predictive performance on professional bouldering competitions by capturing both the overall strength of climbers and their specialized skill sets. Our code is available open-source at https://github.com/baronet2/boulder2vec.
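To make the dot-product prediction concrete, the following is a minimal sketch and not the implementation in the boulder2vec repository. It assumes a binary outcome (1 if the climber tops the problem, 0 otherwise), a sigmoid link applied to the dot product, and plain stochastic gradient ascent with an L2 penalty standing in for the Gaussian priors that PMF places on the latent vectors. The function names, hyperparameters, and toy observations are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict(climbers, problems, c, p):
    # Assumed link: success probability = sigmoid(climber vector . problem vector).
    return sigmoid(dot(climbers[c], problems[p]))


def log_loss(climbers, problems, observations):
    # Mean negative log-likelihood over the observed (climber, problem) pairs.
    total = 0.0
    for c, p, y in observations:
        q = predict(climbers, problems, c, p)
        total -= math.log(q) if y == 1 else math.log(1.0 - q)
    return total / len(observations)


def fit_pmf(observations, n_climbers, n_problems, dim=2,
            lr=0.05, reg=0.01, epochs=500, seed=0):
    # Hypothetical hyperparameters; small random init breaks the symmetry at zero.
    rng = random.Random(seed)
    climbers = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_climbers)]
    problems = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_problems)]
    for _ in range(epochs):
        for c, p, y in observations:
            err = y - predict(climbers, problems, c, p)
            for k in range(dim):
                ck, pk = climbers[c][k], problems[p][k]
                # Gradient of the log-likelihood minus the L2 penalty.
                climbers[c][k] += lr * (err * pk - reg * ck)
                problems[p][k] += lr * (err * ck - reg * pk)
    return climbers, problems


# Toy, made-up data: (climber index, problem index, topped?).
# Climber 2 has attempted only problem 0, so the matrix is sparse.
observations = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (2, 0, 1)]
climbers, problems = fit_pmf(observations, n_climbers=3, n_problems=2)

print("training log-loss:", log_loss(climbers, problems, observations))
print("P(climber 2 tops problem 1):", predict(climbers, problems, 2, 1))
```

The last line queries a climber-problem pair that was never observed. Only the learned vectors are needed to produce that prediction, which is how a factorization model fills in missing entries of a sparse results matrix.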