Predicting the outcome of elections, sporting events, entertainment awards, and other competitions has long captured the human imagination. As the availability of historical information balloons, such prediction is growing in sophistication, especially in the rapidly growing field of data-driven journalism intended for a general audience. Developing statistical methodology for probabilistic prediction of competition outcomes poses two main challenges. First, a suitably general modeling approach is necessary to assign probabilities to competitors. Second, the modeling framework must be able to accommodate expert opinion, which is usually available but difficult to fully encapsulate in typical data sets. We overcome these challenges with a combined conditional logistic regression/subjective Bayes approach. To illustrate the method, we re-analyze data from a recent Time.com piece in which the authors attempted to predict the 2019 Best Picture Academy Award winner using standard logistic regression. To engage and educate a broad readership, we discuss strategies to deploy the proposed method via an online application.
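To make the contrast with standard logistic regression concrete, the following is a minimal sketch of how a conditional logistic model assigns win probabilities within a single competition: each competitor receives a linear score, and the scores are normalized so that the probabilities across competitors sum to one. The function name `win_probabilities`, the two indicator covariates, and the coefficient values are all hypothetical and chosen only for illustration; they are not taken from the paper or the Time.com data, and the subjective Bayes component (a prior on the coefficients informed by expert opinion) is not shown.

```python
import math


def win_probabilities(features, beta):
    """Conditional-logit win probabilities for one competition.

    features: one list of covariates per competitor (hypothetical values)
    beta:     one coefficient per covariate (hypothetical values)
    """
    # Linear score for each competitor
    scores = [sum(b * x for b, x in zip(beta, row)) for row in features]
    # Subtract the maximum score before exponentiating, for numerical stability
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    # Normalize within the competition so exactly one winner is implied
    return [w / total for w in weights]


# Three hypothetical nominees, each described by two made-up 0/1 indicators
nominees = [[1, 1], [1, 0], [0, 0]]
beta = [1.2, 0.8]  # assumed coefficients, not estimates from any data
probs = win_probabilities(nominees, beta)
print(probs)
```

Unlike a standard logistic regression fit to each nominee separately, this normalization guarantees that the probabilities within a competition sum to one, which is what allows them to be read as chances of winning.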