Accurately predicting sports viewership is crucial for optimizing ad sales and forecasting revenue. Social media platforms such as Reddit provide a wealth of user-generated content that reflects audience engagement and interest. In this study, we propose a regression-based approach to predicting sports viewership from social media metrics, including post counts, comment counts, post scores, and sentiment scores from TextBlob and VADER. Through iterative improvements, such as focusing on major sports subreddits, incorporating categorical features, and handling outliers separately for each sport, the model achieved an $R^2$ of 0.99, a Mean Absolute Error (MAE) of 1.27 million viewers, and a Root Mean Squared Error (RMSE) of 2.33 million viewers on the full dataset. These results show that the model captures patterns in audience behavior well, making it a promising tool for pre-event revenue forecasting and targeted advertising.
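The sketch below shows the conventional definitions of the three reported metrics ($R^2$, MAE, and RMSE) computed from observed and predicted viewership. The function name `regression_metrics` and the sample values are hypothetical. They are not the study's code or data.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return R^2, MAE, and RMSE for paired observed/predicted values.

    Hypothetical helper for illustration; units follow the inputs
    (e.g. millions of viewers).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average absolute error, in the same units as the target.
    mae = sum(abs(r) for r in residuals) / n
    # RMSE: square root of the mean squared error; penalizes large misses more.
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    # R^2 = 1 - SS_res / SS_tot (undefined if every observed value is equal).
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Hypothetical viewership in millions for three events.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(regression_metrics(observed, predicted))
```

RMSE is always at least as large as MAE. A noticeably larger RMSE, as in the reported 2.33 million versus 1.27 million viewers, indicates that some events have errors much larger than the typical one.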