In this study, we examine the dynamics of Wordle using data analysis and machine learning. We first analyzed how the number of submitted results varies with the date. Because the early data are skewed by the game's initial surge in popularity, we fitted an ARIMAX model of order (p, d, q) = (9, 0, 2) to the stable portion of the series, with a weekday/weekend indicator as the exogenous variable. We found no significant relationship between word attributes and hard-mode results. To predict word difficulty, we employed a backpropagation neural network and mitigated overfitting through feature engineering. We also used K-means clustering, with five clusters found to be optimal, to sort words into numerical difficulty classes. Our models predict that about 12,884 results will be submitted on March 1, 2023, and that the word "eerie" takes 4.8 attempts on average, placing it in the hardest difficulty cluster. We further examined the percentage of loyal players and their propensity to undertake daily challenges. We checked the models with the augmented Dickey-Fuller (ADF) test, ACF and PACF diagnostics, and cross-validation, all of which support their robustness. Overall, our study provides a predictive framework for Wordle gameplay based on a date or a given five-letter word. The results have been summarized and submitted to the Puzzle Editor of the New York Times.
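The abstract names weekdays/weekends as the exogenous variable but does not say how it is encoded. As a minimal sketch, assuming a binary indicator (1 for Saturday or Sunday, 0 otherwise), the regressor could be built as follows. The function name `weekend_indicator` and the five-day window are hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta


def weekend_indicator(start: date, end: date) -> list[tuple[date, int]]:
    """Return (day, flag) pairs for start..end inclusive.

    Assumption: flag is 1 on Saturday/Sunday and 0 on weekdays.
    The source does not specify the encoding it used.
    """
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return [(d, 1 if d.weekday() >= 5 else 0) for d in days]


# Hypothetical window ending on the forecast date mentioned in the abstract.
exog = weekend_indicator(date(2023, 2, 25), date(2023, 3, 1))
for day, flag in exog:
    print(day.isoformat(), flag)
```

A column like this would typically be supplied as the exogenous regressor when fitting the model; in statsmodels, for instance, `SARIMAX` accepts it through its `exog` argument alongside `order=(9, 0, 2)`. Whether the authors used that library is not stated.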