With historic misses in the 2016 and 2020 US Presidential elections, interest in measuring polling errors has increased. In an election postmortem, the most common way to measure directional errors and non-sampling excess variability is to assess the difference between the poll result and the election result for polls conducted within a few days of election day. Analyzing such polling error data is notoriously difficult, with typical models being extremely sensitive to the time between the poll and the election. We leverage hidden Markov models traditionally used for election forecasting to flexibly capture time-varying preferences and treat the election result as a peek at the typically hidden Markovian process. Our results are much less sensitive to the choice of time window, avoid conflating shifting preferences with polling error, and are more interpretable despite a highly flexible model. We demonstrate these results with data on polls from the 2004 through 2020 US Presidential elections and 1992 through 2020 US Senate elections, concluding that previously reported estimates of bias in Presidential elections were too extreme by 10\%, estimated bias in Senatorial elections was too extreme by 25\%, and excess variability estimates were also too large.