Prediction markets are increasingly used as probability forecasting tools, yet their usefulness depends on calibration, specifically whether a contract trading at 70 cents truly implies a 70% probability. Using 292 million trades across 327,000 binary contracts on Kalshi and Polymarket, this paper shows that calibration is a structured, multidimensional phenomenon. On Kalshi, calibration decomposes into four components (a universal horizon effect, domain-specific biases, domain-by-horizon interactions and a trade-size scale effect) that together explain 87.3% of calibration variance. The dominant pattern is persistent underconfidence in political markets, where prices are chronically compressed toward 50%, and this bias generalises across both exchanges. However, the trade-size scale effect, whereby large trades are associated with amplified underconfidence in politics on Kalshi ($\Delta = 0.53$, 95% confidence interval $[0.29, 0.75]$), does not replicate on Polymarket ($\Delta = 0.11$, $[-0.15, 0.39]$), suggesting platform-specific microstructure. A Bayesian hierarchical model confirms the frequentist decomposition with 96.3% posterior predictive coverage. Consumers of prediction market prices who treat them as face-value probabilities will systematically misinterpret them, and the direction of misinterpretation depends on what is being predicted, when and by whom.
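The calibration question posed in the abstract (does a contract priced at 70 cents resolve "yes" about 70% of the time?) can be illustrated with a simple binned comparison of prices against realised outcomes. The following is a minimal sketch under stated assumptions, not the paper's decomposition or its estimator for $\Delta$: the function name `calibration_table`, the ten equal-width bins and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def calibration_table(prices, outcomes, n_bins=10):
    """Compare mean price with realised 'yes' frequency per price bin.

    prices   -- contract prices on a 0..1 scale (hypothetical input format)
    outcomes -- 1 if the contract resolved 'yes', else 0
    Returns a list of (mean_price, hit_rate, count), ordered by bin.
    """
    bins = defaultdict(list)
    for price, outcome in zip(prices, outcomes):
        # Clamp so that a price of exactly 1.0 falls in the top bin.
        index = min(int(price * n_bins), n_bins - 1)
        bins[index].append((price, outcome))

    rows = []
    for index in sorted(bins):
        pairs = bins[index]
        mean_price = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(y for _, y in pairs) / len(pairs)
        rows.append((mean_price, hit_rate, len(pairs)))
    return rows


# Toy data (invented, not from the paper): prices compressed toward 50%.
prices = [0.75, 0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.25]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]

rows = calibration_table(prices, outcomes)
for mean_price, hit_rate, count in rows:
    print(f"price={mean_price:.2f}  realised={hit_rate:.2f}  n={count}")
```

In this toy case the realised frequency is more extreme than the price in both bins, which is the signature of underconfidence in the abstract's sense: prices sit closer to 50% than outcomes warrant. A well-calibrated market would instead show realised frequencies close to the mean price in each bin.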