Text-based sentiment indicators are widely used to monitor public and market mood, but weekly sentiment series are noisy by construction. A main reason is that the amount of relevant news changes over time and across categories. As a result, some weekly averages are based on many articles, while others rely on only a few. Existing approaches do not explicitly account for these changes in data availability when measuring uncertainty. We present a Bayesian state-space framework that turns aggregated news sentiment into a smoothed time series with quantified uncertainty. The model treats each weekly sentiment value as a noisy measurement of an underlying sentiment process, with observation uncertainty scaled by the effective information weight $n_{tj}$ for week $t$ and category $j$: when coverage is high, latent sentiment is anchored more strongly to the observed aggregate; when coverage is low, inference relies more on the latent dynamics and uncertainty increases. Using news data grouped into multiple categories, we find broadly similar latent dynamics across categories, while the larger differences appear in observation noise. The framework is designed for descriptive monitoring and can be extended to other text sources where information availability varies over time.
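To make the coverage-weighting idea concrete, the sketch below runs a scalar Kalman filter for a single category. It is an illustration under stated assumptions, not the paper's estimation procedure: the latent dynamics are assumed to be a random walk, the observation variance is assumed to take the form `sigma_obs**2 / n_t`, and all parameters are fixed by hand rather than inferred in a Bayesian way. The function name `filter_local_level` and its arguments are hypothetical.

```python
def filter_local_level(y, n, sigma_obs=1.0, sigma_state=0.1, m0=0.0, p0=1.0):
    """Scalar Kalman filter with coverage-scaled observation noise.

    Assumed model (illustrative only, not taken from the paper):
        x_t = x_{t-1} + eta_t,   eta_t ~ N(0, sigma_state^2)
        y_t = x_t + eps_t,       eps_t ~ N(0, sigma_obs^2 / n_t)

    y: weekly aggregate sentiment (None for a week without data)
    n: effective information weight per week (0 means no coverage)
    Returns the filtered means and variances of the latent sentiment.
    """
    means, variances = [], []
    m, p = m0, p0
    for y_t, n_t in zip(y, n):
        # Predict step: the random-walk dynamics inflate the variance.
        p = p + sigma_state ** 2
        if y_t is not None and n_t > 0:
            # Update step: more coverage means a smaller observation variance,
            # hence a larger gain and a stronger pull toward the aggregate.
            r = sigma_obs ** 2 / n_t
            k = p / (p + r)
            m = m + k * (y_t - m)
            p = (1.0 - k) * p
        # With no coverage the update is skipped, so uncertainty only grows.
        means.append(m)
        variances.append(p)
    return means, variances


if __name__ == "__main__":
    y = [0.8, 0.7, None, 0.2]
    n = [200, 2, 0, 50]
    for week, (m, v) in enumerate(zip(*filter_local_level(y, n)), start=1):
        print(f"week {week}: mean={m:.3f}, sd={v ** 0.5:.3f}")
```

In this sketch a week with many articles pulls the latent estimate close to the observed aggregate and shrinks its variance, whereas a week with few or no articles leaves the estimate near its predicted value with wider uncertainty, which is the qualitative behaviour described above.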