Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints

Understanding and anticipating vulnerability-related activity is a major challenge in cyber threat intelligence. This work investigates whether vulnerability sightings, such as proof-of-concept releases, detection templates, or online discussions, can be forecast over time. Building on our earlier work on VLAI, a transformer-based model that predicts vulnerability severity from textual descriptions, we examine whether severity scores can improve time-series forecasting as exogenous variables. We evaluate several approaches for short-term forecasting of sightings per vulnerability. First, we test SARIMAX models with and without log(x+1) transformations and VLAI-derived severity inputs. Although these adjustments provide limited improvements, SARIMAX remains poorly suited to sparse, short, and bursty vulnerability data. In practice, forecasts often produce overly wide confidence intervals and sometimes unrealistic negative values. To better capture the discrete and event-driven nature of sightings, we then explore count-based methods such as Poisson regression. Early results show that these models produce more stable and interpretable forecasts, especially when sightings are aggregated weekly. We also discuss simpler operational alternatives, including exponential decay functions for short forecasting horizons, to estimate future activity without requiring long historical series. Overall, this study highlights both the potential and the limitations of forecasting rare and bursty cyber events, and provides practical guidance for integrating predictive analytics into vulnerability intelligence workflows.
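
As a rough illustration of the exponential decay alternative mentioned above, the following minimal sketch fits a decaying curve to a short series of sighting counts and extrapolates it a few steps ahead. It is not the authors' implementation. The function names (`fit_exponential_decay`, `forecast_decay`), the least-squares fit on log(x+1) counts, the rule that the decay rate cannot be negative, and the clamping of forecasts at zero are all assumptions made for this example.

```python
import math


def fit_exponential_decay(counts):
    """Least-squares fit of log(1 + y_t) = a - lam * t for t = 0..n-1.

    Returns (a, lam). Hypothetical helper, not taken from the paper.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = range(n)
    zs = [math.log1p(c) for c in counts]  # log(x+1) keeps zero counts usable
    t_mean = sum(ts) / n
    z_mean = sum(zs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxz = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
    slope = sxz / sxx
    # Assumption: forecast activity may stay flat or decay, but never grow.
    lam = max(0.0, -slope)
    a = z_mean + lam * t_mean
    return a, lam


def forecast_decay(counts, horizon):
    """Extrapolate the fitted curve for `horizon` future steps."""
    a, lam = fit_exponential_decay(counts)
    n = len(counts)
    # Invert the log(x+1) transform and clamp at zero, since counts
    # cannot be negative.
    return [max(0.0, math.expm1(a - lam * (n + h))) for h in range(horizon)]


if __name__ == "__main__":
    # Made-up sighting counts for one vulnerability (illustrative only).
    history = [12, 7, 5, 3, 2, 1, 1]
    print(forecast_decay(history, horizon=3))
```

The sketch needs only a handful of observations, which is the appeal of this approach for short histories. The explicit clamp at zero avoids the negative forecasts that an unconstrained linear model can produce.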
