On the evolution of data breach reporting patterns and frequency in the United States: a cross-state analysis

Understanding how data breaches emerge is crucial for cyber insurance. However, analyses of data breach frequency trends in the current literature reach contradictory conclusions. We argue that these discrepancies may be due, at least in part, to data collection standards and reporting patterns that are inconsistent over time and space, and we set out to control carefully for both. In this paper, we conduct a joint analysis of state Attorneys General's publications on data breaches across eight states (California, Delaware, Indiana, Maine, Montana, North Dakota, Oregon, and Washington), all of which are subject to established data collection standards, namely state (mandatory) data breach notification laws. By explicitly accounting for these notification laws, we can model the frequency of breaches in a consistent and comparable way over time. This allows us to isolate and capture the complexities of reporting patterns, adequately estimate incurred but not reported (IBNR) breaches, and obtain a highly reliable assessment of historical frequency trends in data breaches. Our analysis also provides a comprehensive comparison of data breach frequency across the eight U.S. states, extending knowledge of state-specific differences in cyber risk, a topic that has not been extensively discussed in the current literature. Furthermore, we uncover features not previously discussed in the literature, such as differences in frequency trends between large and small data breaches. Overall, we find that reporting delays are lengthening. We also identify commonalities and heterogeneities in reporting patterns across states, severity levels, and time periods. Once IBNR breaches are adequately estimated, we find that breach frequency was relatively stable before 2020 and increased thereafter, a pattern that is consistent across states. We discuss the implications of our findings for cyber insurance.
