Toward the Automated Construction of Probabilistic Knowledge Graphs for the Maritime Domain

International maritime crime is becoming increasingly sophisticated and is often associated with wider criminal networks. Detecting maritime threats by fusing only data related to physical movement (i.e., data generated by physical sensors, or hard data) is not sufficient. This has led to research and development efforts aimed at combining hard data with other types of data (especially human-generated, or soft, data). Existing work often assumes that input soft data is available in a structured format, or focuses on extracting certain relevant entities or concepts to accompany or annotate hard data. Much less attention has been given to extracting the rich knowledge about situations of interest that is implicitly embedded in the large volume of soft data available in unstructured formats (such as intelligence reports and news articles). To exploit the potentially useful and rich information in such sources, it is necessary to extract not only the relevant entities and concepts but also their semantic relations, together with the uncertainty associated with the extracted knowledge (i.e., in the form of probabilistic knowledge graphs). This will increase the accuracy of, and confidence in, the extracted knowledge and facilitate subsequent reasoning and learning. To this end, we propose Maritime DeepDive, an initial prototype for the automated construction of probabilistic knowledge graphs from natural language data for the maritime domain. In this paper, we report on the current implementation of Maritime DeepDive, together with preliminary results on extracting probabilistic events from maritime piracy incident reports. The pipeline was evaluated against a manually crafted gold standard, yielding promising results.
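To make the notion of a probabilistic knowledge graph concrete, the following minimal Python sketch stores each extracted fact as a (subject, relation, object) triple annotated with a probability. It is an illustration only and does not reproduce the Maritime DeepDive implementation: the class names, relation labels, and probability values are all hypothetical and were invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbabilisticTriple:
    """One extracted fact with its associated uncertainty (hypothetical schema)."""
    subject: str
    relation: str
    obj: str
    probability: float

    def __post_init__(self):
        # Reject values that cannot be interpreted as a probability.
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")


class ProbabilisticKnowledgeGraph:
    """A flat collection of probabilistic triples with simple filtering."""

    def __init__(self):
        self._triples = []

    def add(self, subject, relation, obj, probability):
        self._triples.append(
            ProbabilisticTriple(subject, relation, obj, probability)
        )

    def query(self, relation=None, min_probability=0.0):
        # Return triples matching the relation (if given) whose
        # probability is at least min_probability, in insertion order.
        return [
            t for t in self._triples
            if (relation is None or t.relation == relation)
            and t.probability >= min_probability
        ]


# Illustrative values only; these are not results from the paper.
graph = ProbabilisticKnowledgeGraph()
graph.add("incident_1", "has_victim", "tanker", 0.92)
graph.add("incident_1", "occurred_in", "Gulf of Aden", 0.81)
graph.add("incident_1", "has_weapon", "knives", 0.35)

# Keep only the facts a downstream reasoner might treat as reliable.
confident = graph.query(min_probability=0.5)
```

Keeping the probability attached to each triple, rather than discarding low-confidence extractions at extraction time, lets downstream reasoning choose its own threshold.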
