We introduce a new method for identifying emerging concepts in large text corpora. By analyzing changes in the heatmaps of the underlying embedding space, we can detect these concepts with high accuracy shortly after they originate, outperforming common alternatives. We further demonstrate the utility of our approach by analyzing speeches in the U.S. Senate from 1941 to 2015. Our results suggest that the minority party is more active in introducing new concepts into Senate discourse. We also identify specific concepts that closely correlate with Senators' racial, ethnic, and gender identities. An implementation of our method is publicly available.