We propose Textiverse, a big data approach for mining geotagged, timestamped textual data on a map, such as Twitter feeds, crime reports, or restaurant reviews. We use a scalable data management pipeline that extracts keyphrases from online databases in parallel. We accelerate this time-consuming step so that it outpaces the rate at which popular social media generate content. The results are presented in a web-based interface that integrates with Google Maps to visualize textual content at massive scale. The visual design aggregates spatial regions into discrete sites and renders each site as a circular tag cloud. To demonstrate the intended use of our technique, we first show how it can characterize the funding status of the U.S. National Science Foundation based on all 489,151 of its awards. We then apply the same technique to a more spatially scattered and linguistically informal dataset: 1.2 million Twitter posts about the Android mobile operating system.
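The abstract does not specify how spatial regions are aggregated into sites or how keyphrases are weighted for each tag cloud. As a rough illustration only, the following Python sketch assumes a fixed latitude/longitude grid as the site definition and uses stopword-filtered word counts as a stand-in for real keyphrase extraction. The names `Post`, `site_key`, `extract_terms`, and `aggregate_sites` are hypothetical and are not part of Textiverse.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    """A geotagged text item (hypothetical record type; timestamps omitted)."""
    lat: float
    lon: float
    text: str


# Assumption: a tiny stopword list stands in for a real keyphrase extractor.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on", "for"}


def site_key(lat, lon, cell_deg=1.0):
    """Snap a coordinate to the south-west corner of a fixed-size grid cell.

    Assumption: a uniform grid is used here as a simple 'site' definition.
    """
    return (math.floor(lat / cell_deg) * cell_deg,
            math.floor(lon / cell_deg) * cell_deg)


def extract_terms(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip(".,!?:;\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def aggregate_sites(posts, cell_deg=1.0, top_k=5):
    """Group posts by grid site and return the top_k (term, count) pairs per site.

    The counts could later drive font sizes in each site's tag cloud.
    """
    counts = defaultdict(Counter)
    for p in posts:
        counts[site_key(p.lat, p.lon, cell_deg)].update(extract_terms(p.text))
    return {site: c.most_common(top_k) for site, c in counts.items()}


if __name__ == "__main__":
    posts = [
        Post(40.7, -74.0, "Android update is great"),
        Post(40.3, -73.5, "Android battery drain"),
        Post(34.05, -118.2, "New Android phone"),
    ]
    for site, terms in aggregate_sites(posts).items():
        print(site, terms)
```

In this sketch, the first two posts fall into the same 1-degree cell, so "android" receives a count of 2 there. A production pipeline would replace the grid with an adaptive aggregation and the word counts with proper keyphrase extraction run in parallel.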