This paper presents the Tempo-Spatial Content Delivery Network (TS-CDN), an efficient archival framework for exploring and tracking large-scale cyberspace data. Social media data streams are continuously renewed along both temporal and spatial dimensions, where websites and social-network entities (e.g., channels, groups, and pages) are treated as the spatial dimension of cyberspace. Accurate analysis requires covering the bulk of this data. TS-CDN applies a hash function to the collected data to build an efficient content delivery network: hashing eliminates redundant copies and yields an archive of unique items at large scale. Given a user query, the framework supports clear monitoring and exploration of the data along the tempo-spatial dimensions, ranked by TF-IDF score. In addition, conforming to i18n (internationalization) standards resolves Unicode handling problems. To evaluate TS-CDN, we used a dataset collected from Telegram news channels between March 23, 2020 (1399-01-01) and September 21, 2020 (1399-06-31), covering topics including Coronavirus (COVID-19), vaccine, school reopening, flood, earthquake, justice shares, petroleum, and quarantine. Applying hashing to the Telegram dataset over this interval significantly reduced the volume of media files: by 39.8% for videos (from 79.5 GB to 47.8 GB) and by 10% for images (from 4 GB to 3.6 GB). The TS-CDN infrastructure is presented as a web-based, service-oriented system. Experiments on large volumes of time-series data spanning several spatial dimensions (the Telegram news channels Khabare Fouri, Khabarhaye Fouri, Akhbare Rouze Iran, and Akhbare Rasmi) demonstrate the efficiency and applicability of the implemented TS-CDN framework.
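The hash-based deduplication described above can be illustrated with a minimal sketch. The abstract does not state which hash function or storage layout TS-CDN uses, so SHA-256 and the in-memory dictionaries below are assumptions, and the names `content_digest`, `deduplicate`, and `reduction_percent` are hypothetical helpers introduced only for illustration.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def content_digest(payload: bytes) -> str:
    """Return a hex digest that identifies the payload by its content.

    SHA-256 is an assumption; the source does not name the hash function.
    """
    return hashlib.sha256(payload).hexdigest()


def deduplicate(
    items: Iterable[Tuple[str, bytes]],
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Store each distinct payload once and map every item name to its digest.

    Returns (archive, references):
      archive    maps digest -> payload, one entry per unique payload
      references maps item name -> digest, one entry per input item
    """
    archive: Dict[str, bytes] = {}
    references: Dict[str, str] = {}
    for name, payload in items:
        digest = content_digest(payload)
        # Keep only the first copy of any payload with this digest.
        archive.setdefault(digest, payload)
        references[name] = digest
    return archive, references


def reduction_percent(size_before: float, size_after: float) -> float:
    """Percentage of storage saved after deduplication."""
    return 100.0 * (size_before - size_after) / size_before


if __name__ == "__main__":
    # Hypothetical posts: the same video forwarded in two channels.
    posts = [
        ("channel_a/video_1", b"same video bytes"),
        ("channel_b/video_7", b"same video bytes"),
        ("channel_a/image_3", b"some image bytes"),
    ]
    archive, references = deduplicate(posts)
    print(len(posts), "items,", len(archive), "unique payloads")
    print(reduction_percent(4.0, 3.6))
```

In this sketch a media file forwarded across several channels is stored once, while each post keeps a reference to the shared digest. The reported savings follow the same arithmetic as `reduction_percent`: going from 4 GB to 3.6 GB of images is a reduction of about 10%.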