There is increasing interest in working with user-generated content in social media, especially textual posts over time. Currently there is no consistent way of segmenting user posts into timelines that meaningfully improves the quality and reduces the cost of manual annotation. Here we propose a set of methods for segmenting longitudinal user posts into timelines likely to contain interesting moments of change in a user's behaviour, based on their online posting activity. We also propose a novel framework for evaluating timelines and show its applicability in the context of two different social media datasets. Finally, we present a discussion of the linguistic content of highly ranked timelines.