In this thesis, we propose an approach to identity resolution across social media platforms that uses the topics, sentiments, and timings of users' posts. After collecting the public posts of around 5000 profiles from Disqus and Twitter, we analyze those posts to match profiles across the two platforms. We pursue both temporal and non-temporal methods. While neither proves definitively superior, the temporal approach generally performs better, and we find that the temporal window size influences results more than the shift amount. Our sentiment analysis shows that including sentiment makes little difference, probably because of flawed data extraction methods. We also experiment with a distance-based scoring model built on rewards and punishments, which achieves an accuracy of 24.198% and an average rank of 158.217 out of 2525 in our collected corpus. Future work includes refining the sentiment analysis by evaluating sentiments per topic, extending the temporal analysis with additional phases, and improving the scoring model through weight adjustments and modified rewards.
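To make the two reported metrics concrete, the following is a minimal sketch of how an accuracy and an average rank could be computed from pairwise profile distances. It is an illustration under stated assumptions, not the scoring model used in the thesis: the distance matrix, the convention that the true match of source profile `i` is candidate `i`, and the function names `true_match_ranks` and `accuracy_and_average_rank` are all hypothetical.

```python
# Hypothetical sketch: the thesis does not specify this procedure.
# distances[i][j] is an assumed distance between source profile i
# (e.g. a Disqus profile) and candidate profile j (e.g. a Twitter profile).
# By assumption, the true match of source profile i is candidate i.

def true_match_ranks(distances):
    """Return the 1-based rank of the true match for each source profile."""
    ranks = []
    for i, row in enumerate(distances):
        # Rank = 1 + number of candidates strictly closer than the true match.
        closer = sum(1 for d in row if d < row[i])
        ranks.append(1 + closer)
    return ranks


def accuracy_and_average_rank(distances):
    """Top-1 accuracy (share of true matches ranked first) and mean rank."""
    ranks = true_match_ranks(distances)
    accuracy = sum(1 for r in ranks if r == 1) / len(ranks)
    average_rank = sum(ranks) / len(ranks)
    return accuracy, average_rank


# Toy data with three profiles per platform (made up for illustration).
toy = [
    [0.1, 0.5, 0.9],  # true match is closest: rank 1
    [0.4, 0.3, 0.2],  # one candidate is closer: rank 2
    [0.7, 0.6, 0.8],  # two candidates are closer: rank 3
]
accuracy, average_rank = accuracy_and_average_rank(toy)
```

Under this reading, accuracy counts only profiles whose true match is ranked first, while the average rank also credits near misses, which is why the two figures can tell different stories about the same model.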