We introduce Hyperbard, a dataset of diverse relational data representations derived from Shakespeare's plays. Our representations range from simple graphs capturing character co-occurrence in single scenes to hypergraphs encoding complex communication settings and character contributions as hyperedges with edge-specific node weights. By making multiple intuitive representations readily available for experimentation, we facilitate rigorous representation robustness checks in graph learning, graph mining, and network analysis, highlighting the advantages and drawbacks of specific representations. Leveraging the data released in Hyperbard, we demonstrate that many solutions to popular graph mining problems are highly dependent on the representation choice, thus calling current graph curation practices into question. As an homage to our data source, and asserting that science can also be art, we present all our points in the form of a play.
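To make the contrast between the two ends of this range concrete, the following minimal sketch builds a scene-level co-occurrence graph and a hypergraph with edge-specific node weights from the same toy input. The scene data, the function names, and the use of spoken-line counts as node weights are illustrative assumptions rather than Hyperbard's actual schema or API.

```python
from collections import defaultdict
from itertools import combinations

# Toy, made-up scenes: each maps a character to a count of spoken lines.
# These values are illustrative only and are NOT taken from Hyperbard.
scenes = [
    {"A": 10, "B": 2, "C": 1},
    {"A": 3, "B": 8},
    {"C": 5, "D": 5},
]

def cooccurrence_graph(scenes):
    """Weighted simple graph: edge weight = number of scenes two characters share."""
    weights = defaultdict(int)
    for scene in scenes:
        for u, v in combinations(sorted(scene), 2):
            weights[(u, v)] += 1
    return dict(weights)

def hypergraph(scenes):
    """One hyperedge per scene, keeping the edge-specific node weights."""
    return [dict(scene) for scene in scenes]

def graph_degree(graph):
    """Weighted degree of each node in the co-occurrence graph."""
    degree = defaultdict(int)
    for (u, v), weight in graph.items():
        degree[u] += weight
        degree[v] += weight
    return dict(degree)

def hypergraph_weighted_degree(edges):
    """Sum of a node's edge-specific weights over all hyperedges containing it."""
    degree = defaultdict(int)
    for edge in edges:
        for node, weight in edge.items():
            degree[node] += weight
    return dict(degree)

graph_scores = graph_degree(cooccurrence_graph(scenes))
hyper_scores = hypergraph_weighted_degree(hypergraph(scenes))
print(graph_scores)
print(hyper_scores)
```

On this toy input, characters A, B, and C tie on weighted degree in the co-occurrence graph, while the hypergraph's edge-specific weights separate them. This is a small instance of the representation dependence described above, not a result from the dataset.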