Our goal is to learn the political interests and preferences of Members of Parliament by mining their parliamentary activity, in order to develop a recommendation/filtering system that, given a stream of documents to be distributed among them, decides which documents each Member of Parliament should receive. We propose to tackle this problem with positive unlabeled learning, because we only have information about relevant documents (each Member of Parliament's own interventions in the debates) and none about irrelevant documents, so we cannot use standard binary classifiers trained on positive and negative examples. We have also developed a new algorithm of this type, which compares favourably with: a) a baseline approach that assumes all the interventions of other Members of Parliament are irrelevant, b) another well-known positive unlabeled learning method, and c) an approach based on information retrieval methods that matches documents against legislators' representations. The experiments were carried out with data from the regional Parliament of Andalusia in Spain.
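To make the data setup concrete, the following is a minimal sketch of how the training sets for one Member of Parliament might be assembled, and of how baseline (a) turns unlabeled documents into negatives. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the `(mp_name, text)` input format and the function names `build_training_sets` and `baseline_labels` are hypothetical.

```python
from collections import defaultdict


def build_training_sets(interventions):
    """Map each MP to a (positives, unlabeled) pair of text lists.

    `interventions` is an iterable of (mp_name, text) pairs; this input
    format is an assumption made for the sketch.
    """
    by_mp = defaultdict(list)
    for mp, text in interventions:
        by_mp[mp].append(text)

    sets = {}
    for mp in by_mp:
        # An MP's own interventions are the only known relevant documents.
        positives = list(by_mp[mp])
        # Everything said by other MPs is unlabeled, not known to be irrelevant.
        unlabeled = [
            text
            for other, texts in by_mp.items()
            if other != mp
            for text in texts
        ]
        sets[mp] = (positives, unlabeled)
    return sets


def baseline_labels(positives, unlabeled):
    """Baseline (a): treat every unlabeled text as a negative example."""
    texts = positives + unlabeled
    labels = [1] * len(positives) + [0] * len(unlabeled)
    return texts, labels
```

The output of `baseline_labels` could be fed to any standard binary classifier. A positive unlabeled method would instead start from the `(positives, unlabeled)` pair and avoid assuming that every unlabeled document is irrelevant.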