As online dating has grown in popularity over the past few years, an efficient and effective algorithm for matching users is needed. In this project, we propose a new dating matching algorithm that uses the Kendall-Tau distance to measure the similarity between users based on how they rank the items in a list (e.g., their favourite sports or music). To speed up the search process, we apply a tree-based search structure, the Cascading Metric Tree (CMT), to this metric. The tree is built from the ranked lists of all users; given a query target and a radius, our algorithm returns the users within that radius of the target. We tested how this search method scales on a synthetic dataset by varying the list length, population size, and query radius. We observed that, given reasonable parameters, the algorithm can retrieve the best-matching people for a user in practical time. We also suggest future improvements to the algorithm based on its limitations. Finally, we present further use cases for this search structure with the Kendall-Tau distance and new insight into real-world applications of distance-based search structures.
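To make the query concrete, the following is a minimal sketch of the Kendall-Tau distance and of a radius query answered by a plain linear scan. It is not the Cascading Metric Tree and not the project's implementation: the function names `kendall_tau_distance` and `range_query` and the sample users are hypothetical, chosen only for illustration. The linear scan is the baseline that a tree-based structure is meant to beat; because the Kendall-Tau distance satisfies the triangle inequality, a metric tree can skip whole groups of users without computing their distances.

```python
from itertools import combinations


def kendall_tau_distance(a, b):
    """Count the item pairs that rankings a and b place in opposite order."""
    if len(a) != len(b) or len(a) != len(set(a)) or set(a) != set(b):
        raise ValueError("rankings must be permutations of the same items")
    # Position of each item in b, so a can be rewritten in b's coordinates.
    pos_b = {item: i for i, item in enumerate(b)}
    seq = [pos_b[item] for item in a]
    # Each inversion in seq is one pair ordered differently by a and b.
    return sum(1 for i, j in combinations(range(len(seq)), 2) if seq[i] > seq[j])


def range_query(users, target, radius):
    """Linear-scan baseline: names of users within `radius` of `target`."""
    return [
        name
        for name, ranking in users.items()
        if kendall_tau_distance(ranking, target) <= radius
    ]


# Hypothetical sample data: each user ranks the same four activities.
users = {
    "ana": ["tennis", "soccer", "chess", "golf"],
    "ben": ["soccer", "tennis", "chess", "golf"],
    "cy": ["golf", "chess", "soccer", "tennis"],
}
target = ["tennis", "soccer", "chess", "golf"]
matches = range_query(users, target, radius=1)
print(matches)
```

This pairwise count takes quadratic time in the list length, which is acceptable for short lists; a merge-sort-based inversion count would be the usual choice for longer ones.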