As online dating has become more popular in the past few years, an efficient and effective algorithm for matching users is needed. In this project, we proposed a new dating matching algorithm that uses the Kendall-Tau distance to measure the similarity between users based on how they rank the items in a list (e.g., their favourite sports or music). To speed up the search process, we applied a tree-based search structure, the Cascading Metric Tree (CMT), to this metric. The tree is built from the ranked lists of all users; given a query target and a radius, our algorithm returns the users within that radius of the target. We tested how this search method scales on a synthetic dataset by varying list length, population size, and query radius. We observed that, given reasonable parameters, the algorithm can retrieve the best-matching people for a user in practical time. We also suggested future improvements to the algorithm based on its limitations. Finally, we presented further use cases of this search structure with the Kendall-Tau distance and offered new insight into real-world applications of distance-based search structures.
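To make the metric and the query concrete, the following is a minimal sketch in Python, not the implementation used in this project. It computes the Kendall-Tau distance as the number of item pairs that two rankings order differently, and answers a radius query with a brute-force linear scan. The scan is only a baseline that a structure such as the CMT is meant to speed up; the tree itself is not shown. The function names `kendall_tau_distance` and `range_query` and the sample users are hypothetical, and the sketch assumes every user ranks the same set of items with no ties.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Number of item pairs that the two rankings place in opposite order.

    Assumption: both rankings are permutations of the same items (no ties).
    """
    if (len(rank_a) != len(rank_b)
            or len(set(rank_a)) != len(rank_a)
            or set(rank_a) != set(rank_b)):
        raise ValueError("rankings must be permutations of the same items")
    # Position of each item in rank_b, then rank_a expressed in those positions.
    pos_b = {item: i for i, item in enumerate(rank_b)}
    mapped = [pos_b[item] for item in rank_a]
    # A pair is discordant when its order in rank_a is reversed in rank_b.
    return sum(1 for i, j in combinations(range(len(mapped)), 2)
               if mapped[i] > mapped[j])


def range_query(users, target, radius):
    """Brute-force baseline: names of users whose ranking is within `radius`."""
    return [name for name, ranking in users.items()
            if kendall_tau_distance(ranking, target) <= radius]


# Hypothetical sample data for illustration only.
users = {
    "ana": ["tennis", "soccer", "chess", "golf"],
    "bo": ["soccer", "tennis", "chess", "golf"],
    "cy": ["golf", "chess", "soccer", "tennis"],
}
target = ["tennis", "soccer", "chess", "golf"]
matches = range_query(users, target, radius=1)
```

For a list of length n the distance ranges from 0 (identical rankings) to n(n-1)/2 (one ranking is the reverse of the other), so the query radius is naturally chosen within that range. This pairwise count costs O(n^2) per comparison, and the linear scan costs one comparison per user.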