PEPT: Expert Finding Meets Personalized Pre-training

Finding appropriate experts is essential in Community Question Answering (CQA) platforms, as it enables questions to be routed effectively to users who can provide relevant answers. The key is to learn personalized expert representations from the questions each expert has answered, and to match them accurately with target questions. Some preliminary works have explored the usability of pre-trained language models (PLMs) in expert finding, for example by pre-training expert or question representations. However, these models usually learn purely textual representations of experts from their histories, disregarding personalized and fine-grained expert modeling. To alleviate this, we present a personalized pre-training and fine-tuning paradigm that can effectively learn expert interest and expertise simultaneously. Specifically, in our pre-training framework, we integrate the historical answered questions of one expert with one target question and regard the result as a candidate-aware expert-level input unit. We then fuse expert IDs into pre-training to guide the model toward personalized expert representations, which helps capture the unique characteristics and expertise of each individual expert. Additionally, we design two pre-training tasks: 1) a question-level masked language model task that learns the relatedness between histories, enabling the modeling of question-level expert interest; and 2) a vote-oriented task that captures question-level expert expertise by predicting the vote score the expert would receive. Through this pre-training framework and these tasks, our approach can holistically learn expert representations that cover both interests and expertise. We evaluate our method extensively on six real-world CQA datasets, and the experimental results consistently demonstrate its superiority over competitive baseline methods.
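
As a rough illustration of the input construction described in the abstract, the following Python sketch assembles one candidate-aware expert-level input unit (expert ID, historical questions, target question) and applies a question-level mask. It is a minimal sketch under stated assumptions, not the authors' implementation: the BERT-style special tokens, the `[EXPERT_<id>]` token format, the function names `build_input_unit` and `mask_one_question`, and the choice to mask every token of one randomly chosen historical question are all hypothetical.

```python
import random
from typing import List, Optional, Tuple

# Assumed BERT-style special tokens; the abstract does not specify them.
CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"


def build_input_unit(
    expert_id: int,
    histories: List[List[str]],
    target: List[str],
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Concatenate an expert-ID token, the historical questions, and the target.

    Returns the token sequence and the (start, end) span of each historical
    question, with `end` exclusive.
    """
    # Hypothetical way of fusing the expert ID: a dedicated ID token.
    tokens = [CLS, f"[EXPERT_{expert_id}]"]
    spans = []
    for question in histories:
        start = len(tokens)
        tokens.extend(question)
        spans.append((start, len(tokens)))
        tokens.append(SEP)
    tokens.extend(target)
    tokens.append(SEP)
    return tokens, spans


def mask_one_question(
    tokens: List[str],
    spans: List[Tuple[int, int]],
    rng: random.Random,
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask all tokens of one randomly chosen historical question.

    Returns the masked sequence and per-position labels (None where unmasked).
    """
    start, end = rng.choice(spans)
    masked = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i in range(start, end):
        labels[i] = tokens[i]
        masked[i] = MASK
    return masked, labels


histories = [["how", "to", "sort", "a", "list"], ["what", "is", "a", "mutex"]]
target = ["why", "is", "quicksort", "fast"]
tokens, spans = build_input_unit(7, histories, target)
masked, labels = mask_one_question(tokens, spans, random.Random(0))
```

In this sketch the model would be asked to recover the masked question from the remaining histories and the target question, which is one plausible reading of "learning the relatedness between histories". The vote-oriented task is not shown; it would additionally need a vote score attached to each historical question.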
