A unique aspect of ColBERT is its use of [MASK] tokens in queries to score documents (query augmentation). Prior work shows that [MASK] tokens weight the non-[MASK] query terms, emphasizing certain tokens over others, rather than introducing entirely new terms as originally proposed. We begin by demonstrating that the term-weighting behavior previously reported for [MASK] tokens in ColBERTv1 also holds for ColBERTv2. We then vary the number of [MASK] tokens from zero up to four times the query input length used in training, both for first-stage retrieval and for scoring candidates. We observe an initial decrease in performance with few [MASK]s, a large increase once enough [MASK]s are added to pad queries to an average length of 32, and a plateau in performance afterwards. Additionally, we compare baseline performance to performance when the query length is extended to 128 tokens, and find that the differences are small (e.g., within 1% on various metrics) and generally not statistically significant, indicating that performance does not collapse when ColBERT is presented with more [MASK] tokens than it saw in training.
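To make the mechanism concrete, the following is a minimal sketch of query augmentation and late-interaction (MaxSim) scoring, not the ColBERT implementation itself. The function names `augment_query` and `maxsim_score`, the simple truncation rule, and the two-dimensional toy embeddings are all assumptions chosen for illustration; in the real model the embeddings come from a BERT encoder.

```python
import math

MASK = "[MASK]"


def augment_query(tokens, target_len=32):
    """Pad a tokenized query with [MASK] tokens up to target_len.

    Assumption: queries longer than target_len are simply truncated.
    """
    tokens = list(tokens)[:target_len]
    return tokens + [MASK] * (target_len - len(tokens))


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query vectors, of the best cosine match among document vectors."""
    q = [_normalize(v) for v in query_vecs]
    d = [_normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Toy example (hypothetical embeddings): two query terms, one document token.
query_terms = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0]]

# A [MASK] embedding that lands near the first query term adds a second
# summand matching the same document token, which up-weights that term.
mask_vec = [1.0, 0.0]

base_score = maxsim_score(query_terms, doc)
augmented_score = maxsim_score(query_terms + [mask_vec], doc)
```

Because every query vector, including each [MASK], contributes its own maximum to the sum, a [MASK] whose embedding resembles an existing query term behaves like extra weight on that term. This is the term-weighting reading of query augmentation in miniature.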