We investigate the optimization target of Contrast-Consistent Search (CCS), which aims to recover a large language model's internal representations of truth. We present a new loss function, which we call the Midpoint-Displacement (MD) loss. We demonstrate that, for a certain hyperparameter value, the MD loss yields a prober whose weights are very similar to those of the CCS prober. We further show that this hyperparameter value is not optimal, and that with a better-chosen value the MD loss attains higher test accuracy than CCS.