We introduce a retrieval approach for the German Dataset for Legal Information Retrieval (GerDaLIR) that combines Support Vector Regression (SVR) ensembles, bootstrap aggregation (bagging), and embedding spaces. By framing the retrieval task as multiple binary needle-in-a-haystack subtasks, our voting ensemble achieves higher recall than the baselines (0.849 versus 0.803 and 0.829) without training or fine-tuning any deep learning models. These initial results are promising, and the approach could be improved further, particularly by refining the encoding models and optimizing hyperparameters.
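To make the idea concrete, the following is a minimal sketch of one possible reading of the approach, not the implementation behind the reported numbers. It assumes that each query defines a binary subtask in which the query embedding is the single positive example (the needle) and a bootstrap sample of document embeddings serves as presumed negatives (the haystack), and that "voting" means averaging the predictions of the fitted SVR models. The function names `bagged_svr_scores` and `retrieve_top_k`, the hyperparameter values, and the random toy embeddings are all hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.svm import SVR


def bagged_svr_scores(query_vec, doc_vecs, n_estimators=8, n_negatives=64, seed=0):
    """Score every document for one query with a bagged SVR ensemble.

    Assumption: the query is the only positive (target 1.0); each ensemble
    member sees a different bootstrap sample of documents as negatives (0.0).
    """
    rng = np.random.default_rng(seed)
    n_docs = doc_vecs.shape[0]
    scores = np.zeros(n_docs)
    for _ in range(n_estimators):
        # Bootstrap: draw haystack documents with replacement.
        neg_idx = rng.integers(0, n_docs, size=n_negatives)
        X = np.vstack([query_vec[None, :], doc_vecs[neg_idx]])
        y = np.concatenate([[1.0], np.zeros(n_negatives)])
        model = SVR(kernel="linear", C=1.0, epsilon=0.1)
        model.fit(X, y)
        # Soft vote: accumulate each member's prediction for all documents.
        scores += model.predict(doc_vecs)
    return scores / n_estimators


def retrieve_top_k(query_vec, doc_vecs, k=10, **kwargs):
    """Return the indices of the k highest-scoring documents."""
    scores = bagged_svr_scores(query_vec, doc_vecs, **kwargs)
    return np.argsort(-scores)[:k]


# Toy data standing in for precomputed embeddings (hypothetical, random).
data_rng = np.random.default_rng(42)
docs = data_rng.normal(size=(200, 32))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# The query is a slightly perturbed copy of document 17.
query = docs[17] + 0.01 * data_rng.normal(size=32)

top_k = retrieve_top_k(query, docs, k=10)
print(top_k)
```

In this sketch no encoder is trained or fine-tuned: the only fitted components are the small per-query SVR models, which operate on embeddings assumed to come from a frozen, off-the-shelf encoder.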