In the era of large language models (LLMs), supervised neural methods remain the state of the art (SOTA) for coreference resolution. Yet their full potential is underexplored, particularly in incremental clustering, which faces the critical challenge of balancing efficiency with performance on long texts. To address this limitation, we propose \textbf{MEIC-DT}, a novel dual-threshold, memory-efficient incremental clustering approach based on a lightweight Transformer. MEIC-DT features a dual-threshold constraint mechanism designed to precisely control the Transformer's input scale within a predefined memory budget. This mechanism incorporates a Statistics-Aware Eviction Strategy (\textbf{SAES}), which exploits the distinct statistical profiles of the training and inference phases for intelligent cache management. Furthermore, we introduce an Internal Regularization Policy (\textbf{IRP}) that strategically condenses clusters by selecting the most representative mentions, thereby preserving semantic integrity. Extensive experiments on common benchmarks demonstrate that MEIC-DT achieves highly competitive coreference performance under stringent memory constraints.
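The dual-threshold constraint described above can be illustrated with a minimal sketch: a soft threshold triggers eviction of low-salience cached mentions, while a hard threshold caps the Transformer's input size. The class name, thresholds, and salience scores below are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of a dual-threshold mention cache; NOT the authors' code.
# soft_limit: exceeding it triggers statistics-aware eviction.
# hard_limit: absolute cap on cached mentions fed to the Transformer.

class DualThresholdCache:
    def __init__(self, soft_limit: int, hard_limit: int):
        assert soft_limit <= hard_limit
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.mentions = []  # list of (mention_id, salience_score) pairs

    def add(self, mention_id: int, score: float) -> None:
        self.mentions.append((mention_id, score))
        if len(self.mentions) > self.soft_limit:
            self._evict()

    def _evict(self) -> None:
        # Stand-in for a statistics-aware eviction score: keep only the
        # highest-salience mentions, shrinking back to the soft limit so
        # the cache never approaches the hard cap.
        self.mentions.sort(key=lambda m: m[1], reverse=True)
        self.mentions = self.mentions[: self.soft_limit]


# Example: five mentions stream in; the cache retains the top-salience ones.
cache = DualThresholdCache(soft_limit=3, hard_limit=5)
for mention_id, score in enumerate([0.9, 0.1, 0.5, 0.8, 0.3]):
    cache.add(mention_id, score)
```

Under this sketch, eviction is score-based only; the paper's SAES additionally conditions on training- versus inference-phase statistics, which this toy example does not model.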