Unveiling Text in Challenging Stone Inscriptions: A Character-Context-Aware Patching Strategy for Binarization

Binarization is a popular first step towards text extraction from historical artifacts. Stone inscription images pose severe challenges for binarization due to poor contrast between etched characters and the stone background, non-uniform surface degradation, distracting artifacts, and highly variable text density and layouts. Under these conditions, existing binarization techniques frequently fail to isolate coherent character regions. Many approaches subdivide the image into patches to improve the resolution of text fragments and, in turn, binarization performance. With this in mind, we present a robust and adaptive patching strategy to binarize challenging Indic inscriptions. The patches from our approach are used to train an Attention U-Net for binarization. The attention mechanism allows the model to focus on subtle structural cues, while our dynamic sampling and patch selection method ensures that the model learns to overcome surface noise and layout irregularities. We also introduce a carefully annotated, pixel-precise dataset of Indic stone inscriptions at the character-fragment level. We demonstrate that our novel patching mechanism significantly boosts binarization performance across classical and deep learning baselines. Despite being trained only on a single-script Indic dataset, our model exhibits strong zero-shot generalization to other Indic and non-Indic scripts, highlighting its robustness and script-agnostic generalization capabilities. By producing clean, structured representations of inscription content, our method lays the foundation for downstream tasks such as script identification, OCR, and historical text analysis. Project page: https://ihdia.iiit.ac.in/shilalekhya-binarization/
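To make the general idea of patch-wise binarization concrete, the following is a minimal sketch of the simplest variant: a fixed, non-overlapping grid with a classical Otsu threshold computed per patch. It is not the character-context-aware strategy proposed here, and it does not involve the Attention U-Net; the function names `otsu_threshold` and `binarize_by_patches` and the `patch` size are hypothetical and chosen only for illustration.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold (0-255) of a uint8 grayscale array."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    w0 = np.cumsum(hist)              # number of pixels with value <= t
    w1 = hist.sum() - w0              # number of pixels with value > t
    sum0 = np.cumsum(hist * levels)   # intensity mass of pixels <= t
    valid = (w0 > 0) & (w1 > 0)       # both classes must be non-empty
    mu0 = sum0 / np.maximum(w0, 1)
    mu1 = (sum0[-1] - sum0) / np.maximum(w1, 1)
    # Between-class variance (up to a constant factor); invalid splits get -1.
    between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, -1.0)
    return int(np.argmax(between))


def binarize_by_patches(gray: np.ndarray, patch: int = 64) -> np.ndarray:
    """Binarize a 2-D uint8 image with one Otsu threshold per grid patch.

    Output pixels are 1 where the input is brighter than the local
    threshold and 0 otherwise. Assumption: a fixed, non-overlapping grid;
    a uniform patch has no valid split and falls back to threshold 0.
    """
    out = np.zeros(gray.shape, dtype=np.uint8)
    height, width = gray.shape
    for y in range(0, height, patch):
        for x in range(0, width, patch):
            tile = gray[y:y + patch, x:x + patch]
            t = otsu_threshold(tile)
            out[y:y + patch, x:x + patch] = (tile > t).astype(np.uint8)
    return out


# Synthetic example: two regions with different background brightness,
# each containing a slightly darker horizontal "stroke".
image = np.empty((32, 64), dtype=np.uint8)
image[:, :32] = 200
image[:, 32:] = 100
image[10:20, :32] = 180
image[10:20, 32:] = 80
result = binarize_by_patches(image, patch=32)
```

Because each patch gets its own threshold, the low-contrast strokes in both halves are separated from their local background, which a single global threshold would not achieve on this synthetic image. A fixed grid can still cut characters at patch boundaries, which is one motivation for adaptive patch selection.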
