In this study, we present a generalizable workflow for identifying documents written in Armeno-Turkish (Turkish written in the Armenian script), a historical language with a nonstandard combination of language and script. We also introduce the task of detecting distinct patterns of multilinguality based on how frequently structured language alternations occur within a document.
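To make the idea of alternation frequency concrete, the following is a minimal sketch and not the workflow described in this study. It assumes that the script of each token can stand in as a rough proxy for its language, which does not hold for Armeno-Turkish itself, where the language and script diverge. The function names `token_script` and `alternation_rate` are hypothetical and introduced only for illustration.

```python
import unicodedata


def token_script(token):
    """Return a coarse script label from the first alphabetic character.

    Assumption: script is used here only as a rough stand-in for language.
    """
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("ARMENIAN"):
                return "armenian"
            if name.startswith("LATIN"):
                return "latin"
            if name.startswith("ARABIC"):
                return "arabic"
            return "other"
    return None  # token has no alphabetic character (digits, punctuation)


def alternation_rate(text):
    """Fraction of adjacent token pairs whose script labels differ."""
    labels = [s for s in (token_script(t) for t in text.split()) if s is not None]
    if len(labels) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return switches / (len(labels) - 1)
```

A document whose tokens all share one script yields a rate of 0.0, while a document that switches script at every token yields 1.0. A real system would presumably need a language identifier rather than a script check, and would look at the structure of the alternations as well as their raw frequency.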