News spreads rapidly across languages and regions, but translation can lose subtle nuances. We propose a method that uses optimal transport to align sentences in multilingual news articles, identifying semantically similar content across languages. We apply this method to align more than 140,000 pairs of Bloomberg English and Japanese news articles covering around 3,500 stocks listed on the Tokyo Stock Exchange from 2012 to 2024. Aligned sentences are sparser, more interpretable, and exhibit higher semantic similarity. Return scores constructed from aligned sentences show stronger correlations with realized stock returns, and long-short trading strategies based on these alignments achieve 10\% higher Sharpe ratios than strategies that analyze the full-text sample.
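The abstract does not state the exact transport formulation, so the following is only a minimal sketch of sentence alignment with optimal transport under assumptions of our own. It assumes each sentence has already been embedded by some cross-lingual encoder, uses cosine distance as the transport cost, gives every sentence uniform weight, and solves an entropically regularized problem with plain Sinkhorn iterations. The rule that keeps a pair only when it carries at least `keep_frac` of a source sentence's mass is a hypothetical sparsification step. The names `sinkhorn_align`, `reg`, `n_iter`, and `keep_frac` are all illustrative.

```python
import numpy as np


def sinkhorn_align(src_emb, tgt_emb, reg=0.05, n_iter=200, keep_frac=0.5):
    """Hypothetical sketch: align source and target sentences via entropic OT.

    src_emb: (n, d) array of source-sentence embeddings (assumed nonzero rows).
    tgt_emb: (m, d) array of target-sentence embeddings in the same space.
    Returns the transport plan and the list of kept (i, j) sentence pairs.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    a = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    b = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T  # cosine distance, in [0, 2]

    n, m = cost.shape
    mu = np.full(n, 1.0 / n)  # assumed uniform weight per source sentence
    nu = np.full(m, 1.0 / m)  # assumed uniform weight per target sentence

    # Sinkhorn iterations: alternately rescale rows and columns of the kernel
    # so that the plan's marginals approach mu and nu.
    K = np.exp(-cost / reg)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    plan = u[:, None] * K * v[None, :]

    # Hypothetical sparsification: keep a pair only if it carries at least
    # keep_frac of the corresponding source sentence's mass.
    pairs = [
        (i, j)
        for i in range(n)
        for j in range(m)
        if plan[i, j] >= keep_frac * mu[i]
    ]
    return plan, pairs


if __name__ == "__main__":
    # Toy example: target sentences are a permutation of the source sentences.
    src = np.eye(4)[:3]
    tgt = np.eye(4)[[2, 0, 1]]
    plan, pairs = sinkhorn_align(src, tgt)
    print(pairs)  # [(0, 1), (1, 2), (2, 0)]
```

Because the marginals are balanced, every sentence must send its mass somewhere, so an article with content that has no counterpart in the other language would still receive some mass. The threshold step is what discards such weak matches. A library such as POT (`ot.sinkhorn`) could replace the hand-written loop.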