Most genes are part of larger families of evolutionarily related genes. The history of gene families typically involves duplications and losses of genes as well as horizontal transfers into other organisms. The reconstruction of detailed gene family histories, i.e., the precise dating of evolutionary events relative to the phylogenetic tree of the underlying species, has remained a challenging topic despite its importance as a basis for detailed investigations into adaptation and functional evolution of individual members of the gene family. The identification of orthologs, moreover, is a particularly important subproblem of the more general setting considered here. In the last few years, an extensive body of mathematical results has appeared that tightly links orthology, a formal notion of best matches among genes, and horizontal gene transfer. The purpose of this chapter is to broadly outline some of the key mathematical insights and to discuss their implications for practical applications. In particular, we focus on tree-free methods, i.e., methods to infer orthology or horizontal gene transfer as well as gene trees, species trees, and reconciliations between them without using \emph{a priori} knowledge of the underlying trees or statistical models for the inference of phylogenetic trees. Instead, the initial step aims to extract binary relations among genes.