Admixed populations arise when two or more ancestral populations interbreed. As a result of this admixture, the genome of admixed populations is defined by tracts of variable size inherited from these parental groups and has particular genetic features that provide valuable information about their demographic history. Diverse methods can be used to derive the ancestry apportionment of admixed individuals, and such inferences can be leveraged for the discovery of genetic loci associated with diseases and traits, therefore having important biomedical implications.
1. Genetic Admixture
Admixed populations are the result of gene flow between reproductively isolated groups, owing to events that have occurred throughout human history, including migratory events, the discovery of new territories, or the slave trade. As a result of intermixture and recombination, the genomes of individuals in the hybrid population come to contain, over time, a mosaic of ancestries from the different source populations along their chromosomes. The length of the chromosome segments inherited from the different ancestral populations is inversely related to the time elapsed since the admixture event. These tracts are shortened over the generations by meiotic recombination, so that the most recently admixed populations, such as the Canary Islanders in Spain or the Latino populations, retain longer ancestral tracts, while populations that mixed more distantly in time, such as the Uyghur in China, harbor shorter ancestry segments in their chromosomes [1][2].
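The relationship between tract length and time since admixture can be illustrated with a short sketch. It assumes a common single-pulse approximation that is not taken from this entry: tract lengths are exponentially distributed with a rate of (generations − 1) multiplied by the fraction of the genome coming from the other ancestries, per Morgan. The function name and parameters are hypothetical.

```python
def expected_tract_length_cm(generations: int, other_ancestry_fraction: float) -> float:
    """Mean ancestry tract length in centimorgans (cM).

    Assumption (single-pulse approximation, not from the source text):
    tract lengths are exponential with rate
    (generations - 1) * other_ancestry_fraction per Morgan.
    """
    if generations < 2:
        raise ValueError("the approximation needs at least 2 generations")
    if not 0.0 < other_ancestry_fraction <= 1.0:
        raise ValueError("other_ancestry_fraction must be in (0, 1]")
    rate_per_morgan = (generations - 1) * other_ancestry_fraction
    # The mean of an exponential is 1 / rate; 1 Morgan = 100 cM.
    return 100.0 / rate_per_morgan


if __name__ == "__main__":
    for g in (5, 11, 50, 200):
        print(g, "generations:", round(expected_tract_length_cm(g, 0.5), 2), "cM")
```

Under this approximation, 11 generations with half of the genome coming from other sources gives a mean tract of 20 cM, and older admixture gives proportionally shorter tracts.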
As such, the admixture proportions and the time elapsed since the admixture event can be inferred based on linkage disequilibrium (LD) [3][4]. When two distant populations interbreed, admixture linkage disequilibrium (ALD) is generated among loci with different allelic frequencies in the ancestral populations, leading to linkage between markers that were previously unlinked. During the first generations after the admixture, ALD is expected to decay rapidly between distant loci, while it is maintained between closer positions and can still be detected many generations later [5]. Additionally, the dynamics of ALD decay are influenced by the admixture model. For example, a greater drop in ALD and a faster decrease in the length of the ancestral chromosomal segments are expected for populations formed by a single mixing event than for admixtures maintained throughout generations [1][6].
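The idea of dating admixture from LD decay can be sketched as follows. This is a minimal illustration, assuming a single mixing event in which ALD between two loci decays as A · exp(−t · d), where d is the genetic distance in Morgans and t is the number of generations. It is not the weighted LD statistic implemented in ALDER, and the function names are hypothetical.

```python
import numpy as np


def ald_decay(d_morgans, generations, amplitude=1.0):
    """Expected ALD at genetic distance d (Morgans) under a single-pulse model."""
    d = np.asarray(d_morgans, dtype=float)
    return amplitude * np.exp(-generations * d)


def estimate_generations(d_morgans, ald_values):
    """Recover t by fitting log(ALD) = log(A) - t * d with least squares."""
    d = np.asarray(d_morgans, dtype=float)
    log_ald = np.log(np.asarray(ald_values, dtype=float))
    slope, _intercept = np.polyfit(d, log_ald, 1)  # highest degree first
    return -slope


if __name__ == "__main__":
    distances = np.linspace(0.005, 0.2, 40)           # 0.5 to 20 cM
    curve = ald_decay(distances, generations=12, amplitude=0.3)
    print(estimate_generations(distances, curve))
```

With noisy empirical curves, a weighted or nonlinear fit would be more appropriate than this log-linear regression; the sketch only shows why the decay rate carries information about the admixture date.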
2. Estimation of Genetic Ancestry: Global and Local Ancestry
Global ancestry (GA) is the fraction of the genome of each admixed individual that can be ascribed to each of the ancestral populations contributing to the recently admixed population (Figure 1A). GA can be estimated using different approaches. Some of the most popular methods, such as STRUCTURE [7][8] and ADMIXTURE [9][10], are based on probabilistic models of genotype data that assume Hardy–Weinberg equilibrium within populations and complete linkage equilibrium among the loci used for the estimation. Alternative approaches estimate the ancestry proportions from principal component decompositions, such as ipPCA [11], or from the study of LD decay curves, such as ALDER [3].
Figure 1. Global (A) and local (B) genetic ancestries in a recently admixed population with three ancestral populations. The proportion of each of the ancestral populations is represented by the colors yellow, blue, and purple.
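To make the probabilistic model behind GA estimation concrete, the following sketch estimates the ancestry proportions of one individual with an expectation-maximization loop. It is a simplified, supervised variant and not the STRUCTURE or ADMIXTURE implementation: the ancestral allele frequencies are assumed to be known, loci are assumed independent (linkage equilibrium), and genotypes are assumed to follow Hardy–Weinberg proportions. All names are hypothetical.

```python
import numpy as np


def estimate_global_ancestry(genotypes, ancestral_freqs, n_iter=200):
    """Estimate ancestry proportions q for one individual.

    genotypes: length-J array with counts (0, 1, 2) of the coded allele.
    ancestral_freqs: (K, J) array with the coded-allele frequency of each
        of the K ancestral populations (assumed known here).
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.clip(np.asarray(ancestral_freqs, dtype=float), 1e-6, 1 - 1e-6)
    n_pops = p.shape[0]
    q = np.full(n_pops, 1.0 / n_pops)          # start from equal proportions
    for _ in range(n_iter):
        # E-step: probability that a coded / non-coded allele copy
        # was inherited from each ancestral population.
        from_coded = q[:, None] * p
        from_other = q[:, None] * (1.0 - p)
        from_coded /= from_coded.sum(axis=0)
        from_other /= from_other.sum(axis=0)
        # M-step: average the responsibilities over the 2J allele copies.
        q = (g * from_coded + (2.0 - g) * from_other).sum(axis=1) / (2.0 * g.size)
    return q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    freqs = rng.uniform(0.05, 0.95, size=(2, 2000))   # simulated reference panel
    true_q = np.array([0.7, 0.3])
    genos = rng.binomial(2, true_q @ freqs)           # simulated admixed individual
    print(estimate_global_ancestry(genos, freqs))
```

The unsupervised tools additionally estimate the ancestral allele frequencies from the data; fixing them here keeps the example short and isolates the role of the individual ancestry proportions.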
Local ancestry (LA) is a term commonly used to refer to the ancestry of each of the chromosome blocks, also known as ancestral tracts, in recently admixed individuals (Figure 1B). For this, the number of copies derived from each ancestral population (from zero to two) can be inferred at each genomic position for each individual. Thus, GA can also be obtained by summarizing LA across the individual genomes. Multiple estimators have been developed to infer LA (see the table below).
Most common methods to estimate local genetic ancestry.
| Software | Algorithm | Background LD | Phasing Requirement | Genetic Map | Physical Map | Number of Ancestral Populations | Reference |
|---|---|---|---|---|---|---|---|
| | | | | | | | [12] |
| EILA | k-means | No | Unphased | No | Yes | 2 or 3 | [13] |
| ELAI | Two-layer HMM | Yes | Phased/Unphased (a) | No | No | ≥2 | [14] |
| HAPMIX | HMM | Yes | Phased/Unphased (b) | Yes | No | 2 | [15] |
| LAMP-LD | HMM | Yes | Phased/Unphased (b) | No | Yes | 2, 3 or 5 | [16] |
| Loter | Single-layer HMM | No | Phased | No | No | ≥2 | [17] |
| PCAdmix | HMM and local PCA | No | Phased | Optional | Optional | ≥2 | [18] |
| RFMix | CRF | No | Phased | Yes | No | ≥2 | [19] |
| SABER+ | HMM | Yes | Phased | No | No | 2–4 | [20][21] |
| SEQMIX | HMM | No | Unphased | Yes | No | 2 | [22] |
| SupportMix | SVM | No | Phased | Yes | No | ≥2 | [23] |
(a) Phased and unphased data are allowed for ancestral and admixed populations.
(b) Phased data are needed for the ancestral populations and unphased data for the admixed population.
CRF (Conditional Random Field), HMM (Hidden Markov Model), LD (linkage disequilibrium), PCA (Principal Component Analysis), SVM (Support Vector Machines).
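As noted earlier in this section, GA can be obtained by summarizing LA across the genome. A minimal sketch of that summary is shown below, assuming that LA calls are available as segments with a length and the number of copies (zero to two) assigned to each ancestry. The input layout and the function name are hypothetical and do not correspond to the output format of any of the tools listed.

```python
from collections import defaultdict


def global_from_local(segments):
    """Length-weighted global ancestry from diploid local ancestry calls.

    segments: iterable of (length, copies), where copies maps an ancestry
        label to its number of copies (0 to 2) in that segment and the
        copies of each segment sum to 2.
    """
    totals = defaultdict(float)
    genome_length = 0.0
    for length, copies in segments:
        if sum(copies.values()) != 2:
            raise ValueError("each diploid segment must carry exactly 2 copies")
        for ancestry, n_copies in copies.items():
            totals[ancestry] += length * n_copies
        genome_length += length
    if genome_length == 0:
        raise ValueError("no segments provided")
    # Divide by 2 * length because every position carries two copies.
    return {a: total / (2.0 * genome_length) for a, total in totals.items()}


if __name__ == "__main__":
    calls = [
        (30.0, {"A": 2}),
        (10.0, {"A": 1, "B": 1}),
        (10.0, {"B": 2}),
    ]
    print(global_from_local(calls))
```

Lengths can be given in base pairs or in centimorgans; the choice changes the weighting slightly, so it should be kept consistent across individuals.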
The use of genotyping microarrays has also led to the development of improved methods to infer LA, such as LAMP-LD [16], RFMix [19], and HAPMIX [15], among others (see the table above). Compared with the earlier methods designed to work with AIMs, these algorithms rely on denser sets of genetic markers (retaining LD) that provide a higher resolution in estimating LA, and most of them are based on hidden Markov models [14][16].
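Because most of these estimators are built on hidden Markov models, a toy example may help to show the principle. The sketch below decodes the most likely ancestry path along one phased haplotype with the Viterbi algorithm. It is an illustration under strong assumptions and does not reproduce any of the cited tools: ancestral allele frequencies are assumed known, background LD is ignored, and the probability of staying in the same ancestry block between adjacent markers is assumed to be exp(−t · d) for t generations and a genetic distance d in Morgans. All names are hypothetical.

```python
import numpy as np


def viterbi_local_ancestry(haplotype, freqs, positions_morgans,
                           generations, proportions):
    """Most likely ancestry state at each marker of a phased haplotype.

    haplotype: length-J array of alleles (0 or 1).
    freqs: (K, J) coded-allele frequencies of the K ancestral populations.
    positions_morgans: increasing genetic positions of the J markers.
    generations: assumed number of generations since admixture.
    proportions: length-K admixture proportions used as prior.
    """
    h = np.asarray(haplotype)
    p = np.clip(np.asarray(freqs, dtype=float), 1e-6, 1 - 1e-6)
    q = np.asarray(proportions, dtype=float)
    pos = np.asarray(positions_morgans, dtype=float)
    n_states, n_loci = p.shape

    # Log-probability of the observed allele under each ancestry.
    log_emit = np.where(h[None, :] == 1, np.log(p), np.log1p(-p))

    score = np.log(q) + log_emit[:, 0]
    back = np.zeros((n_loci, n_states), dtype=int)
    for j in range(1, n_loci):
        stay = np.exp(-generations * (pos[j] - pos[j - 1]))
        # trans[i, k]: probability of moving from ancestry i to ancestry k.
        trans = stay * np.eye(n_states) + (1.0 - stay) * q[None, :]
        cand = score[:, None] + np.log(np.maximum(trans, 1e-300))
        back[j] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[:, j]

    # Trace back the best path from the last marker.
    path = np.empty(n_loci, dtype=int)
    path[-1] = score.argmax()
    for j in range(n_loci - 1, 0, -1):
        path[j - 1] = back[j, path[j]]
    return path


if __name__ == "__main__":
    freqs = np.vstack([np.full(40, 0.95), np.full(40, 0.05)])
    hap = np.array([1] * 20 + [0] * 20)
    positions = np.arange(40) * 0.01
    print(viterbi_local_ancestry(hap, freqs, positions, 10, [0.5, 0.5]))
```

Estimators that model background LD or work on unphased genotypes use richer state spaces than this single-layer model, which is why their requirements differ in the table.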
In order to identify the optimal approach for each scenario, benchmarking the different algorithms and reference panels is necessary. Previous reviews have compared the characteristics and effectiveness of local ancestry estimators [24][25][26][27][28], suggesting two main aspects to consider: (1) the prior requirements of each estimator, and (2) the inherent features of the target population itself. The table above shows the main characteristics of the most common methods to estimate local ancestry.
Regarding the necessary requirements for the use of each estimator, it must be considered, for example, whether a phasing step is needed prior to ancestry estimation. This step is crucial for an accurate estimate of ancestry and is closely linked to the density of available markers [29].
3.1. Definition
The distribution of allelic frequencies in recently admixed populations is closely related to the frequencies found in their ancestral populations [30][31]. When these ancestral populations differ markedly in their susceptibility to a disease, admixture mapping studies, also known as mapping by admixture linkage disequilibrium (MALD) studies, can be performed to reveal genetic loci harboring variants underlying such differences between population groups [32]. Admixture mapping studies aim to correlate LA with a trait of interest in recently admixed populations in which ALD is still detectable, under the hypothesis that variants associated with increased disease risk will be found in chromosomal fragments inherited from one of the parental populations [33][34]. Thus, an increase (or decrease) in the proportion of the ancestry associated with the trait of interest is expected in these chromosomal regions.
Scheme of an admixture mapping study. (A) LA estimates in case and control individuals from a recently admixed population. (B) Comparison of local ancestry scores of all chromosomal regions between cases and controls. (C) Fine mapping study on genomic regions where genetic ancestry is associated with a trait.
Definition of the main concepts.
| Ancestry informative marker (AIM) | Genetic variants, usually SNPs, that show large frequency differences between the parental populations and that are, thus, highly informative for ancestry estimation in admixed populations. |
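The case-control comparison of LA at the core of an admixture mapping study can be sketched with a simple per-locus test. The example below computes, for every locus, a two-sample z statistic comparing the mean number of copies of one ancestry between cases and controls. It is an illustration only: it omits adjustments, such as global ancestry and other covariates, that regression-based analyses typically include, and the function name and input layout are hypothetical.

```python
import numpy as np


def admixture_mapping_scan(local_ancestry, is_case):
    """Per-locus z statistic for excess ancestry in cases versus controls.

    local_ancestry: (n_individuals, n_loci) array with the number of copies
        (0, 1, 2) of the ancestry of interest at each locus.
    is_case: boolean array of length n_individuals.
    """
    la = np.asarray(local_ancestry, dtype=float)
    case = np.asarray(is_case, dtype=bool)
    cases, controls = la[case], la[~case]
    diff = cases.mean(axis=0) - controls.mean(axis=0)
    se = np.sqrt(cases.var(axis=0, ddof=1) / len(cases)
                 + controls.var(axis=0, ddof=1) / len(controls))
    with np.errstate(divide="ignore", invalid="ignore"):
        # Loci without variation get a statistic of zero.
        return np.where(se > 0, diff / se, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, n_loci = 400, 50
    la = rng.binomial(2, 0.5, size=(n, n_loci))
    status = np.arange(n) < 200
    # Simulated signal: cases carry more of this ancestry at locus 10.
    la[status, 10] = rng.binomial(2, 0.8, size=200)
    z = admixture_mapping_scan(la, status)
    print("top locus:", int(z.argmax()))
```

Regions with a strong deviation would then be taken forward to fine mapping, as in panel (C) of the scheme above.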
3.2. Advantages and Disadvantages of Admixture Mapping Studies
3.3. Applications of Admixture Mapping Studies in Biomedical Research
Advantages and disadvantages of using NGS for LA estimation.
LA (Local ancestry), WES (Whole-Exome Sequencing), SNP (Single Nucleotide Polymorphism).