Liu, H. Human Genetic Clustering. Encyclopedia. Available online: (accessed on 10 December 2023).
Liu, H.(2022, November 23). Human Genetic Clustering. In Encyclopedia.
Human Genetic Clustering

Human genetic clustering is the degree to which human genetic variation can be partitioned into a small number of groups or clusters. A leading method of analysis uses mathematical cluster analysis of the degree of similarity of genetic data between individuals and groups in order to infer population structures and assign individuals to hypothesized ancestral groups. A similar analysis can be done using principal component analysis, and several recent studies deploy both methods. Analysis of genetic clustering examines the degree to which regional groups differ genetically, the categorization of individuals into clusters, and what can be learned about human ancestry from this data. There is broad scientific agreement that a relatively small fraction of human genetic variation occurs between populations, continents, or clusters. Researchers of genetic clustering differ, however, on whether genetic variation is principally clinal or whether clusters inferred mathematically are important and scientifically useful.

Keywords: cluster analysis; genetic clustering; degree of similarity

1. Analysis of Human Genetic Variation

1.1. Quantifying Variation

One of the underlying questions regarding the distribution of human genetic diversity is the degree to which genes are shared between the observed clusters. It has been observed repeatedly that the majority of variation in the global human population is found within populations. This variation is usually quantified using Sewall Wright's fixation index (FST), which estimates the proportion of total genetic variation that lies between groups rather than within them. The degree of human genetic variation differs somewhat depending on the gene type studied, but in general it is common to claim that ~85% of genetic variation is found within groups, ~6–10% between groups within the same continent and ~6–10% between continental groups. Ryan Brown and George Armelagos described this as "a host of studies [that have] concluded that racial classification schemes can account for only a negligible proportion of human genetic diversity," including the studies listed in the table below.

Author(s) | Year | Title | Characteristic Studied | Proportion of Variation Within Groups (rather than among populations)
Lewontin | 1972 | The apportionment of human diversity[1] | 17 blood groups | 85.4%
Barbujani et al. | 1997 | An apportionment of human DNA diversity[2] | 79 RFLP, 30 microsatellite loci | 84.5%
Seielstad, Minch and Cavalli-Sforza | 1998 | Genetic evidence for a higher female migration rate in humans[3] | 29 autosomal microsatellite loci; 10 Y chromosome microsatellite loci | 97.8% (autosomal loci)


These average numbers, however, do not mean that every population harbors an equal amount of diversity. In fact, some human populations contain far more genetic diversity than others, which is consistent with the likely African origin of modern humans.[4][5] Therefore, populations outside of Africa may have undergone serial founder effects that limited their genetic diversity.[4][5]
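The apportionment figures quoted above can be made concrete with a small calculation. The sketch below assumes the heterozygosity-based form of the fixation index, FST = (H_T - H_S) / H_T, for a single biallelic locus with equally weighted groups; the function names and the example allele frequencies are hypothetical and are not taken from any of the cited studies.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(group_freqs):
    """Heterozygosity-based fixation index for equally weighted groups.

    H_S is the mean within-group heterozygosity, H_T the heterozygosity of
    the pooled population; F_ST = (H_T - H_S) / H_T.
    """
    h_s = sum(expected_heterozygosity(p) for p in group_freqs) / len(group_freqs)
    pooled_p = sum(group_freqs) / len(group_freqs)
    h_t = expected_heterozygosity(pooled_p)
    if h_t == 0.0:
        return 0.0  # the allele is fixed or absent everywhere: nothing to apportion
    return (h_t - h_s) / h_t


# Hypothetical example: one allele at frequency 0.3 in one group and 0.7 in another.
between_fraction = fst([0.3, 0.7])
within_fraction = 1.0 - between_fraction
print(f"between groups: {between_fraction:.2f}, within groups: {within_fraction:.2f}")
```

Even with a fairly large frequency difference between the two hypothetical groups, most of the variation in this toy case remains within groups, which is the qualitative pattern the studies in the table report.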

The FST statistic has come under criticism by A. W. F. Edwards[6] and by Jeffrey Long and Rick Kittles.[7] Edwards, a British statistician and evolutionary biologist, faulted Lewontin's methodology for basing his conclusions on a simple comparison of genes rather than on the more complex structure of gene frequencies. Long and Kittles' objection is also methodological: according to them, FST rests on the faulty underlying assumptions that all populations contain equally genetically diverse members and that continental groups diverged at the same time. Sarich and Miele have also argued that estimates of genetic difference between individuals of different populations understate differences between groups because they fail to take into account human diploidy.[8]

Keith Hunley, Graciela Cabana, and Jeffrey Long created a revised statistical model to account for unequally divergent population lineages and local populations with differing degrees of diversity. Their 2015 paper applies this model to the Human Genome Diversity Project sample of 1,037 individuals in 52 populations.[5] They found that the least diverse population examined, the Surui, "harbors nearly 60% of the total species’ diversity." Long and Kittles had noted earlier that the Sokoto people of Africa contain virtually all of human genetic diversity.[9] Their analysis also found that non-African populations are a taxonomic subgroup of African populations, that "some African populations are equally related to other African populations and to non-African populations," and that "outside of Africa, regional groupings of populations are nested inside one another, and many of them are not monophyletic."[5]

1.2. Similarity of Group Members

Multiple studies since 1972[1][2][10][11][12][13] have backed up the claim that, "The average proportion of genetic differences between individuals from different human populations only slightly exceeds that between unrelated individuals from a single population."[14]

Percentage similarity between two individuals from different clusters when 377 microsatellite markers are considered.[15]
x | Africans | Europeans | Asians
Europeans | 36.5 | |
Asians | 35.5 | 38.3 |
Indigenous Americans | 26.1 | 33.4 | 35

Edwards (2003) claims, "It is not true, as Nature claimed, that 'two random individuals from any one group are almost as different as any two random individuals from the entire world'" and Risch et al. (2002) state "Two Caucasians are more similar to each other genetically than a Caucasian and an Asian." However, Bamshad et al. (2004) used the data from Rosenberg et al. (2002) to compare the genetic differences between individuals within the same continental group with those between individuals from different continental groups. They found that although these individuals could be classified very accurately into continental clusters, there was a significant degree of genetic overlap at the individual level: using 377 loci, individual Europeans were more genetically similar to East Asians than to other Europeans about 38% of the time.

Witherspoon et al. (2007) have argued that even when individuals can be reliably assigned to specific population groups, it may still be possible for two randomly chosen individuals from different populations/clusters to be more similar to each other than to a randomly chosen member of their own cluster when only a small number of SNPs is sampled (as in the case of scientists James Watson, Craig Venter and Seong-Jin Kim). They state that when around one thousand SNPs are used, individuals from different populations/clusters are never more similar to each other than to members of their own cluster, which they state some may find surprising. Witherspoon et al. conclude that "caution should be used when using geographic or genetic ancestry to make inferences about individual phenotypes".
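The dependence on the number of markers described above can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions, not the procedure used by Witherspoon et al.: two hypothetical populations differ by the same modest allele-frequency gap at every locus, loci are independent, and similarity is measured by a simple count of allele differences. All names and parameter values are illustrative.

```python
import random


def sample_genotypes(freqs, rng):
    """Draw one diploid individual: 0, 1 or 2 copies of the allele at each locus."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


def distance(a, b):
    """Total number of allele differences between two multilocus genotypes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cross_population_similarity_rate(n_loci, p1=0.4, p2=0.6, trials=200, seed=0):
    """Fraction of trials in which an individual is strictly closer to a member
    of the other population than to a member of its own population."""
    rng = random.Random(seed)
    freqs_own = [p1] * n_loci    # hypothetical allele frequencies, population 1
    freqs_other = [p2] * n_loci  # hypothetical allele frequencies, population 2
    hits = 0
    for _ in range(trials):
        focal = sample_genotypes(freqs_own, rng)
        same_pop = sample_genotypes(freqs_own, rng)
        other_pop = sample_genotypes(freqs_other, rng)
        if distance(focal, other_pop) < distance(focal, same_pop):
            hits += 1
    return hits / trials
```

Calling `cross_population_similarity_rate(10)` and `cross_population_similarity_rate(1000)` shows the qualitative effect: with a handful of loci the cross-population pair is frequently the more similar one, while with many loci this becomes rare. The exact rates depend entirely on the assumed frequencies and should not be read as estimates for real populations.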

2. Blood Polymorphism Study

A 1994 study by Cavalli-Sforza and colleagues evaluated genetic distances among 42 native populations based on 120 blood polymorphisms. The populations were grouped into nine clusters: African (sub-Saharan), Caucasoid (European), Caucasoid (extra-European), northern Mongoloid (excluding Arctic populations), northeast Asian Arctic, southern Mongoloid (mainland and insular Southeast Asia), Pacific islander, New Guinean and Australian, and American (Amerindian). Although the clusters demonstrate varying degrees of homogeneity, the nine-cluster model represents a majority (80 out of 120) of single-trait trees and is useful in demonstrating the phenetic relationship among these populations.[16]

The greatest genetic distance between two continents is between Africa and Oceania, at 0.2470. This measure of genetic distance reflects the isolation of Australia and New Guinea since the end of the Last Glacial Maximum, when Oceania was isolated from mainland Asia due to rising sea levels. The next-largest genetic distance is between Africa and the Americas, at 0.2260. This is expected, since the longest geographic distance by land is between Africa and South America. The shortest genetic distance, 0.0155, is between European and extra-European Caucasoids. Africa is the most genetically divergent continent, with all other groups more related to each other than to sub-Saharan Africans. This is expected, according to the single-origin hypothesis. Europe has a general genetic variation about one-third that of other continents; the genetic contribution of Asia and Africa to Europe is thought to be two-thirds and one-third, respectively.[16][17]

3. Genetic Cluster Studies

Genetic structure studies are carried out using statistical computer programs designed to find clusters of genetically similar individuals within a sample of individuals. Studies such as those by Risch and Rosenberg use a computer program called STRUCTURE to find human populations (gene clusters). It is a statistical program that places individuals into one of an arbitrary number of clusters based on their overall genetic similarity; many possible pairs of clusters are tested per individual to generate multiple clusters.[18] The basis for these computations is data describing a large number of single nucleotide polymorphisms (SNPs), genetic insertions and deletions (indels), and microsatellite markers (or short tandem repeats, STRs) as they appear in each sampled individual. Cluster analysis divides a dataset into any prespecified number of clusters.

These clusters are based on multiple genetic markers that are often shared between different human populations even over large geographic ranges. The notion of a genetic cluster is that people within the cluster share, on average, more similar allele frequencies with each other than with those in other clusters (A. W. F. Edwards, 2003). In a test of idealised populations, the computer programme STRUCTURE was found to consistently underestimate the number of populations in the data set when high migration rates between populations and slow mutation rates (such as those of single-nucleotide polymorphisms) were considered.[19] In 2004, Lynn Jorde and Steven Wooding argued that "Analysis of many loci now yields reasonably accurate estimates of genetic similarity among individuals, rather than populations. Clustering of individuals is correlated with geographic origin or ancestry."[20]
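The point that cluster analysis divides a dataset into whatever number of clusters is requested can be illustrated with a short sketch. The code below uses plain k-means with a deterministic farthest-point start as a stand-in; it is not the Bayesian model implemented in STRUCTURE, and the data are a hypothetical one-dimensional cline (evenly spaced values with no gaps) rather than real genotypes.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Assign each point to one of k clusters and return the list of labels."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Farthest-point initialisation: start from the first point, then keep
    # adding the point that lies farthest from all centres chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Hypothetical cline: 20 individuals spread evenly along a single gradient.
cline = [(i / 19.0,) for i in range(20)]
for k in (2, 3, 4):
    print(k, kmeans(cline, k))
```

Even though the toy data contain no discontinuities, the procedure returns exactly K groups for each requested K. This is one reason the choice of K and the interpretation of the resulting clusters are debated in the studies discussed here.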

A number of genetic cluster studies have been conducted since 2002, including the following:

Authors | Year | Title | Sample size / number of populations sampled | Sample | Markers
Rosenberg et al. | 2002 | Genetic Structure of Human Populations[21] | 1056 / 52 | Human Genome Diversity Project (HGDP-CEPH) | 377 STRs
Serre & Pääbo | 2004 | Evidence for Gradients of Human Genetic Diversity Within and Among Continents[22] | a: 89 / 15; b: 90 / geographically distributed individuals | a: HGDP; b: Jorde 1997 | a: 20 STRs
Rosenberg et al. | 2005 | Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure[23] | 1056 / 52 | Human Genome Diversity Project (HGDP-CEPH) | 783 STRs + 210 indels
Li et al. | 2008 | Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation[24] | 938 / 51 | Human Genome Diversity Project (HGDP-CEPH) | 650,000 SNPs
Tishkoff et al. | 2009 | The Genetic Structure and History of Africans and African Americans[25] | ~3400 / 185 | HGDP-CEPH plus 133 additional African populations and Indian individuals | 1327 STRs + indels
Xing et al. | 2010 | Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping[26] | 850 / 40 | HapMap plus 296 individuals | 250,000 SNPs

In a 2005 paper, Rosenberg and his team acknowledged that findings of a study on human population structure are highly influenced by the way the study is designed.[27][28] They reported that the number of loci, the sample size, the geographic dispersion of the samples and assumptions about allele-frequency correlation all have an effect on the outcome of the study.

In a review of studies of human genome diversity, Guido Barbujani and colleagues note that various cluster studies have identified different numbers of clusters with different boundaries. They write that discordant patterns of genetic variation and high within-population genetic diversity "make[] it difficult, or impossible, to define, once and for good, the main genetic clusters of humankind."[4]

Genetic clustering was also criticized by Penn State anthropologists Kenneth Weiss and Brian Lambert. They asserted that understanding human population structure in terms of discrete genetic clusters misrepresents the path that produced diverse human populations that diverged from shared ancestors in Africa: "Ironically, by ignoring the way population history actually works as one process from a common origin rather than as a string of creation events, structure analysis that seems to present variation in Darwinian evolutionary terms is fundamentally non-Darwinian."[29]

3.1. Clusters by Rosenberg et al. (2002, 2005)

A major finding of Rosenberg and colleagues (2002) was that when five clusters were generated by the program (specified as K=5), "clusters corresponded largely to major geographic regions." Specifically, the five clusters corresponded to Africa, Europe plus the Middle East plus Central and South Asia, East Asia, Oceania, and the Americas. The study also confirmed prior analyses by showing that, "Within-population differences among individuals account for 93 to 95% of genetic variation; differences among major groups constitute only 3 to 5%."

Figure: Human population structure can be inferred from multilocus DNA sequence data (Rosenberg et al. 2002, 2005). Individuals from 52 populations were examined at 993 DNA markers. These data were used to partition individuals into K = 2, 3, 4, 5, or 6 gene clusters. In the figure, the average fractional membership of individuals from each population is represented by horizontal bars partitioned into K colored segments.

Rosenberg and colleagues (2005) have argued, based on cluster analysis, that populations do not always vary continuously and a population's genetic structure is consistent if enough genetic markers (and subjects) are included. "Examination of the relationship between genetic and geographic distance supports a view in which the clusters arise not as an artifact of the sampling scheme, but from small discontinuous jumps in genetic distance for most population pairs on opposite sides of geographic barriers, in comparison with genetic distance for pairs on the same side. Thus, analysis of the 993-locus dataset corroborates our earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions." They also wrote, regarding a model with five clusters corresponding to Africa, Eurasia (Europe, Middle East, and Central/South Asia), East Asia, Oceania, and the Americas: "For population pairs from the same cluster, as geographic distance increases, genetic distance increases in a linear manner, consistent with a clinal population structure. However, for pairs from different clusters, genetic distance is generally larger than that between intracluster pairs that have the same geographic distance. For example, genetic distances for population pairs with one population in Eurasia and the other in East Asia are greater than those for pairs at equivalent geographic distance within Eurasia or within East Asia. Loosely speaking, it is these small discontinuous jumps in genetic distance—across oceans, the Himalayas, and the Sahara—that provide the basis for the ability of STRUCTURE to identify clusters that correspond to geographic regions".[30]

Rosenberg stated that their findings "should not be taken as evidence of our support of any particular concept of biological race (...). Genetic differences among human populations derive mainly from gradations in allele frequencies rather than from distinctive 'diagnostic' genotypes."[21] The study's overall results confirmed that differences within populations account for between 93 and 95% of genetic variation; only about 5% is found between groups.[27]


The Rosenberg study has been criticised on several grounds.

The existence of allelic clines and the observation that the bulk of human variation is continuously distributed have led some scientists to conclude that any categorization schema attempting to partition that variation meaningfully will necessarily create artificial truncations (Kittles & Weiss 2003). It is for this reason, Reanne Frank argues, that attempts to allocate individuals into ancestry groupings based on genetic information have yielded varying results that are highly dependent on methodological design.[31] Serre and Pääbo (2004) make a similar claim.

In a response to Serre and Pääbo (2004), Rosenberg et al. (2005) maintain that their clustering analysis is robust. Additionally, they agree with Serre and Pääbo that membership of multiple clusters can be interpreted as evidence for clinality (isolation by distance), though they also comment that this may be due to admixture between neighbouring groups (small island model). Thirdly, they comment that evidence of clusteredness is not evidence for any specific concept of "biological race".[23]

Clustering does not particularly correspond to continental divisions. Depending on the parameters given to their analytical program, Rosenberg and Pritchard were able to construct divisions of between 4 and 20 clusters of the genomes studied, although they excluded analyses with more than 6 clusters from their published article. Probability values for various cluster configurations varied widely, with the single most likely configuration coming with 16 clusters, although other 16-cluster configurations had low probabilities. Overall, "there is no clear evidence that K=6 was the best estimate" according to geneticist Deborah Bolnick (2008:76-77).[32] The number of genetic clusters used in the study was arbitrarily chosen. Although the original research used different numbers of clusters, the published study emphasized six genetic clusters. The number of genetic clusters is determined by the user of the computer software conducting the study. Rosenberg later revealed that his team used pre-conceived numbers of genetic clusters from six to twenty "but did not publish those results because Structure [the computer program used] identified multiple ways to divide the sampled individuals". Dorothy Roberts, a law professor, asserts that "there is nothing in the team's findings that suggests that six clusters represent human population structure better than ten, or fifteen, or twenty."[33] When instructed to find two clusters, the program identified two populations anchored by Africa and by the Americas. In the case of six clusters, the entirety of the Kalash people, an ethnic group living in Northern Pakistan, was added to the previous five.[27][34]

Commenting on Rosenberg's study, law professor Dorothy Roberts wrote that "the study actually showed that there are many ways to slice the expansive range of human genetic variation."

3.2. Clusters in Tishkoff et al. 2009

Sarah A. Tishkoff and colleagues analyzed a global sample consisting of 952 individuals from the HGDP-CEPH survey, 2432 Africans from 113 ethnic groups, 98 African Americans, 21 Yemenites, 432 individuals of Indian descent, and 10 Native Australians. A global STRUCTURE analysis of these individuals examined 1327 polymorphic markers, consisting of 848 STRs, 476 indels, and 3 SNPs. The authors reported cluster results for K=2 to K=14. Within Africa, six ancestral clusters were inferred through Bayesian analysis, which were closely linked with ethnolinguistic heritage. Bantu populations grouped with other Niger-Congo-speaking populations from West Africa. African Americans largely belonged to this Niger-Congo cluster, but also had significant European ancestry. Nilo-Saharan populations formed their own cluster. Chadic populations clustered with the Nilo-Saharan groups, suggesting that most present-day Chadic speakers originally spoke languages from the Nilo-Saharan family and later adopted Afro-Asiatic languages. Nilotic populations from the African Great Lakes largely belonged to this Nilo-Saharan cluster too, but also had some Afro-Asiatic influence due to assimilation of Cushitic groups over the last 3,000 years. Khoisan populations formed their own cluster, which grouped closest with the Pygmy cluster. The Cape Coloured showed assignments from the Khoisan, European and other clusters due to the population's mixed heritage. The Hadza and Sandawe populations formed their own cluster. An Afro-Asiatic cluster was also discerned, with the Afro-Asiatic speakers from North Africa and the Horn of Africa forming a contiguous group. Afro-Asiatic speakers in the Great Lakes region largely belonged to this Afro-Asiatic cluster as well, but also had some Bantu and Nilotic influence due to assimilation of adjacent groups over the last 3,000 years. The remaining inferred ancestral clusters were associated with European, Middle Eastern, Oceanian, Indian, Native American and East Asian populations.[35]

3.3. Examining Effects of Sampling in Xing et al. 2010

Jinchuan Xing and colleagues used an alternate dataset of human genotypes including HapMap samples and their own samples (296 new individuals from 13 populations), for a total of 40 populations distributed roughly evenly across the Earth's land surface. They found that the alternate sampling reduced the FST estimate of inter-population differences from 0.18 to 0.11, suggesting that the higher number may be an artifact of uneven sampling. They conducted a cluster analysis using the ADMIXTURE program and found that "genetic diversity is distributed in a more clinal pattern when more geographically intermediate populations are sampled."[26]
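The effect of sampling design on FST can be illustrated with a toy calculation. The sketch below is not a reproduction of the analysis by Xing et al.: it assumes a single biallelic locus whose allele frequency rises linearly along a hypothetical cline, uses the heterozygosity-based form FST = (H_T - H_S) / H_T with equally weighted sites, and compares sampling only the two ends of the cline with sampling it evenly. All names and numbers are illustrative.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst(freqs):
    """Heterozygosity-based F_ST for equally weighted sampling sites."""
    h_s = sum(heterozygosity(p) for p in freqs) / len(freqs)
    h_t = heterozygosity(sum(freqs) / len(freqs))
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


def cline_frequency(x, low=0.2, high=0.8):
    """Hypothetical cline: allele frequency rises linearly with position x in [0, 1]."""
    return low + (high - low) * x


# Four sampling sites bunched at the two ends of the cline ...
end_sites = [0.0, 0.05, 0.95, 1.0]
# ... versus four sites spread evenly along it.
even_sites = [0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0]

fst_ends = fst([cline_frequency(x) for x in end_sites])
fst_even = fst([cline_frequency(x) for x in even_sites])
print(f"ends only: {fst_ends:.3f}, evenly spaced: {fst_even:.3f}")
```

In this toy setting the evenly spaced design gives the lower FST, because geographically intermediate sites carry intermediate allele frequencies. That is the same direction of change as the reduction reported above, although the magnitudes here are arbitrary.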

3.4. HUGO Asian Study

A study by the HUGO Pan-Asian SNP Consortium in 2009, using the similar method of principal component analysis, found that East Asian and South-East Asian populations clustered together, and suggested a common origin for these populations. At the same time they observed a broad discontinuity between this cluster and South Asia, commenting that "most of the Indian populations showed evidence of shared ancestry with European populations". It was noted that "genetic ancestry is strongly correlated with linguistic affiliations as well as geography".[36]

4. Controversy of Genetic Clustering and Associations with "Race"

Studies of clustering reopened a debate on the scientific reality of race, or lack thereof. In the late 1990s Harvard evolutionary geneticist Richard Lewontin stated that "no justification can be offered for continuing the biological concept of race. (...) Genetic data shows that no matter how racial groups are defined, two people from the same racial group are about as different from each other as two people from any two different racial groups."[37] This view has since been affirmed by numerous authors[2][10][12] and the American Association of Physical Anthropologists.[7] A. W. F. Edwards as well as Rick Kittles and Jeffrey Long have criticized Lewontin's methodology, with Long noting that there are more similarities between humans and chimpanzees than differences, and more genetic variation within chimps and humans than between them.[7] Edwards also charged that Lewontin made an "unjustified assault on human classification, which he deplored for social reasons".[38] In their 2015 article, Keith Hunley, Graciela Cabana, and Jeffrey Long recalculate the apportionment of human diversity using a more complex model than Lewontin and his successors. They conclude: "In sum, we concur with Lewontin’s conclusion that Western-based racial classifications have no taxonomic significance, and we hope that this research, which takes into account our current understanding of the structure of human diversity, places his seminal finding on firmer evolutionary footing."[5]

Genetic clustering studies, and particularly the five-cluster result published by Rosenberg's team in 2002, have been interpreted by journalist Nicholas Wade, evolutionary biologist Armand Marie Leroi, and others as demonstrating the biological reality of race.[39][40][41] For Leroi, "Race is merely a shorthand that enables us to speak sensibly, though with no great precision, about genetic rather than cultural or political differences." He states that, "One could sort the world's population into 10, 100, perhaps 1,000 groups", and describes Europeans, Basques, Andaman Islanders, Ibos, and Castilians each as a "race".[41] In response to Leroi's claims, the Social Science Research Council convened a panel of experts to discuss race and genomics online.[42] In their 2002 and 2005 papers, Rosenberg and colleagues disagree that their data implies the biological reality of race.[21][23]

In 2006, Lewontin wrote that any genetic study requires some a priori concept of race or ethnicity in order to package human genetic diversity into a defined, limited number of biological groupings. Informed by genetics, zoologists have long discarded the concept of race for dividing groups of non-human animal populations within a species. Defined on varying criteria, in the same species a widely varying number of races could be distinguished. Lewontin notes that genetic testing revealed that "because so many of these races turned out to be based on only one or two genes, two animals born in the same litter could belong to different 'races'".[43]

Studies that seek to find genetic clusters are only as informative as the populations they sample. For example, Risch and Burchard relied on two or three local populations from five continents, which together were supposed to represent the entire human race.[27] Another genetic clustering study used three sub-Saharan population groups to represent Africa; Chinese, Japanese, and Cambodian samples for East Asia; and Northern European and Northern Italian samples to represent "Caucasians". Entire regions, subcontinents, and landmasses are left out of many studies. Furthermore, social geographical categories such as "East Asia" and "Caucasians" were not defined. "A handful of ethnic groups to symbolize an entire continent mimic a basic tenet of racial thinking: that because races are composed of uniform individuals, anyone can represent the whole group" notes Roberts.[27][44][45]

The model of the Big Few fails when overlooked geographical regions such as India are included. A 2003 study that examined fifty-eight genetic markers found that Indian populations owe their ancestral lineages to Africa, Central Asia, Europe, and southern China.[46][47] Reardon, from Princeton University, asserts that flawed sampling methods are built into many genetic research projects. The Human Genome Diversity Project (HGDP) relied on samples which were assumed to be geographically separate and isolated.[48] The relatively small samples of indigenous populations in the HGDP do not represent the human species' genetic diversity, nor do they portray the migration and mixing of population groups that has been happening since prehistoric times. Geographic areas such as the Balkans, the Middle East, North and East Africa, and Spain are seldom included in genetic studies.[27][49] East and North African indigenous populations, for example, are never selected to represent Africa because they do not fit the profile of "black" Africa. The sampled indigenous populations of the HGDP are assumed to be "pure"; the law professor Roberts claims that "their unusual purity is all the more reason they cannot stand in for all the other populations of the world that [are] marked by intermixture from migration, commerce, and conquest."[27]

King and Motulsky, in a 2002 Science article, state that "While the computer-generated findings from all of these studies offer greater insight into the genetic unity and diversity of the human species, as well as its ancient migratory history, none support dividing the species into discrete, genetically determined racial categories".[50] Cavalli-Sforza asserts that classifying clusters as races would be a "futile exercise" because "every level of clustering would determine a different population and there is no biological reason to prefer a particular one". Bamshad, in a 2004 paper published in Nature, asserts that a more accurate study of human genetic variation would use an objective sampling method, which would choose populations randomly and systematically across the world, including those populations which are characterized by historical intermingling, instead of cherry-picking population samples which fit a priori concepts of racial classification. Roberts states that "if research collected DNA samples continuously from region to region throughout the world, they would find it impossible to infer neat boundaries between large geographical groups."[27][51][52][53]

Anthropologists such as C. Loring Brace,[54] philosophers Jonathan Kaplan and Rasmus Winther,[55][56][57] and geneticist Joseph Graves[58] have argued that while it is certainly possible to find biological and genetic variation that corresponds roughly to the groupings normally defined as "continental races", this is true for almost all geographically distinct populations. The cluster structure of the genetic data is therefore dependent on the initial hypotheses of the researcher and the populations sampled. When one samples continental groups the clusters become continental; if one had chosen other sampling patterns the clustering would be different. Weiss and Fullerton have noted that if one sampled only Icelanders, Mayans and Maoris, three distinct clusters would form and all other populations could be described as being clinally composed of admixtures of Maori, Icelandic and Mayan genetic materials.[59] Kaplan and Winther therefore argue that, seen in this way, both Lewontin and Edwards are right in their arguments. They conclude that while racial groups are characterized by different allele frequencies, this does not mean that racial classification is a natural taxonomy of the human species, because multiple other genetic patterns can be found in human populations that cross-cut racial distinctions. Moreover, the genomic data under-determines whether one wishes to see subdivisions (i.e., splitters) or a continuum (i.e., lumpers). Under Kaplan and Winther's view, racial groupings are objective social constructions (see Mills 1998 [60]) that have conventional biological reality only insofar as the categories are chosen and constructed for pragmatic scientific reasons.

4.1. Commercial Ancestry Testing and Individual Ancestry

Commercial ancestry testing companies, which use genetic clustering data, have also been heavily criticized. Limitations of genetic clustering are intensified when inferred population structure is applied to individual ancestry. The type of statistical analysis conducted by scientists translates poorly into individual ancestry because they are looking at differences in frequencies, not absolute differences between groups. Commercial genetic genealogy companies are guilty of what Pilar Ossorio calls the "tendency to transform statistical claims into categorical ones".[61] Not just individuals of the same local ethnic group, but even two siblings may end up being classified as members of different continental groups or "races" depending on the alleles they inherit.[27]

Many commercial companies use data from the initial phase of the International HapMap Project (HapMap), in which population samples were collected from four ethnic groups: Han Chinese, Japanese, Yoruba Nigerian, and Utah residents of Northern European ancestry. If a person has ancestry from a region for which the computer program has no samples, it will compensate with the closest sample, which may have nothing to do with the customer's actual ancestry: "Consider a genetic ancestry test performed on an individual we will call Joe, whose eight great-grandparents were from southern Europe. The HapMap populations are used as references for testing Joe's genetic ancestry. The HapMap's European samples consist of 'northern' Europeans. In regions of Joe's genome that vary between northern and southern Europe (such regions might include the lactase gene), the genetic ancestry test using the HapMap reference population is likely to incorrectly assign the ancestry of that portion of the genome to a non-European population because that genomic region will appear to be more similar to the HapMap's Yoruba or Han Chinese samples than to Northern European samples."[62] Likewise, a person having Western European and Western African ancestries may have ancestors from Western Europe and West Africa, or instead be assigned to East Africa where various ancestries can be found.[63] "Telling customers that they are a composite of several anthropological groupings reinforces three central myths about race: that there are pure races, that each race contains people who are fundamentally the same and fundamentally different from people in other races, and that races can be biologically demarcated." Many companies base their findings on inadequate and unscientific sampling methods. Researchers have never sampled the world's populations in a systematic and random fashion.[27]
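The reference-panel limitation described above can be made concrete with a minimal sketch. It assumes a simple maximum-likelihood assignment under Hardy-Weinberg proportions, which is only one possible approach and is not claimed to be what any company actually runs; the panel names (`ref_A`, `ref_B`, `ref_C`), the allele frequencies, and the test genotype are all hypothetical.

```python
import math

# Hypothetical allele frequencies at four biallelic loci (illustrative numbers only).
REFERENCE_PANEL = {
    "ref_A": [0.90, 0.80, 0.20, 0.10],
    "ref_B": [0.20, 0.30, 0.70, 0.90],
    "ref_C": [0.50, 0.10, 0.90, 0.40],
}


def log_likelihood(genotype, freqs):
    """Log-probability of per-locus allele counts (0, 1 or 2) under Hardy-Weinberg proportions."""
    total = 0.0
    for count, p in zip(genotype, freqs):
        total += (
            math.log(math.comb(2, count))
            + count * math.log(p)
            + (2 - count) * math.log(1 - p)
        )
    return total


def assign(genotype, panel=REFERENCE_PANEL):
    """Return the best-scoring reference label and all scores.

    The result is always one of the panel's labels, even when the individual's
    true population of origin is not represented in the panel at all.
    """
    scores = {name: log_likelihood(genotype, freqs) for name, freqs in panel.items()}
    return max(scores, key=scores.get), scores


# A hypothetical individual from a population that is absent from the panel.
label, scores = assign([1, 0, 2, 1])
print(label)  # ref_C in this toy panel
```

The point of the sketch is the return type: the procedure can only ever answer with a label that already exists in the panel, so an unsampled ancestry is silently reported as whichever reference happens to score highest.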

4.2. Geographical and Continental Groupings

Roberts argues against the use of broad geographical or continental groupings: "molecular geneticists routinely refer to African ancestry as if everyone on the continent is more similar to each other than they are to people of other continents, who may be closer both geographically and genetically."[27] Ethiopians have closer genetic affinity with Armenians than with Bantu populations.[64] Similarly, Somalis are genetically more similar to Gulf Arab populations than to other populations in Africa.[65] Braun and Hammonds (2008) assert that the misperception of continents as natural population groupings is rooted in the assumption that populations are natural, isolated, and static. Populations came to be seen as "bounded units amenable to scientific sampling, analysis, and classification".[66] Human beings are not naturally organized into definable, genetically cohesive populations.

5. Software

Software that supports genetic clustering calculations:

  • Frappe[69]
  • sNMF [70]


  1. Lewontin, R. C. (1972). The Apportionment of Human Diversity. 6. Theodosius Dobzhansky, Max K. Hecht, William C. Steere (eds.). 381–398. doi:10.1007/978-1-4684-9063-3_14. ISBN 978-1-4684-9065-7.
  2. Barbujani, Guido; Magagni, Arianna; Minch, Eric; Cavalli-Sforza, L. Luca (1997-04-29). "An apportionment of human DNA diversity". Proceedings of the National Academy of Sciences 94 (9): 4516–4519. doi:10.1073/pnas.94.9.4516. ISSN 0027-8424. PMID 9114021. Bibcode: 1997PNAS...94.4516B.
  3. Seielstad, Mark T.; Minch, Eric; Cavalli-Sforza, L. Luca (1998). "Genetic evidence for a higher female migration rate in humans". Nature Genetics 20 (3): 278–280. doi:10.1038/3088. ISSN 1061-4036. PMID 9806547.
  4. Barbujani, G.; Ghirotto, S.; Tassi, F. (2013-09-01). "Nine things to remember about human genome diversity". Tissue Antigens 82 (3): 155–164. doi:10.1111/tan.12165. ISSN 1399-0039. PMID 24032721.
  5. Hunley, Keith L.; Cabana, Graciela S.; Long, Jeffrey C. (2015-12-01). "The apportionment of human diversity revisited". American Journal of Physical Anthropology 160 (4): 561–569. doi:10.1002/ajpa.22899. ISSN 1096-8644. PMID 26619959.
  6. Edwards, A.W.F. (2003-08-01). "Human genetic diversity: Lewontin's fallacy". BioEssays 25 (8): 798–801. doi:10.1002/bies.10315. ISSN 1521-1878. PMID 12879450.
  7. Long, Jeffrey C.; Kittles, Rick A. (2009). "Human Genetic Diversity and the Nonexistence of Biological Races". Human Biology 81 (5): 777–798. doi:10.3378/027.081.0621. ISSN 1534-6617. PMID 20504196. Retrieved 2016-01-13. 
  8. Sarich VM, Miele F. Race: The Reality of Human Differences. Westview Press (2004). ISBN:0-8133-4086-1
  9. Long, Jeffrey C.; Kittles, Rick A. (2009). "Human Genetic Diversity and the Nonexistence of Biological Races". Human Biology 81 (5): 793–794. doi:10.3378/027.081.0621. ISSN 1534-6617. PMID 20504196. Retrieved 2016-01-13. 
  10. Latter, B. D. H. (1980). "Genetic Differences Within and Between Populations of the Major Human Subgroups". The American Naturalist 116 (2): 220–237. doi:10.1086/283624. ISSN 0003-0147.
  11. Jorde, L. B.; Watkins, W. S.; Bamshad, M. J.; Dixon, M. E.; Ricker, C. E.; Seielstad, M. T.; Batzer, M. A. (2000). "The Distribution of Human Genetic Diversity: A Comparison of Mitochondrial, Autosomal, and Y-Chromosome Data". The American Journal of Human Genetics 66 (3): 979–988. doi:10.1086/302825. ISSN 0002-9297. PMID 10712212.
  12. Brown, Ryan A.; Armelagos, George J. (2001). "Apportionment of racial diversity: a review". Evolutionary Anthropology 10 (1): 34–40. doi:10.1002/1520-6505(2001)10:1<34::aid-evan1011>;2-p.
  13. Romualdi, Chiara; Balding, David; Nasidze, Ivane S.; Risch, Gregory; Robichaux, Myles; Sherry, Stephen T.; Stoneking, Mark; Batzer, Mark A. et al. (2002). "Patterns of human diversity, within and among continents, inferred from biallelic DNA polymorphisms". Genome Research 12 (4): 602–612. doi:10.1101/gr.214902. PMID 11932244.
  14. Quote from Rosenberg, Noah A.; Pritchard, Jonathan K.; Weber, James L.; Cann, Howard M.; Kidd, Kenneth K.; Zhivotovsky, Lev A.; Feldman, Marcus W. (2002-12-20). "Genetic Structure of Human Populations". Science 298 (5602): 2381–2385. doi:10.1126/science.1078311. ISSN 0036-8075. PMID 12493913. Bibcode: 2002Sci...298.2381R.
  15. The table gives the percentage likelihood that two individuals from different clusters are genetically more similar to each other than to someone from their own population when 377 microsatellite markers are considered, from Michael Bamshad (2004). "Deconstructing the Relationship Between Genetics and Race". Nature Reviews Genetics 5 (8): 598–609. doi:10.1038/nrg1401. PMID 15266342. Original data from Rosenberg (2002).
  16. Cavalli-Sforza, Luigi Luca; Menozzi, Paolo; Piazza, Alberto (1994). The History and Geography of Human Genes. Princeton: Princeton University Press. ISBN 978-0-691-08750-4. 
  17. Cavalli-Sforza, Luigi Luca (1997). "Genes, peoples, and languages". Proceedings of the National Academy of Sciences 94 (15): 7719–7724. doi:10.1073/pnas.94.15.7719.
  18. Witherspoon, D.J.; Wooding, S.; Rogers, A.R.; Marchani, E.E.; Watkins, W.S.; Batzer, M.A.; Jorde, L.B. (2007). "Genetic Similarities Within and Between Human Populations". Genetics 176 (1): 351–359. doi:10.1534/genetics.106.067355. PMID 17339205.
  19. Waples, R.; Gaggiotti, O. (2006). "What is a population? An empirical evaluation of some genetic methods for identifying the number of gene pools and their degree of connectivity". Molecular Ecology 15 (6): 1419–1439. doi:10.1111/j.1365-294X.2006.02890.x. PMID 16629801. 
  20. Jorde, Lynn B.; Wooding, Stephen P. (2004). "Genetic variation, classification and 'race'". Nature Genetics 36: S28–S33.
  21. Rosenberg, Noah A.; Pritchard, Jonathan K.; Weber, James L.; Cann, Howard M.; Kidd, Kenneth K.; Zhivotovsky, Lev A.; Feldman, Marcus W. (2002-12-20). "Genetic Structure of Human Populations". Science 298 (5602): 2381–2385. doi:10.1126/science.1078311. ISSN 0036-8075. PMID 12493913. Bibcode: 2002Sci...298.2381R.
  22. Serre, David; Pääbo, Svante (September 2004). "Evidence for gradients of human genetic diversity within and among continents". Genome Research 14 (9): 1679–1685. doi:10.1101/gr.2529604. ISSN 1088-9051. PMID 15342553.
  23. Rosenberg, NA et al. (2005). "Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure". PLoS Genet 1 (6): e70. doi:10.1371/journal.pgen.0010070. PMID 16355252.
  24. Li, Jun Z.; Absher, Devin M.; Tang, Hua; Southwick, Audrey M.; Casto, Amanda M.; Ramachandran, Sohini; Cann, Howard M.; Barsh, Gregory S. et al. (2008-02-22). "Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation". Science 319 (5866): 1100–1104. doi:10.1126/science.1153717. ISSN 0036-8075. PMID 18292342. Bibcode: 2008Sci...319.1100L.
  25. Tishkoff, Sarah A; Reed, Floyd A; Friedlaender, Françoise R; Ehret, Christopher; Ranciaro, Alessia; Froment, Alain; Hirbo, Jibril B; Awomoyi, Agnes A et al. (2009-05-22). "The Genetic Structure and History of Africans and African Americans". Science 324 (5930): 1035–1044. doi:10.1126/science.1172257. ISSN 0036-8075. PMID 19407144. Bibcode: 2009Sci...324.1035T.
  26. Xing, Jinchuan; Watkins, W. Scott; Shlien, Adam; Walker, Erin; Huff, Chad D.; Witherspoon, David J.; Zhang, Yuhua; Simonson, Tatum S. et al. (October 2010). "Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping". Genomics 96 (4): 199–210. doi:10.1016/j.ygeno.2010.07.004. ISSN 0888-7543. PMID 20643205.
  27. Roberts, Dorothy (2011). Fatal Invention. London, New York: The New Press. 
  28. Noah A. Rosenberg; Saurabh Mahajan; Sohini Ramachandran; Chengfeng Zhao; Jonathan K. Pritchard; Marcus Feldman (2005). "Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure". PLOS Genetics 1 (6): 660, 668. doi:10.1371/journal.pgen.0010070. PMID 16355252.
  29. Kenneth M. Weiss; Brian W. Lambert (2010). "Does History Matter? Do the Facts of Human Variation Package Our Views or Do Our Views Package the Facts?". Evolutionary Anthropology 19 (3): 92–97. doi:10.1002/evan.20261.
  30. Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK, Feldman MW (December 2005). "Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure". PLoS Genetics 1 (6): e70. doi:10.1371/journal.pgen.0010070. PMID 16355252.
  31. Frank, Reanne. "Back with a Vengeance: the Reemergence of a Biological Conceptualization of Race in Research on Race/Ethnic Disparities in Health". 
  32. Bolnick, Deborah A. (2008). "Individual Ancestry Inference and the Reification of Race as a Biological Phenomenon". in Koenig, Barbara A.; Richardson, Sarah S.; Lee, Sandra Soo-Jin. Revisiting race in a genomic age. Rutgers University Press. ISBN 978-0-8135-4324-6. 
  33. Kalinowski. The Computer Program STRUCTURE Does Not Reliably Identify Main Genetic Clusters Within Species. 4. pp. 67–77. 
  34. Sadaf Firasat, Shagufta Khaliq, Aisha Mohyuddin, Myrto Papaioannou, Chris Tyler-Smith, Peter A. Underhill, and Qasim Ayub (2007). "Y-Chromosomal Evidence for a Limited Greek Contribution to the Pathan Population of Pakistan". European Journal of Human Genetics 15 (1): 121–6. doi:10.1038/sj.ejhg.5201726. PMID 17047675.
  35. Supporting Online Material for Tishkoff, Sarah A; Reed, Floyd A; Friedlaender, Françoise R; Ehret, Christopher; Ranciaro, Alessia; Froment, Alain; Hirbo, Jibril B; Awomoyi, Agnes A et al. (2009-05-22). "The Genetic Structure and History of Africans and African Americans". Science 324 (5930): 1035–1044. doi:10.1126/science.1172257. ISSN 0036-8075. PMID 19407144. Bibcode: 2009Sci...324.1035T.
  36. Mapping Human Genetic Diversity in Asia , The HUGO Pan-Asian SNP Consortium, 2009
  37. "Response to OMB Directive 15". American Anthropological Association. 1997. 
  38. A.W.F. Edwards (2003). "Human Genetic Diversity: Lewontin's Fallacy". BioEssays 25 (8): 798–801. doi:10.1002/bies.10315. PMID 12879450.
  39. Wade, Nicholas (2015-04-28). A Troublesome Inheritance: Genes, Race and Human History. Penguin. ISBN 978-0-14-312716-1. 
  40. Raff, Jennifer (1 July 2014). "Nicholas Wade and Race: Building a Scientific Façade". Human Biology 86 (3): 227–232. doi:10.13110/humanbiology.86.3.0227. ISSN 0018-7143. 
  41. Leroi, Armand Marie (14 March 2005). "A Family Tree in Every Gene". New York Times. 
  42. Social Science Research Council. "Is Race "Real"?". 
  43. "Confusion About Human Races". Social Science Research Council. 26 July 2006. 
  44. Charles N. Rotimi; Lynn B. Jorde (2010). "Ancestry and Disease in the Age of Genomic Medicine". New England Journal of Medicine 363 (16): 1551–1552. doi:10.1056/nejmra0911564. PMID 20942671.
  45. S.O.Y. Keita; Rick A. Kittles (1997). "The Persistence of Racial Thinking and the Myth of Racial Divergence". American Anthropologist 99 (3): 534–544. doi:10.1525/aa.1997.99.3.534.
  46. Rick A. Kittles; Kenneth M. Weiss (2003). "Race, Ancestry, and Genes: Implications for Defining Disease Risk". Annual Review of Genomics and Human Genetics 4: 33–67. doi:10.1146/annurev.genom.4.070802.110356. PMID 14527296.
  47. Analabha Basu (2003). "Ethnic India: A Genomic View with Special Reference to Peopling and Structure". Genome Research 13 (10): 2277–90. doi:10.1101/gr.1413403. PMID 14525929.
  48. Reardon, Jenny (2005). Race to the Finish: Identity and Governance in the Age of Genomics. Princeton, NJ: Princeton University Press. 
  49. Graves, Joseph (2004). The Race Myth. New York: Dutton. p. 113. 
  50. Mary-Claire King; Arno G. Motulsky (2002). "Mapping Human History". Science 298 (5602): 2342–2343. doi:10.1126/science.1080373. PMID 12493903.
  51. Michael Bamshad (2004). "Deconstructing the Relationship Between Genetics and Race". Nature Reviews Genetics 5 (8): 598–609. doi:10.1038/nrg1401. PMID 15266342.
  52. John H. Fujimura; Ramya Rajagopalan; Pilar N. Ossorio; Kjell A. Doksum (2010). "Race and Ancestry: Operationalizing Populations in Human Genetic Variation Studies". What's the Use of Race? Modern Governance and the Biology of Difference. 
  53. L. Luca Cavalli-Sforza; Paolo Menozzi; Alberto Piazza (1994). The History and Geography of Human Genes. Princeton, NJ: Princeton University Press. 
  54. Loring Brace, C. 2005. Race is a four letter word. Oxford University Press.
  55. Kaplan, Jonathan Michael (January 2011) 'Race': What Biology Can Tell Us about a Social Construct. In: Encyclopedia of Life Sciences (ELS). John Wiley & Sons, Ltd: Chichester
  56. Winther, Rasmus Grønfeldt (2011) ¿La cosificación genética de la 'raza'? Un análisis crítico in C López-Beltrán (ed.) Genes (&) Mestizos. Genómica y raza en la biomedicina mexicana. Ficticia editorial
  57. Kaplan, Jonathan Michael, Winther, Rasmus Grønfeldt (2012). Prisoners of Abstraction? The Theory and Measure of Genetic Variation, and the Very Concept of 'Race' Biological Theory 7
  58. Graves, Joseph. 2001. The Emperor's New Clothes. Rutgers University Press
  59. Weiss, KM; Fullerton, SM (2005). "Racing around, getting nowhere". Evolutionary Anthropology 14 (5): 165–169. doi:10.1002/evan.20079.
  60. Mills CW (1998) "But What Are You Really? The Metaphysics of Race" in Blackness visible: essays on philosophy and race, pp. 41–66. Cornell University Press, Ithaca, NY
  61. Pilar Ossorio (2005). "Race, Genetic Variation, and the Haplotype Mapping Project". Louisiana Law Review 66 (131, 141). 
  62. Royal, Novembre, Fullerton. Inferring Genetic Ancestry. 
  63. Mark D. Shriver; Rick A. Kittles (2004). "Genetic Ancestry and the Search for Personalized Genetic Histories". Nature Reviews Genetics 5 (8): 611–8. doi:10.1038/nrg1405. PMID 15266343.
  64. Wilson, James F.; Weale, Michael E.; Smith, Alice C.; Gratrix, Fiona; Fletcher, Benjamin; Thomas, Mark G.; Bradman, Neil; Goldstein, David B. (2001). "Population genetic structure of variable drug response". Nature Genetics 29 (3): 265–69. doi:10.1038/ng761. PMID 11685208.
  65. Mohamoud, A. M. (October 2006). "P52 Characteristics of HLA Class I and Class II Antigens of the Somali Population". Transfusion Medicine 16 (Supplement s1): 47. doi:10.1111/j.1365-3148.2006.00694_52.x.
  66. Braun, Lundy; Evelynn Hammonds (2008). "Race, Populations, and Genomics: Africa as Laboratory". Social Science & Medicine 67 (10): 1580–8. doi:10.1016/j.socscimed.2008.07.018. PMID 18755531.
  67. "Inference of population structure using multilocus genotype data". Genetics 155 (2): 945–59. 2000. PMID 10835412.
  68. Alexander, D. H.; Novembre, J.; Lange, K. (2009). "Fast model-based estimation of ancestry in unrelated individuals". Genome Research 19 (9): 1655–1664. doi:10.1101/gr.094052.109. ISSN 1088-9051. PMID 19648217.
  69. "Estimation of individual admixture: analytical and study design considerations". Genet. Epidemiol. 28 (4): 289–301. 2005. doi:10.1002/gepi.20064. PMID 15712363.
  70. Frichot, E. and Mathieu, F. and Trouillon, T. and Bouchard, G. and Francois, O. (2014). "Fast and efficient estimation of individual ancestry coefficients". Genetics 196 (4): 973–983. doi:10.1534/genetics.113.160572. PMID 24496008.
Entry Collection: HandWiki
Update Date: 23 Nov 2022