Mitochondrial DNA (mtDNA) is maternally inherited, so all maternally related individuals are expected to share the same mtDNA sequence. Although nuclear DNA markers are more traditionally used, mtDNA provides a valuable locus for forensic DNA typing in some cases. With technological advances, researchers in molecular laboratories have developed new assays. For long-dead or missing individuals, any living maternal relative could provide a reference sample that is extremely useful in determining identity when direct comparison with biological samples is not possible. Furthermore, the large number of nucleotide polymorphisms or sequence variants in two highly variable portions of the non-coding control region could be effective in distinguishing between individuals and/or biological samples. Muscle, bone, hair, skin, blood, and other body fluids may provide sufficient material to type the mtDNA locus despite degradation due to environmental stress or time. However, the maternal inheritance pattern of mtDNA may also be considered problematic. Because all individuals in a maternal lineage share the same mtDNA sequence, mtDNA cannot be considered a unique identifier. Indeed, apparently unrelated individuals may share an unknown maternal relative at some distant point in the past.
The donkey is the only hoofed animal domesticated exclusively in Africa. Its origin was assessed by sampling donkeys from 52 countries in the Eastern hemisphere and sequencing 479 base pairs (bp) of the mtDNA control region [48]. Two highly divergent phylogenetic groups were identified in a systematic analysis that included the Asian wild half-asses (E. hemionus and E. kiang) and the two extant wild African ass subspecies, the Somali wild ass (E. africanus somaliensis) and the Nubian wild ass (E. africanus africanus). The findings ruled out the possibility of Asian wild half-asses as an ancestor of the Chinese donkey, which leaves the African wild ass as the likely ancestor. Research showed that the practice of animal domestication first emerged in the Near East but re-emerged in Northeastern Africa, a region that may have been particularly instrumental in the expansion of population and trade in the Old World. The domestication of donkeys may have originated as a response by pastoralists and other social groups in Northeastern Africa to Saharan desertification (5000–7000 years ago), a hypothesis that supplies clues for archaeological studies looking for evidence of the original domestication of the donkey.
In 2017, Stanisic et al. assessed the current genetic status of the three largest populations of E. asinus in the central Balkans (Serbia) by analyzing the variability of nuclear microsatellites and the mitochondrial (mtDNA) control region in 77 and 49 individuals, respectively [49]. A comparative analysis of these mtDNA data with mtDNA sequences of 209 published ancient and modern individuals from 19 European and African populations provided new insights into the origin and history of Balkan donkeys. The Balkan donkeys (Equus asinus L.) in Serbia are highly genetically diverse at the nuclear and mtDNA levels, but their populations are declining severely. Two groups of individuals were found to have similar phenotypic characteristics but different nuclear backgrounds and different proportions of mtDNA haplotypes belonging to maternal Clades 1 and 2. Clade 2 may have appeared in Greece before Clade 1, while the expansion and diversification of Clade 1 in the Balkans preceded that of Clade 2.
Fragments of the mtDNA D-loop and cytochrome b gene were amplified and sequenced from 21 suspected donkey remains from four archaeological sites in China to explore the matrilineal origin and transmission routes of the Chinese donkey [18]. Phylogenetic analysis revealed that ancient Chinese donkeys had high mtDNA diversity and two distinct maternal lineages, Somali and Nubian. The results implied that the maternal origins of Chinese domestic donkeys may be related to African wild asses (which include Nubian and Somali wild asses) and, along with historical records, showed that these lineages were introduced to Western and Northern China before the founding of the Han Dynasty (2202 years ago). During the Tang Dynasty (618–907), when the Silk Road reached its golden age, domestic donkey populations in China increased, primarily to meet the demands of expanding trade. These donkeys were likely used as commodities or for the transportation of goods along the Silk Road. The research provides, for the first time, valuable ancient animal DNA evidence for early trade between African and Asian populations. DNA analysis of ancient Chinese donkeys thus revealed the dynamics of matrilineal origins, domestication, and transmission pathways.
A total of 367 mtDNA D-loop sequences of Chinese donkeys were analyzed, and the 96 haplotypes and 57 polymorphic sites identified indicated rich genetic diversity [50]. Because this mtDNA diversity is distributed across all donkey breeds and body sizes, no obvious relationship exists between the matrilineal inheritance of donkeys and their geographical distribution or body size. Moreover, the sequences of Chinese donkeys and Asian wild asses do not cluster together, thus ruling out the possibility that the Asian wild asses are the maternal ancestors of the Chinese donkey. The similarity in fur color and morphology between some Chinese domestic donkeys and Asian wild asses may be caused by the convergent evolution of these two species living in similar ecological environments. As the fur color of donkeys is determined by nuclear genes, whether homology exists between the fur color genes of domestic donkeys and wild asses remains to be investigated.
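The haplotype and polymorphic-site counts reported above can be illustrated with a short calculation. The following is a minimal sketch, not the method of the cited study: it assumes the sequences are already aligned to equal length, uses a made-up toy alignment, and the function name `summarize_haplotypes` is hypothetical. It counts distinct haplotypes, counts alignment columns with more than one nucleotide, and computes Nei's haplotype diversity, Hd = n/(n-1) * (1 - sum of squared haplotype frequencies).

```python
from collections import Counter


def summarize_haplotypes(seqs):
    """Return (n_haplotypes, n_polymorphic_sites, haplotype_diversity).

    Assumes `seqs` is a list of aligned sequences of equal length.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    n = len(seqs)
    counts = Counter(seqs)  # identical sequences form one haplotype

    # A site is polymorphic if its alignment column holds more than one state.
    polymorphic = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # Nei's haplotype diversity with the n/(n-1) sample-size correction.
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    return len(counts), polymorphic, hd


# Toy alignment (illustrative data only, not from any study).
toy = ["ACGTAC", "ACGTAC", "ACGTTC", "GCGTAC"]
n_hap, n_sites, hd = summarize_haplotypes(toy)
print(n_hap, n_sites, round(hd, 3))
```

On real D-loop data, alignment gaps and ambiguous bases would need explicit handling before such counts are meaningful.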
The genetic diversity and origin of 273 male donkeys from 13 domestic donkey breeds in China were studied by using mtDNA Cyt b gene sequences and Y-chromosome microsatellites [51]. The results showed that the Chinese donkey has rich genetic diversity in the Cyt b gene. No polymorphism was found at any of the five Y-chromosome-specific microsatellite loci, showing that the Y-chromosome genetic diversity of the Chinese donkey is extremely low and that only one paternal origin exists. Two of the Y-chromosome microsatellites could be used as markers to distinguish the Chinese domestic donkey from the European domestic donkey, indicating that these donkeys may have different paternal origins. The low mutation rate of the Y chromosome in the Chinese donkey contrasts with the high mutation rate of the mitochondrial D-loop region and the rich genetic diversity of the autosomal microsatellites, suggesting a greater role for matrilineal inheritance in the genetic diversity of the Chinese donkey. A strong sex bias may also have been present in early breeding, where one male donkey bred with multiple females.
The genetic diversity, origin, and domestication of donkeys have been extensively studied using autosomal microsatellites and mitochondrial genomes. However, the male-specific regions of the Y chromosome of modern donkeys are largely uncharacterized. Fourteen published Y-chromosome-specific microsatellite (Y-STR) markers were fluorescently labeled and tested on 395 male donkey samples from China, Egypt, Spain, and Peru [52]. The results indicated seven male-specific polymorphic markers, each with 2–8 alleles in donkeys. A total of 21 haplotypes were identified, possibly reflecting weak sex bias in breeding programs, with a large amount of paternal inheritance contributing to the high male effective population size of native Chinese donkeys. Three haplotype groups were also identified, indicating three separate paternal lines in domestic donkeys. The five Y-STR markers in donkeys showed polymorphism. The Y-STRs of donkeys are richer in genetic diversity than those of horses, and the relatively high level of Y-chromosome variability is consistent with the extensive mtDNA diversity of domestic donkeys. The abundance of polymorphisms is a common feature of donkey nuclear DNA, particularly the abundance of microsatellite polymorphisms on autosomes. These markers could be used in studies of donkey Y-chromosome diversity and population genetics in African, European, South American, and Chinese donkeys.
Whole-genome sequencing (WGS) is a comprehensive method for analyzing entire genomes. It can detect single nucleotide variants, insertions/deletions, copy number changes, and large structural variants. Owing to recent technological innovations, the latest genome sequencers can perform WGS more efficiently than ever. Unlike focused approaches, such as exome sequencing or targeted resequencing, which both analyze a limited portion of the genome, WGS delivers a high-resolution, base-by-base view of the genome. First, it can capture large and small variants that may be missed with targeted approaches. Second, it can identify potential causative variants for follow-up studies of gene expression and regulation mechanisms. Third, it can deliver large volumes of data in a short amount of time to support the assembly of novel genomes. Although WGS has many advantages, it also has several disadvantages that should be mentioned. First, the role of most genes in the genome is still unknown or incompletely understood, so some “information” found in the genome sequence is unusable at present. Second, an individual’s genome may contain information that the individual does not want to be known. Third, the volume of information contained in a genome sequence is vast, and analyzing these data is difficult. The scalable, flexible nature of next-generation sequencing technology makes it equally useful for sequencing any species, such as livestock, plants, or disease-related microbes. The rapidly dropping sequencing costs and the ability to produce large volumes of data with today’s sequencers make WGS a powerful tool for genomics research.
Compared with other mammals, the species of the genus Equus show more pronounced karyotypic diversity, and they are high-quality models for exploring karyotypic instability. The high frequency of centromere repositioning events in this genus is a puzzling phenomenon, and analysis of whole-genome sequences is a sophisticated and powerful method to study chromosome evolution. The whole-genome assemblies of the donkey and the Asian wild ass could reflect the donkey’s unique characteristics, that is, more efficient energy metabolism and better immunity than horses [53]. Researchers detected abundant satellite sequences in some inactive centromeric regions but not in neocentromeric regions. In contrast, ribosomal RNA sequences were frequently present in neocentromeric regions rather than in obsolete centromeric regions. The donkey and Asian wild-ass genomes complement the reference genomes of the genus Equus, and comparative analyses based on these genome sequences provided important insights into the demographic history and adaptive evolution of the donkey. Furthermore, these results enhanced the understanding of chromosomal rearrangements and the characteristic sequence dynamics associated with centromere repositioning; the data could contribute to future studies of genomics and mammalian chromosome evolution in the genus Equus.
Long-range and highly accurate de-novo assembly from short-read data is one of the most pressing challenges in genomics. Read pairs generated by proximity ligation of DNA in chromatin from living tissue could address this problem by greatly improving the scaffolding contiguity of the assembly. In 2016, Putnam et al. reported an in-vitro method for generating long-range ligation data that improves the scaffolding of de-novo assembled genomes [54]. Their methodology, called Chicago, requires only a small amount of high molecular weight DNA as starting material and uses reconstituted chromatin as a substrate to generate a proximity ligation library. Genomic scaffolds can then be generated by using the HiRise software. The authors demonstrated the utility of their methodology for human genome assembly and scaffolding. Moreover, Chicago libraries could be used to improve existing assemblies, as illustrated by re-scaffolding the American alligator genome, and for haplotype phasing. The main weakness linked to HiRise is the introduction of assembly errors (mostly short indels) that must be corrected with accurate short reads. Such errors could hinder assessments of gene annotation accuracy (e.g., BUSCO) and protein prediction.
Donkeys and horses share a common ancestor of the genus that lived 4.0–4.5 million years ago. While a high-quality, chromosome-level genome assembly is available for horses, earlier genome assemblies for donkeys were limited to moderate scaffold sizes. A novel high-quality donkey genome assembly obtained using HiRise assembly technology, which provides scaffolds of higher quality and length (sub-chromosomal size) than the earlier donkey genome assemblies, further expanded studies of selective breeding, equine evolution (including species formation and domestication), and conservation. The new assembly made it possible to identify runs of homozygosity (ROHs) caused by low effective population size and ancestral affinities, and thus to explore their impact on the existing distance patterns between donkeys and horses [55]. The new assembly was used to obtain more precise measures of heterozygosity (genome-wide and local) and to detect ROHs that may be associated with positive selection. It also made it possible to identify fine chromosomal rearrangements between horses and donkeys, which may have played an active role in their differentiation and, ultimately, speciation.
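To make the notions of heterozygosity and ROHs concrete, the sketch below scans an ordered list of genotype calls along one scaffold. It is a simplified illustration under stated assumptions, not the procedure used in [55]: genotypes are coded as alternate-allele counts (0, 1, 2, where 1 means heterozygous), a run is broken by any single heterozygous call, and the thresholds `min_snps` and `min_length_bp`, the function name `find_roh`, and the toy data are all hypothetical. Dedicated tools typically use sliding windows and tolerate occasional heterozygous or missing calls.

```python
def find_roh(genotypes, positions, min_snps=3, min_length_bp=1000):
    """Return runs of homozygosity as (start_bp, end_bp, n_snps) tuples.

    genotypes: alternate-allele counts per site (0 or 2 = homozygous, 1 = het).
    positions: base-pair coordinates of the same sites, sorted ascending.
    """
    if len(genotypes) != len(positions):
        raise ValueError("genotypes and positions must have equal length")

    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))  # a heterozygous call ends the run
            start = None
    if start is not None:
        runs.append((start, len(genotypes) - 1))

    # Keep only runs that are long enough in both SNP count and physical span.
    return [
        (positions[s], positions[e], e - s + 1)
        for s, e in runs
        if e - s + 1 >= min_snps and positions[e] - positions[s] >= min_length_bp
    ]


# Toy data (illustrative only).
positions = [100, 600, 1200, 2000, 2500, 3100, 4000, 5200]
genotypes = [0, 0, 2, 1, 0, 0, 0, 2]

roh = find_roh(genotypes, positions)
heterozygosity = genotypes.count(1) / len(genotypes)  # fraction of het sites
print(roh, heterozygosity)
```

Longer and more contiguous scaffolds matter here because a run cannot be followed across a scaffold boundary, so fragmented assemblies tend to truncate long ROHs.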
Recently, researchers assembled a new draft genome of the Kiang and performed large-scale resequencing of Kiang and domestic donkey genomes [56]. The findings show that Kiang and Tibetan donkeys utilize different genes (EPAS1 and EGLN1, respectively) to adapt to the low-oxygen conditions associated with living at high altitude. EPAS1 and EGLN1 are the two most important genes for high-altitude adaptation in Tibetan and other highland animals, indicating that the number of potential biological pathways involved in high-altitude adaptation in mammals may be limited. The comparative analysis revealed that Tibetan donkeys did not acquire the ability to withstand high altitude through adaptive introgression from the Kiang and that gene introgression between the two is rare. On the one hand, this may be because hybrids of Kiang and Tibetan donkeys cannot produce offspring. On the other hand, it may be because donkeys have lived on the Tibetan plateau for only a short period of time and have had limited encounters with the Kiang, making gene flow unlikely. Given their biological similarities, EGLN1 and EPAS1 could provide markers for donkeys bred at other high altitudes in the world.
Hi-C sequencing is based on chromosome conformation capture (3C) technology; it applies high-throughput sequencing to capture spatially proximal fragments across the entire genome and reveals the three-dimensional (3D) conformation of chromatin in the nucleus. In descending order of scale, the 3D hierarchical structural units of the mammalian genome are the chromosome territory, chromatin A/B compartment, topologically associating domain, and chromatin loop. The advancement of high-throughput technologies has also led to faster and more efficient ways of obtaining transcriptome data. RNA-seq (a sequence-based approach) has been the dominant technique in transcriptomics since the 2010s, and a recent study reported the transcriptome atlas of 16 Dezhou donkey tissues [57]. The uniqueness and specificity of chromosomal hierarchical units revealed a series of functional properties, such as cell cycle and gene regulation [58]. Hi-C technology is increasingly used in local variety formation and environmental adaptation studies [59]. It has been used for haplotype-assisted assembly at the whole-genome scale, with high detection efficiency and accuracy [60]. Hi-C data can be widely used to identify 3D genomic structural changes due to known genomic rearrangements [61]. Hi-C technology has also been used to assist with genome assembly and obtain chromosome-scale genomes [62], and to study the regulation of gene expression and gene function [63].
Previous studies have suggested that the Nubian wild ass and the Somali wild ass may be the ancestors of domestic donkeys. However, these findings relied only on the analysis of mtDNA variability, which represents only matrilineal inheritance. The current knowledge of donkey evolutionary history remains incomplete in the absence of archaeological and genome-wide diversity data. A population genomic approach based on whole-genome sequence data was therefore used to elucidate the history of donkey domestication and to investigate the genetic basis of the non-Dun phenotype, a pigmentation pattern that may have been selected during domestication or early post-domestication [64]. The researchers used Hi-C technology to assemble de novo a chromosome-level reference genome of a male Dezhou donkey and re-sequenced the genomes of 126 domestic donkeys and seven wild asses. The reduced level of Y-chromosome variation was found to be inconsistent with the autosomal data, and the paternal and maternal genetic histories differed, possibly because of reproductive management. In addition, the typical pigment dilution of the Dun phenotype showed very similar microscopic and macroscopic features in horses and donkeys. More importantly, the same TBX3 gene was responsible for the pigmentation of the Dun phenotype in both species, whereas the non-Dun phenotype in donkeys was caused by a 1 bp deletion, which may have a regulatory role.
Hi-C technology has several advantages over de-novo assembly from short-read data alone. First, Hi-C assigns sequences to chromosomes by exploiting the relationship between interaction frequency and distance: the frequency of contacts between two loci reflects their spatial and linear separation on the chromosome. Thus, a mapping population is not needed because a single individual is sufficient to achieve chromosome localization. Second, greater marker density and more complete sequence localization can be achieved through Hi-C technology; generally, more than 90% of genomic sequences can be localized to chromosomes. Third, errors in the assembled genomic sequences can be corrected based on the magnitude of the interaction frequency between scaffolds.
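The principle that scaffolds lying close together on a chromosome share more Hi-C contacts can be illustrated with a toy ordering routine. This is only a sketch under strong assumptions and does not reproduce any published scaffolder: the contact counts are invented, the function name `greedy_order` is hypothetical, scaffold orientation is ignored, and the greedy rule (start from the strongest pair, then repeatedly attach the unplaced scaffold with the strongest contact to either end of the chain) is a deliberate simplification.

```python
def greedy_order(contacts):
    """Order scaffolds into one chain using pairwise Hi-C contact counts.

    contacts: dict mapping frozenset({a, b}) -> number of read pairs that
    link scaffolds a and b (higher counts suggest closer placement).
    """
    def link(a, b):
        return contacts.get(frozenset((a, b)), 0)

    remaining = {s for pair in contacts for s in pair}

    # Seed the chain with the most strongly linked pair of scaffolds.
    seed = max(contacts, key=contacts.get)
    chain = sorted(seed)
    remaining -= set(chain)

    while remaining:
        # Score every unplaced scaffold against both ends of the chain.
        score, side, best = max(
            (link(end, s), side, s)
            for s in sorted(remaining)
            for side, end in (("left", chain[0]), ("right", chain[-1]))
        )
        if side == "left":
            chain.insert(0, best)
        else:
            chain.append(best)
        remaining.remove(best)
    return chain


# Invented contact counts for four scaffolds whose true order is A-B-C-D.
toy_contacts = {
    frozenset(("A", "B")): 90,
    frozenset(("B", "C")): 80,
    frozenset(("C", "D")): 85,
    frozenset(("A", "C")): 30,
    frozenset(("B", "D")): 25,
    frozenset(("A", "D")): 5,
}
order = greedy_order(toy_contacts)
print(order)
```

The same contact signal supports error correction: a scaffold whose two halves show unexpectedly few contacts with each other is a candidate misjoin.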
Donkey genome assembly was first performed by Huang et al. at Inner Mongolia University, China, in 2015 [53]. Sequence reads from paired-end and single-end libraries were first assembled into contigs and scaffolds by using Newbler v2.8. Then, longer scaffolds were constructed using SSPACE software and information from 37 paired libraries. Finally, GapCloser was used to fill the gaps inside the scaffolds. After reassembly, the donkey genome sequence was obtained, and it consisted of 2166 scaffolds (>1 kbp), with a total size of 2.36 Gb and N50 sizes of 66.7 kb for contigs and 3.8 Mb for scaffolds. Whole-genome sequence analysis is a delicate and powerful method to study chromosome evolution. The donkey and Asian wild-ass genomes complement the reference genomes of the genus Equus. Comparative analyses of sequences based on these genomes could provide important insights into the population history and adaptive evolution of horses. Furthermore, these results may strengthen the understanding of chromosomal rearrangement mechanisms (using Oxford Nanopore and Pacific Biosciences technologies) and these data may contribute to studies of equine genomics and mammalian chromosome evolution.
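Because N50 is the main contiguity statistic quoted for these assemblies, a short definition in code may help. N50 is the length L such that sequences of length L or longer together contain at least half of the total assembled bases. The sketch below is a generic illustration with invented lengths; the function name `n50` and the toy values are not taken from any of the cited assemblies.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Invented scaffold lengths (arbitrary units), total = 300.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

The statistic is computed separately for contigs and for scaffolds, which is why two N50 values are usually reported for one assembly.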
In 2018, Renaud et al. used emerging technologies to perform genome assembly in donkeys [55]. The difficulty with genome assembly has always been the conversion of relatively short reads into longer scaffolds; newer sequencing technologies (e.g., Oxford Nanopore and Pacific Biosciences) generate longer reads, but these are often accompanied by error rates of up to 15% [65]. Thus, single-molecule sequencing is often integrated with short reads generated by the Illumina platform in so-called hybrid de-novo assemblies, which correct these errors while maintaining cost effectiveness. An alternative approach, which uses long-range chromatin interactions to capture widely spaced read pairs in the genome (e.g., Chicago libraries), coupled with a custom assembly pipeline (HiRise), has been shown to produce long scaffolds at the sub-chromosomal level with low error rates. When the Chicago HiRise assembly technology was used to produce a high-quality genome assembly for the donkey, the scaffold N50 was 15.4 Mb, the contig N50 was 140.3 kb, and the scaffolds were four times larger than before. In addition, sex chromosomes are typically more repetitive than autosomes, and the donkey X chromosome appeared to have undergone several rearrangements, which limited the N50 of the X chromosome scaffolds assembled here to 0.57 Mb. This new donkey assembly could be used to identify ROHs resulting from low effective population size and ancestry, and to explore chromosome rearrangements and their effect on distance patterns between donkeys and horses.
In 2020, Wang et al. constructed a de-novo assembly of the donkey genome by using cutting-edge methods [64]. This approach used Illumina short reads and PacBio long reads to create a hybrid de-novo assembly. The resulting Illumina contigs were combined with PacBio reads, and valid Hi-C reads were extracted based on the results of HiC-Pro. 3D-DNA software was then used to assemble chromosome-length scaffolds in combination with the draft PacBio assembly. Combining the Hi-C data with information from collinearity (synteny) analysis among the donkey, horse, and human genomes indicated that the donkey genome is distributed across 30 autosomes, two sex chromosomes (X and Y), and one circular mitochondrial genome. Several evaluation methods demonstrated high assembly quality, showing 24-fold [53] and sixfold [55] improvements in scaffold N50 compared with previously reported donkey genomes. This assembly could facilitate the identification of small chromosomal rearrangements between horses and donkeys to clarify the evolutionary history of the equine species.
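The reported fold improvements can be cross-checked against the scaffold N50 values quoted earlier in this review (3.8 Mb for [53] and 15.4 Mb for [55], taking the latter as the scaffold N50). The following arithmetic sketch is only a consistency check; the scaffold N50 of the 2020 assembly is not stated in this text, so the values it prints are implied estimates rather than reported results, and the variable names are illustrative.

```python
# Scaffold N50 values (Mb) quoted in this review for the earlier assemblies.
previous_scaffold_n50_mb = {"Huang et al. [53]": 3.8, "Renaud et al. [55]": 15.4}

# Fold improvements reported for the 2020 assembly relative to each of them.
reported_fold = {"Huang et al. [53]": 24, "Renaud et al. [55]": 6}

# Implied scaffold N50 of the 2020 assembly under each comparison.
implied_n50_mb = {
    name: round(previous_scaffold_n50_mb[name] * reported_fold[name], 1)
    for name in previous_scaffold_n50_mb
}

for name, value in implied_n50_mb.items():
    print(f"{name}: implied scaffold N50 of about {value} Mb")
```

Both comparisons point to a scaffold N50 on the order of 90 Mb, so the two rounded fold values are mutually consistent.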
With advances in science and technology, the discovery of archaeological evidence, the development of molecular genetics, and the refinement of genome sequencing, the origin and domestication of the donkey have come to be understood more systematically and in greater depth. This paper reviewed in detail the development of research on the origin and evolution of the donkey in terms of archaeology, molecular cell studies, genetic material studies, and genome sequencing and assembly, and summarized the advanced techniques involved. The domestic donkey originated in Africa and has two main clades, with a single domestication in Africa about 7000 years ago, followed by further expansion in that continent and Eurasia and eventual return to Africa. Assembling the genome of an individual organism and comparing it with related samples could lead to more accurate results through large-scale statistics, analysis, and modeling. In the era of bioinformatics, many biological experiments can be performed by genome sequencing under existing conditions, but large-scale and high-quality research in the field of genome assembly is still urgently needed to achieve whole-genome translation, application, and editing.