Garlic (Allium sativum L.) plants exhibiting mosaic, deformation, and yellow stripe symptoms were identified in Meerut City, Uttar Pradesh, India. To investigate the viruses in the garlic samples, high-throughput sequencing (HTS) was used. The complete genome of a garlic virus E (GarV-E) isolate (NCBI accession No. MW925710) was retrieved. The complete viral genome comprises 8450 nucleotides (nts), excluding the poly(A) tail at the 3′ terminus, with 5′ and 3′ untranslated regions (UTRs) of 99 and 384 nts, respectively, and ORFs encoding a replicase with a conserved motif for RNA-dependent RNA polymerase (RdRP), TGB1, TGB2, TGB3, a serine-rich protein, a coat protein, and a nucleic acid binding protein (NABP). The sequence shared 83.49–90.40% and 87.48–92.87% identity with GarV-E isolates available in NCBI at the nucleotide and amino acid levels, respectively. Phylogenetic analysis showed a close relationship between this isolate from India (MW925710) and GarV-E isolate YH (AJ292230) from Zhejiang, China. The presence of GarV-E was also confirmed by RT-PCR.
1. Introduction
Garlic (Allium sativum L.; family Amaryllidaceae) is an aromatic bulbous crop native to central Asia and is consumed worldwide as food and used in traditional remedies for various diseases [1]. It is highly prone to viral infection, which reduces bulb weight [2]. Garlic crops are often infected by multiple viruses belonging to several genera, collectively known as the “garlic virus complex” [2]. Many viruses infecting garlic have been identified in India, including potyviruses (onion yellow dwarf virus, OYDV; leek yellow stripe virus, LYSV), carlaviruses (garlic common latent virus, GarCLV; shallot latent virus, SLV), a tospovirus (iris yellow spot virus, IYSV), and allexiviruses (garlic virus A, GarV-A; garlic virus B, GarV-B; garlic virus C, GarV-C; garlic virus D, GarV-D; and garlic virus X, GarV-X) [3][4][5][6][7][8][9][10][11][12]. Garlic virus E (GarV-E) is a single-stranded, positive-sense RNA virus of the genus Allexivirus, family Alphaflexiviridae. It was previously reported in garlic from China, Poland, Australia, the USA, and Japan [13][14][15][16]. These viruses are reported to be transmitted by a mite vector [17], by vegetative propagation, and mechanically [18]. The disease symptoms include leaf mosaic, deformation, and yellow stripes, which reduce yield and degrade the quality of the crop. Because garlic is propagated vegetatively, these viruses can accumulate in the bulb and be transmitted to successive generations. Hence, eradicating them is difficult. Given the importance of garlic, identification and characterization of the virus associated with the disease, together with an appropriate management strategy, are required [19].
To date, only four complete genomic sequences of GarV-E isolates, all from China, have been reported in the public domain [20][21]. Several studies have reported genetic differences based on coat protein (CP) sequences within Allexivirus species [21], and 15 partial CP/NABP sequences of GarV-E submitted from around the world are currently available in the NCBI database [16][22].
2. Sequence Analysis
To reveal viruses that might be associated with the symptoms, the RNA of pooled symptomatic clove and leaf samples was sequenced using the Illumina HiSeq 2000 platform. The Illumina sequencing data comprised approximately 43 million 125 bp paired-end reads across the two libraries. After trimming, sequence reads totaling 34,873,264 bp (average length 124.54 bp) and 31,494,032 bp (average length 124.57 bp) were obtained. A total of 133,971 and 108,668 contigs were generated from the clove and leaf samples of garlic, respectively. All contigs were subjected to a BLASTn search against the nr database, which revealed the whole-genome sequence of GarV-E in addition to other garlic viruses, including potyviruses (onion yellow dwarf virus, OYDV; leek yellow stripe virus, LYSV), carlaviruses (garlic common latent virus, GarCLV; shallot latent virus, SLV), and allexiviruses (garlic virus A, GarV-A; garlic virus B, GarV-B; garlic virus C, GarV-C; garlic virus D, GarV-D; and garlic virus X, GarV-X). Sequence taxonomic profiling was visualized using a Krona graph, and sequence reads belonging to the allexiviruses GarV-A (14%), GarV-B (5%), GarV-D (36%), GarV-E (18%), and GarV-X (27%) were obtained. Sequence mapping, BLAST analysis, and the Kraken approach generated complementary datasets, which were supported with 100% convergence. Reference-based mapping of the data revealed that the viral reads mapped to GarV-A, GarV-D, GarV-E, GarV-X, OYDV, LYSV, and GarCLV in both samples. The obtained data revealed that 100,184 (0.33%) reads from the clove sample and 510,948 (1.95%) reads from the leaf sample mapped to the GarV-E genome.
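The read percentages quoted above are plain ratios: mapped reads over the library total, and per-virus reads over all allexivirus reads. The short Python sketch below illustrates that arithmetic only. The function names and the library total of 30,000,000 reads are hypothetical and chosen for illustration; they are not tools or values taken from the study.

```python
def mapped_percent(mapped_reads: int, total_reads: int) -> float:
    """Percentage of the reads in a library that mapped to a reference genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def shares_from_counts(counts: dict) -> dict:
    """Convert per-virus read counts into percentage shares of their sum."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {name: 100.0 * n / total for name, n in counts.items()}


# Allexivirus read shares as reported in the text (percent).
ALLEXIVIRUS_SHARES = {"GarV-A": 14, "GarV-B": 5, "GarV-D": 36, "GarV-E": 18, "GarV-X": 27}

if __name__ == "__main__":
    # HYPOTHETICAL library size, used only to show the calculation.
    example_total = 30_000_000
    print(round(mapped_percent(100_184, example_total), 2))
    # The reported shares account for all allexivirus reads.
    print(sum(ALLEXIVIRUS_SHARES.values()))
```

The same ratio applies to either library; only the mapped count and the library total change.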
3. Genome Annotation and Analysis of Garlic Virus E
BLASTn analysis showed that the GarV-E contig comprises the complete ssRNA genome sequence of 8450 nt. The 5′ UTR and 3′ UTR sequences were not included in the study, and the genome sequence obtained in the study was deposited in NCBI under accession number MW925710. In addition to exploring the amino acid sequences of all possible open reading frames (ORFs), it was possible to detect the characteristic domains and conserved motifs specific to the genus Allexivirus [13][23][24][25]. The ORF Finder and SmartBLAST tools revealed that ORF1 encodes a replicase (4671 nt; 1556 aa) in which the conserved motif SG×3T×3NT×22GDD, the proposed active site of the RNA-dependent RNA polymerase (RdRP), was found at amino acid positions 1317–1353; ORF2 encodes TGB1 (735 nt; 244 aa), ORF3 TGB2 (309 nt; 102 aa), ORF4 TGB3 (225 nt; 74 aa), ORF5 a serine-rich protein (234 nt; 77 aa), ORF6 a coat protein (759 nt; 252 aa), and ORF7 an NABP (348 nt; 127 aa) (Figure 1). Pairwise sequence comparison of all seven ORFs at the nucleotide and deduced amino acid levels revealed 72.8–98.3% and 80.5–98.7% identities, respectively, with the Chinese isolates (AJ292230, MN059326, MN059327, and MN059328) (Table 1).
Figure 1. Genomic organization of GarV-E (MW925710) showing seven predicted open reading frames and their corresponding products: replicase, TGB1, TGB2, TGB3, serine-rich protein, viral coat protein (CP), and nucleic acid binding protein (NABP).
Table 1. Pairwise percent sequence identity of the Indian GarV-E isolate (MW925710) at the nucleotide (nt) level, and of the deduced amino acid (aa) sequences of its ORFs, with other allexiviruses for which complete genome sequences are available.
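The motif notation SG×3T×3NT×22GDD can be read as a pattern in which "×n" stands for n arbitrary residues. The Python sketch below shows one way to search a protein sequence for such a pattern and to convert an ORF length in nucleotides to a residue count. It is an illustration only: the function names and the toy sequence are hypothetical, the study's own tools were ORF Finder and SmartBLAST, and the length conversion assumes the reported nucleotide length includes the stop codon (which matches, for example, the replicase at 4671 nt and 1556 aa).

```python
import re

# "×n" in the motif notation is read as n arbitrary residues.
RDRP_MOTIF = re.compile(r"SG.{3}T.{3}NT.{22}GDD")


def find_rdrp_motif(protein: str):
    """Return 1-based inclusive (start, end) of the first motif match, or None."""
    match = RDRP_MOTIF.search(protein.upper())
    if match is None:
        return None
    return match.start() + 1, match.end()


def orf_aa_length(nt_length: int) -> int:
    """Residue count for an ORF whose nucleotide length includes the stop codon."""
    if nt_length <= 0 or nt_length % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return nt_length // 3 - 1


# HYPOTHETICAL toy protein (not the GarV-E replicase): five filler residues,
# one instance of the motif, then two trailing residues.
TOY_PROTEIN = "MKLAV" + "SG" + "AAA" + "T" + "CCC" + "NT" + "L" * 22 + "GDD" + "KR"

if __name__ == "__main__":
    print(find_rdrp_motif(TOY_PROTEIN))
    print(orf_aa_length(4671))
```

A regular expression keeps the spacing constraints explicit, so a match both confirms the motif and gives its coordinates in the protein.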
4. Sequence Similarity and Phylogenetic Analysis
BLASTn [26] searches of the NCBI databases showed that the complete genome of GarV-E isolate India (MW925710) shared 83.49–90.40% nucleotide sequence identity with previously reported isolates (AJ292230, MN059326, MN059327, and MN059328). The Indian isolate was most closely related to isolate YH (AJ292230) from Zhejiang, China (90.40%). A similar result was obtained from a neighbor-joining (NJ) phylogenetic tree built from the complete genome sequence of the Indian isolate together with complete genome sequences of other GarV-E isolates and other Allexivirus species from different regions of the world.
In this entry, ingroups were selected from isolates of the same species from different countries on the basis of closely related complete genome and complete CP sequences, and outgroups were selected from viruses of the genus Allexivirus that contain enough sites homologous to the respective ingroup species to assess the evolutionary relationship. The phylogenetic tree revealed that the Indian GarV-E isolate (accession no. MW925710) grouped in the same clade as GarV-E isolates reported from other countries (Figure 2). Similar phylogenetic results were obtained at the amino acid (aa) level. In pairwise comparisons, the GarV-E complete genome sequence (MW925710) shared 79.80–90.10% nucleotide (nt) identity and 79.90–89.1% amino acid (aa) identity with other GarV-E isolates reported globally (Table 2).
Figure 2. Phylogenetic analysis of GarV-E isolates based on complete genome amino acid sequences using the neighbor-joining algorithm. The evolutionary distances were computed using the p-distance method, with 1000 bootstrap replicates. The scale bar indicates the number of substitutions per site.
Table 2. Comparison of nucleotide (nt) and amino acid (aa) sequence identities in pairwise combinations of the complete genome sequence of garlic virus E (accession no. MW925710) with other allexiviruses for which complete genome sequences are available.
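The p-distance and bootstrap terms used in the Figure 2 caption can be made concrete with a small Python sketch. The p-distance is the proportion of aligned sites at which two sequences differ, percent identity is its complement, and one bootstrap replicate resamples alignment columns with replacement. This is a minimal illustration under stated assumptions: the function names and toy sequences are hypothetical, skipping gapped columns (pairwise deletion) is an assumption rather than the study's documented setting, and the code is not the software the authors used.

```python
import random


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (ASSUMED
    pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity as the complement of the p-distance."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    # HYPOTHETICAL toy alignment, not GarV-E data.
    toy = {"isolate_1": "ATGCATGCAT", "isolate_2": "ATGCTTGCAA"}
    print(percent_identity(toy["isolate_1"], toy["isolate_2"]))
    print(bootstrap_alignment(toy, random.Random(0)))
```

In a bootstrap analysis, a tree is rebuilt from each resampled alignment, and the fraction of replicates that recover a clade is reported as its support.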
The phylogenetic tree constructed with nucleotide sequences of the complete coat protein (CP) gene available in NCBI for the genus Allexivirus was consistent with the results obtained for the complete genome. In many previous comparisons between members of different Allexivirus species, the percent nt and aa sequence identities of the CP gene exceeded the values suggested by ICTV. This was also noted in [21], whose authors proposed that GarV-A may be combined with other viruses, such as GarV-D and GarV-E, provided that more than 73% nt identity among the CP genes becomes acceptable for GarV species characterization. Similarly, GarV-B may be combined with GarV-X [21]. The CP gene of the GarV-E isolate from Meerut, India (MW925710) shared 83.3–90.6% nucleotide (nt) identity and 86.5–92.8% amino acid (aa) identity with other GarV-E isolates reported globally (Table 1).
Out of 16 samples, seven, including the samples used for HTS, were found to be positive for virus infection using RT-PCR, with an amplicon of ~750 bp. The sequences obtained were deposited in NCBI under accession numbers MW925695, OK064618, OK064619, OK064620, and OK064621. BLASTn analysis showed that the partial CP/NABP gene of GarV-E India (MW925695) shared 83.63–92.96% nucleotide identity with other isolates available in NCBI. In pairwise sequence identity comparisons with similar sequences, the partial CP/NABP gene of GarV-E India (MW925695) shared 83.5–92.9% nucleotide (nt) identity and 82.5–96.7% amino acid (aa) identity. Moreover, the Indian isolate shared a high sequence identity with isolate E-JF-2 (LC097189) from Fukuoka, Japan (90.40% nt and 96.7% aa). To better understand the genetic variability of the GarV-E isolates, researchers selected 18 partial CP/NABP coding region sequences of GarV-E and other allexiviruses from different geographical locations to construct a phylogenetic tree. In the phylogenetic tree, the partial CP/NABP sequence of the Indian GarV-E isolate (accession no. MW925695) grouped in the same clade as GarV-E isolates reported from other countries.
NJ-based phylogenetic analyses of the complete genome of the virus (Figure 2), the complete coat protein region, and the partial CP/NABP coding region showed consistent clustering of isolates. They all suggest a close relationship between the Indian GarV-E isolate and other GarV-E isolates, supported by high bootstrap values. Recombination appears to be rare in single-stranded, negative-sense RNA viruses, although for those with segmented genomes, such as influenza A, genetic exchange can still occur through reassortment [27]. Researchers did not find any strong signatures of recombination using RDP4 in individual alignments of the Indian GarV-E isolate (data not presented).