Wild animals harbour a large number of adenoviruses that remain uncharacterised with respect to their genomic organisation, diversity, and evolution within complex ecosystems. Here, we report the first complete genome sequence of an atadenovirus from a passerine bird, tentatively named Passerine adenovirus 1 (PaAdV-1). The PaAdV-1 genome is 39,664 bp in length, which is, to the best of our knowledge, the longest atadenovirus genome sequenced to date, and contains 42 putative genes. Its genome organisation was characteristic of members of the genus Atadenovirus; however, the PaAdV-1 genome was highly divergent and showed the highest sequence similarity with psittacine adenovirus-3 (55.58%). Importantly, the complete PaAdV-1 genome contained 17 predicted novel genes that were not present in any other adenovirus sequenced to date, and several of these encode proteins that harbour transmembrane helices. Phylogenetic analysis placed PaAdV-1 in a sub-clade distinct from all other sequenced atadenoviruses, with no obvious close evolutionary relationship to any of them. In contrast to previous studies that proposed a reptilian origin of atadenoviruses, our trees consistently indicated that atadenoviruses might have first evolved in a bird species that existed before the passerine and psittacine clades separated and that was the ancestor of both clades. Further investigation of the structure and function of its major proteins, together with extended studies of closely related species, is suggested to broaden knowledge of the host specificity of adenovirus infection.
Adenoviruses are medium-sized, non-enveloped, linear, double-stranded DNA (dsDNA) viruses within the family Adenoviridae. The family Adenoviridae contains five accepted genera. One of these genera, Atadenovirus, was added in 2002 to include adenoviruses that were previously assigned to the genus Mastadenovirus but differed significantly in genome size, structure, gene content, and gene arrangement. Originally, all the viruses in the genus were thought to have an A+T content bias, but with the discovery of new species in this genus, it has been shown that the A+T bias is not a consistent feature. The size of sequenced atadenovirus genomes ranges between 27 and 34 kb, and all have the characteristic inverted terminal repeat (ITR) found in all adenoviruses. Atadenoviruses have a set of core genes shared with the other adenovirus genera, plus genus-specific genes. A feature of adenoviruses is their ability to acquire genes from their hosts, bacteria, fungi, and other viruses. Atadenoviruses appear to be particularly adept at this, and all atadenoviruses sequenced to date contain five or more genes that were acquired from other organisms or whose origin is not known. These genes are diverse in function and appear to be lost as often as they are acquired as the atadenoviruses evolve.
Atadenoviruses have been detected in a diverse range of hosts, including birds, reptiles (order Squamata; lizards, snakes, and worm lizards), ruminants, marsupials, and a common tortoise. Using a partial DNA polymerase gene sequence, recent studies also report the presence of a large number of novel atadenoviruses circulating in wild passerine species of birds in Australia and Europe and in passerine species kept in aviculture collections. Because of the limited sequence information for these passerine adenoviruses, there is still considerable uncertainty about their phylogenetic relationship to each other and to other atadenoviruses. Additionally, only two avian atadenoviruses, psittacine adenovirus 3 (PsAdV-3) and duck atadenovirus (DAdV), have been fully sequenced to date.
The assembled passerine adenovirus 1 (PaAdV-1) complete genome was a linear double-stranded DNA molecule of 39,664 bp in length, which is, to the best of our knowledge, the longest atadenovirus genome sequenced to date. Like most atadenoviruses, the PaAdV-1 genome contained a central conserved coding region bounded by two identical inverted terminal repeat (ITR) regions. The length of the ITR varies considerably in other atadenoviruses and ranges from 40 to 194 bp. Each PaAdV-1 ITR was 193 bp long, at coordinates 1–193 (sense orientation) and 39,450–39,642 (antisense orientation). The novel PaAdV-1 genome sequence had a balanced G+C content (53.70%). According to complete genome analysis, the known AdV genomes most closely related to PaAdV-1 were psittacine adenovirus 3 (PsAdV-3; 55.58%), snake adenovirus 1 (SnAdV-1; 55.58%), bearded dragon adenovirus 1 (BDAdV-1; 55.20%), and duck adenovirus A (DAdV-A; 54.25%).
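The two genome-level properties reported above, G+C content and the presence of inverted terminal repeats, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study; the function names and the toy sequence are hypothetical, and the ITR check assumes a perfect repeat (the 3' end is the exact reverse complement of the 5' end).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def itr_length(seq: str) -> int:
    """Length of the perfect inverted terminal repeat.

    If the 3' end is the reverse complement of the 5' end, then the genome
    and its reverse complement start with the same bases. The function
    counts how many leading bases agree (at most half the genome).
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    length = 0
    while length < len(seq) // 2 and seq[length] == rc[length]:
        length += 1
    return length


# Toy sequence (hypothetical, not PaAdV-1): a 5 bp ITR around a short core.
toy_genome = "ACGGT" + "TTTAAACCC" + "ACCGT"
print(itr_length(toy_genome), round(gc_percent(toy_genome), 2))
```

For a real genome the sequence would be read from a FASTA file, and imperfect repeats would need an alignment-based comparison rather than exact matching.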
The PaAdV-1 genome had 42 predicted methionine-initiated ORFs encoding proteins that were annotated as putative genes and numbered from left to right (Figure 1 and Table 1). Comparative analysis of the protein sequences encoded by the predicted ORFs, using BLASTX and BLASTP, identified homologs with significant protein sequence similarity (E-value ≤ 10⁻⁴) for 25 ORFs (Table 1), while 17 ORFs (ORF01-04 and ORF06-18) had no significant match in the BLAST database and were considered unique. Among the predicted protein-coding ORFs of the PaAdV-1 genome, 24 were homologs of gene products from other AdVs (Table 1). Of these, the largest number (21) were most similar to gene products of psittacine adenovirus-3 (PsAdV-3). The remaining three genes, encoding E1B-large T-antigen, E4, and a hypothetical protein (RH0, F-box related protein), were homologs of genes from amniota adenovirus 1 (protein identity 25.26%, GenBank accession no. QEJ80749.1), DAdV-1 (protein identity 26.92%, GenBank accession no. AJA72340.1), and BAdV-D (protein identity 30.15%, GenBank accession no. NP_899151.1), respectively. ORF05 was predicted to encode a 92 amino acid (aa) protein (molecular weight/theoretical isoelectric point, Mw/pI: 10.27 kDa/11.16). By BLAST search, this protein was predicted to be a homolog of prokaryotic ankyrin repeat domain-containing protein-50, with >49% amino acid sequence similarity (query coverage 61%, E-value 2.00 × 10⁻⁵).
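As an illustration of what "methionine-initiated ORFs" means in practice, the following Python sketch scans all six reading frames for ATG-to-stop stretches above a minimum length. It is a simplified stand-in, not the annotation software used in this study: the function name `find_orfs`, the `min_aa` cut-off, and the toy sequence are assumptions, and the standard stop codons are used.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_orfs(seq: str, min_aa: int = 30):
    """Find ATG-initiated ORFs on both strands.

    Returns sorted tuples (start, end, strand, length_aa) with 1-based,
    inclusive coordinates on the forward strand; the stop codon is included
    in the coordinates but not in length_aa. For each stop codon, only the
    first upstream ATG in the same frame is used (longest ORF).
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    length_aa = (i - start) // 3
                    if length_aa >= min_aa:
                        if strand == "+":
                            coords = (start + 1, i + 3)
                        else:
                            # Map reverse-strand positions back to forward coordinates.
                            coords = (n - (i + 3) + 1, n - start)
                        orfs.append((coords[0], coords[1], strand, length_aa))
                    start = None
    return sorted(orfs)


# Toy sequence (hypothetical): one ORF encoding Met-Lys-Lys-Lys.
toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
print(find_orfs(toy, min_aa=4))
```

Predicted ORFs would then be translated and compared against protein databases, as was done here with BLASTX and BLASTP.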
Figure 1. Schematic illustration of the avian atadenoviruses. Schematic map of the passerine adenovirus 1 (PaAdV-1, GenBank accession no. MT674683), in comparison with duck adenovirus A (DAdV-A, GenBank accession no. AC_000004) and psittacine adenovirus 3 (PsAdV-3, GenBank accession no. KJ675568), using CLC Genomics Workbench (version 9.5.4, CLC bio, a QIAGEN Company, Prismet, Aarhus C, Denmark). The arrows symbolise adenovirus genes and open reading frames (ORFs) predicted to code for proteins, and indicate their direction of transcription. Each gene or ORF is colour coded, as indicated by the colour key in the legend. The bottom graph represents the sequence conservation between the aligned PaAdV-1, DAdV-A, and PsAdV-3 sequences at each position in the alignment. The colour gradient reflects how conserved each position is in the alignment: red represents 100% conservation across all three viruses, black represents 50% conservation, and blue represents less than 50% conservation.
Table 1. Predicted protein-coding genes of PaAdV-1.
The orientation of the predicted conserved genes in PaAdV-1 was identical to that of PsAdV-3 and DAdV-A (Figure 1). The left-hand (LH) region of the PaAdV-1 genome contained a homologue of the genus Atadenovirus-specific gene p32K (Figure 1), followed by genes encoding E1B small T-antigen and E1B large T-antigen that were homologues to PsAdV-3 and DAdV-A, respectively. The amino acid sequence similarity of p32K to that of other atadenoviruses was relatively low, ranging from 30.83% to 33.51%, with the highest similarity to SnAdV-1 (33.51%). Open reading frames corresponding to conserved LH proteins present in other atadenoviruses were not found in PaAdV-1.
At the centre of the PaAdV-1 genome, all the expected conserved AdV genes were found, and their degree of homology with other atadenoviruses is shown in Figure 1 and Table 1. The only expected gene that was not found was the U-exon gene. The maximum similarity of individual PaAdV-1 proteins to homologs in other atadenoviruses varied significantly and was not predictable (Table 2). For example, the DNA polymerase and penton base protein showed the highest pairwise identity with homologous proteins from PsAdV-3, whereas hexon, DNA binding protein, and fibre protein displayed the highest match with DAdV-A. Among the major capsid proteins, the fibre protein showed a low amino acid identity, ranging from 15.65% to 25.42%, whereas penton base and hexon exhibited a high amino acid identity with PsAdV-3 (75.94%) and DAdV-A (80.04%), respectively. Furthermore, the additional predicted conserved proteins analysed also demonstrated high identity with homologous atadenovirus proteins from other species (Table 2).
Table 2. Comparative G+C (%) content and pairwise identity of representative atadenovirus species against passerine adenovirus 1 (PaAdV-1) on the basis of complete genome nucleotide sequences and amino acid sequences of selected core proteins. Shading and bold font highlight maximum similarity. Selected proteins were aligned by using MAFFT in Geneious (version 10.2.2, Biomatters, Ltd., Auckland, New Zealand) with the following parameters to calculate % similarity: scoring matrix BLOSUM62, gap open penalty = 1.53, and BLOSUM62 with threshold 1 (percentage of residues which have a score ≥ 1 in the BLOSUM62 matrix).
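For readers unfamiliar with how pairwise values of this kind are derived from an alignment, the Python sketch below computes plain percent identity between two aligned sequences. It is only an approximation of the idea: the values in Table 2 were produced in Geneious and use a BLOSUM62-based similarity threshold, which this sketch does not implement. The function name and the choice to count gap-versus-residue columns as mismatches are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences taken from the same alignment.

    Columns where both sequences have a gap are ignored. Columns where only
    one sequence has a gap count as compared but not identical (an assumed
    convention; other tools exclude such columns).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap and res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned fragments (hypothetical, not taken from PaAdV-1).
print(round(percent_identity("MKT-LV", "MKTALI"), 2))
```

Because the denominator convention differs between tools, identity values computed this way are comparable only when the same convention is applied to every pair.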
In the right-hand (RH) region of PaAdV-1, ORFs corresponding to four E4 genes were found (Figure 1 and Table 1). Among them, three (E4.1, E4.2, and E4.3) exhibited the greatest amino acid homology with proteins from PsAdV-3 (protein identity ranging between 27% and 35%), whereas one (E4) had the greatest amino acid homology with proteins from DAdV-1 (26.92% amino acid identity). To the right of the E4 region, the PaAdV-1 genome contained only one ORF previously described in atadenoviruses. This ORF codes for a protein of 205 residues that appears to be a homolog of the F-box domain found in BAdV-D (protein identity 30.15%). The RH0 gene was followed by 13 ORFs whose predicted protein products have not been identified in other adenoviruses previously (Figure 1).
The PaAdV-1 genome encoded all the conserved genes present in other adenoviruses, except the U exon. Additionally, it contained 17 novel ORFs (ORF01-04 and ORF06-18, Table 1) that were not present in other adenoviruses sequenced to date and did not match sequences in the NR protein database, using BLASTP and BLASTX. These unique ORFs encoded proteins of 33–312 amino acids (aa) in length (Table 1). Among these, the novel ORF08 was predicted to encode a 39 aa protein (molecular weight/theoretical isoelectric point, Mw/pI: 4.49 kDa/8.96). Phyre2 predicted this protein to be a homolog of hepatitis E virus capsid protein (PDB: c3ggqA) with 74% sequence coverage, but the confidence of the predicted structure was low (40%). Therefore, no reliable structure could be predicted by using Phyre2 or SWISS-MODEL.
Four novel ORFs (ORF09, -10, -13, and -18) were predicted to contain transmembrane helices (TMHs), but no classical signal peptide. ORF10 was predicted to encode a 312 aa protein (Mw/pI: 35.60 kDa/6.35) containing at least two TMHs. The orientation of the protein in the TMHs is shown in Figure 2. Furthermore, the TMHs detected by EMBOSS in Geneious also showed the presence of alpha-helices (α-helices) within the predicted TMH regions, which were dominated by highly hydrophobic residues (red colour) (Figure 2A). However, we were unable to model the structure of ORF10 and/or its TMHs by using Phyre2, HHpred, and SWISS-MODEL; this might be due to the lack of a closely related structure in the database. Though the function of the TMHs and dominant hydrophobic residues in this novel ORF is unknown, studies have shown that hydrophobicity drives the insertion of helical segments into the transmembrane proteins and acts as a hallmark of soluble globular protein tertiary structure. ORF18 was predicted to encode a 237 aa protein (Mw/pI: 26.14 kDa/8.81) that was also predicted to have at least two TMHs (Figure 3). TMHs detected in ORF18 by EMBOSS in Geneious also showed the presence of α-helices within the predicted TMH regions, which were likewise dominated by highly hydrophobic residues (red colour) (Figure 3A). Similarly, ORF13 was shown to contain a single TMH by TMHMM, TMpred, and the EMBOSS tool in Geneious used in this study. The protein encoded by the novel ORF09 (181 aa) was predicted to have at least one C-terminal TMH by several programs, including EMBOSS 6.5.7 tool charge in Geneious, TMHMM, and TMpred. However, an additional N-terminal TMH was detected by Geneious and TMpred, and a further TMH was predicted by TMpred. Nonetheless, there was no evidence for conserved secondary structure and/or protein homologs detected by various software, including HHpred, Phyre2, and SWISS-MODEL.
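The link between stretches of hydrophobic residues and candidate TMHs can be illustrated with a classical Kyte-Doolittle hydropathy scan. The Python sketch below is not TMHMM, TMpred, or the EMBOSS tool used in this study; it is a simpler, swapped-in technique shown only to make the principle concrete. The 19-residue window and the 1.6 cut-off are the values commonly quoted for this scale and are assumptions here, as are the function names and the toy sequence.

```python
# Kyte-Doolittle hydropathy scale (positive values are hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19):
    """Mean hydropathy of every full window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(protein: str, window: int = 19, threshold: float = 1.6):
    """Runs of consecutive windows whose mean hydropathy reaches the threshold.

    Each run is reported as (first, last) 1-based positions of the window
    centres, so the span marks the core of a candidate helix.
    """
    profile = hydropathy_profile(protein, window)
    half = window // 2
    segments = []
    run_start = None
    for i, value in enumerate(profile):
        if value >= threshold and run_start is None:
            run_start = i
        elif value < threshold and run_start is not None:
            segments.append((run_start + half + 1, i - 1 + half + 1))
            run_start = None
    if run_start is not None:
        segments.append((run_start + half + 1, len(profile) - 1 + half + 1))
    return segments


# Toy protein (hypothetical): a 21-residue leucine stretch flanked by lysines.
toy_protein = "K" * 20 + "L" * 21 + "K" * 20
print(candidate_tm_segments(toy_protein))
```

A hydropathy scan of this kind flags hydrophobic stretches only; it says nothing about topology or helix orientation, which is why dedicated predictors were used for the results reported here.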
Figure 2. Predicted structure of the unique PaAdV1-ORF10. Prediction of transmembrane helices (TMHs) in the unique PaAdV1-ORF10 gene, using the EMBOSS 6.5.7 tool in Geneious (version 10.2.2) (A), TMHMM (B), and TMpred (C). All the programs consistently predicted two TMHs. (A) TMHs detected by EMBOSS also showed the presence of alpha-helices within the predicted TMH regions, which were dominated by highly hydrophobic residues (red colour). (B,C) The x-axis represents residue position, whereas the y-axis represents the posterior probability (B) and the score (C; scores above 500 are considered significant) for the predicted TMHs. (C) Solid and dashed black lines indicate protein orientation as inside to outside and outside to inside, respectively.
Figure 3. Predicted structure of the unique PaAdV1-ORF18. Prediction of transmembrane helices (TMHs) in the unique PaAdV1-ORF18 gene, using the EMBOSS 6.5.7 tool in Geneious (version 10.2.2) (A), TMHMM (B), and TMpred (C). All the programs consistently predicted two TMHs. (A) TMHs detected by EMBOSS also showed the presence of alpha-helices within the predicted TMH regions, which were dominated by highly hydrophobic residues (red colour). (B,C) The x-axis represents residue position, whereas the y-axis represents the posterior probability (B) and the score (C; scores above 500 are considered significant) for the predicted TMHs. (C) Solid and dashed black lines indicate protein orientation as inside to outside and outside to inside, respectively.
Phylogenetic reconstruction based on two non-structural (polymerase and pTP) and two structural (penton and hexon) protein sequences clearly supported the inclusion of the newly assembled PaAdV-1 in the genus Atadenovirus. In the resulting ML tree based on concatenated amino acid sequences of the four selected AdV genes, the novel PaAdV-1 occupied a distinct sub-clade position with strong bootstrap support, when compared with other sub-clades within the genus Atadenovirus (Figure 4), suggesting that it may represent an ancient evolutionary lineage within the genus. Using the same set of concatenated protein sequences, we found that the maximum inter-lineage sequence identity values between the novel PaAdV-1 and other atadenoviruses were >63% (PaAdV-1 vs. DAdV), >62.5% (PaAdV-1 vs. PsAdV-3), and >61.0% (PaAdV-1 vs. ruminant atadenoviruses), which mirrored the distinct phylogenetic position of this novel PaAdV-1. Furthermore, neighbour joining (NJ) phylogenetic inference of the concatenated protein sequences and the ML trees based on individual protein sequences of the complete polymerase, penton, and hexon genes demonstrated similar tree topologies for the representatives of atadenovirus species. For example, all of these ML trees based on individual genes placed the novel PaAdV-1 in a sub-clade distinct from all other atadenoviruses (with strong bootstrap support) and did not show any obvious close evolutionary relationship. However, the ML phylogeny based on the amino acid sequences of the complete pTP gene supported the closest relationship of the novel PaAdV-1 with reptilian and bird atadenoviruses.
Remarkably, in contrast to previous studies that proposed a reptilian origin of atadenoviruses based on the comparison of phylogenetic trees of the adenoviruses, using a partial DNA polymerase gene, our trees consistently indicated that atadenoviruses might have first evolved in a bird species that existed before the passerine and psittacine clades separated and that was the ancestor of both clades (Figure 4).
Figure 4. Phylogenetic tree showing the possible evolutionary relationship of the novel passerine adenovirus 1 with other selected AdVs. The maximum likelihood (ML) tree was constructed by using concatenated amino acid sequences of the complete DNA-dependent DNA polymerase, pTP, penton, and hexon genes. Concatenated protein sequences were aligned with MAFFT (version 7.450) in Geneious (version 10.2.2, Biomatters, Ltd., Auckland, New Zealand), under the BLOSUM62 scoring matrix and gap open penalty = 1.53. Gaps of >20 residues were deleted from the alignments. The unrooted ML tree was constructed with PhyML under the LG substitution model, and 1000 bootstrap re-samplings were chosen to generate ML trees, using tools available in Geneious (version 10.2.2, Biomatters, Ltd., Auckland, New Zealand). The numbers on the left show bootstrap values as percentages, and the labels at branch tips refer to the original AdV species name, followed by the GenBank accession number in parentheses. The final tree was visualised with FigTree (version 1.4.4). The five official genera are highlighted with different background colours, and the novel passerine adenovirus 1 is shown in pink.
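Two steps named in this caption, concatenating per-gene protein alignments and bootstrap re-sampling of alignment columns, can be sketched in Python as follows. This is a conceptual illustration only and not the Geneious/PhyML workflow used for Figure 4: the function names are hypothetical, the sequences are toy strings rather than real data, and the sketch assumes each gene has already been aligned separately with the same set of taxa.

```python
import random


def concatenate(alignments):
    """Join per-gene alignments into one super-alignment.

    `alignments` is a list of dicts mapping taxon name to aligned sequence.
    Every gene must contain the same taxa, and all sequences within a gene
    must have the same aligned length.
    """
    taxa = sorted(alignments[0])
    for gene in alignments:
        if sorted(gene) != taxa:
            raise ValueError("all genes must contain the same taxa")
        if len({len(seq) for seq in gene.values()}) != 1:
            raise ValueError("sequences within a gene must be aligned")
    return {taxon: "".join(gene[taxon] for gene in alignments) for taxon in taxa}


def bootstrap_replicate(alignment, rng):
    """Draw alignment columns with replacement to build one pseudo-replicate."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        taxon: "".join(seq[c] for c in columns)
        for taxon, seq in alignment.items()
    }


# Toy alignments (hypothetical sequences; taxon labels used only as names).
genes = [
    {"PaAdV-1": "MK-L", "PsAdV-3": "MKAL"},
    {"PaAdV-1": "GG", "PsAdV-3": "GA"},
]
supermatrix = concatenate(genes)
replicate = bootstrap_replicate(supermatrix, random.Random(0))
print(supermatrix, replicate)
```

In a full analysis, a tree would be inferred from each of the 1000 replicates, and the bootstrap value of a branch would be the percentage of replicate trees that contain it.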