2. Current Insight on H7N9 Virus and Adaptation to Human Hosts
Shannon entropy was used as a general measure of protein sequence diversity at each aligned, overlapping nonamer position of the avian and human H7N9 viral proteomes (Figure 1). The entropy at a given position reflects both the number of distinct nonamer sequences at that position and their individual incidences. The avian H7N9 virus proteins, with an evolutionary history of over 25 years, were markedly diverse. PB1-F2, with substitutions at each of the aligned nonamer positions, was highly diverse, and NS1, NS2, and M2 each had fewer than 10 completely conserved positions. The more recent human H7N9 viruses (post-2012), in contrast, had relatively few substitutions and contained numerous long stretches of nonamer positions with no substitutions (zero entropy). Nevertheless, despite their limited history, all proteins of the human H7N9 viruses contained regions of nonamer sequence diversity.
Figure 1. Protein sequence diversity of avian and human influenza A (H7N9) viruses. Shannon’s entropy was used as a general measure of protein sequence diversity for each aligned nonamer (nine amino acids) position of the H7N9 avian (upper) and human (lower) virus proteomes. The entropy values indicate the level of variability at the corresponding nonamer positions, with a zero representing completely conserved sites and high entropy values of about 3 or higher marking highly variable sites.
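The per-position entropy described above can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and that entropy is computed with log base 2, which the text does not state. The function name `nonamer_entropy` and the toy sequences are hypothetical, and the sketch omits any gap handling or sample-size correction that the original analysis may have applied.

```python
import math
from collections import Counter


def nonamer_entropy(aligned_seqs, k=9):
    """Shannon entropy (bits, an assumed unit) at each overlapping k-mer start position."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    entropies = []
    for start in range(length - k + 1):
        # Count every distinct nonamer observed at this alignment position.
        counts = Counter(seq[start:start + k] for seq in aligned_seqs)
        total = sum(counts.values())
        # H = sum over nonamers of p * log2(1 / p); a fully conserved position gives 0.
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies


# Toy alignment (hypothetical): the first nonamer position is conserved,
# the second is split evenly between two nonamers.
toy = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIKM"]
print(nonamer_entropy(toy))
```

A position with a single nonamer in every sequence yields zero entropy, while a position with many nonamers at similar incidences yields a high value, matching the reading of Figure 1.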
The complexity of protein substitutions associated with avian H7N9 virus infection of humans is revealed by the in silico finding of 109 avian-to-human (A2H) substitutions that were selectively present in the initial human H7N9 viruses. These substitutions may have arisen through mutation, reassortment, or recombination, which merits further investigation. Each was the most prevalent (major) variant at a given nonamer position of the aligned avian H7N9 viruses that was adapted as the most prevalent (index) sequence at the corresponding nonamer position of the aligned human H7N9 viruses, with an incidence of 100% before the onset of change. About one-half of the 109 A2H substitutions were long-standing in the historical evolution of H7N9, as previously reported in phylogenetic studies. Thus, although possibly required, they were not sufficient for human infection and can be considered adventitious selections with respect to the human host. Moreover, many (59) of the original 109 substitutions were replaced to some extent by sequence changes in the H7N9 viruses recovered from infected humans. For example, the substitution Q235L, known to be selective in human viruses for receptor specificity to human α2,6 sialic acid, was replaced by unique variants in two human strains.
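The selection pattern described above (an avian major variant that becomes the human index) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it takes "index" to mean the most frequent nonamer at a position and "major variant" to mean the second most frequent, it assumes the avian and human alignments share the same coordinates, and it does not check the 100% initial incidence in human viruses. The function name `find_a2h_positions` and the toy data are hypothetical.

```python
from collections import Counter


def find_a2h_positions(avian_seqs, human_seqs, k=9):
    """Return (start, avian_index, human_index) where the avian major variant is the human index."""
    length = len(avian_seqs[0])
    hits = []
    for start in range(length - k + 1):
        avian_ranked = Counter(s[start:start + k] for s in avian_seqs).most_common(2)
        if len(avian_ranked) < 2:
            continue  # position fully conserved in avian viruses: no variant to select
        avian_index, avian_major = avian_ranked[0][0], avian_ranked[1][0]
        human_index = Counter(s[start:start + k] for s in human_seqs).most_common(1)[0][0]
        # A2H pattern: the avian major variant is the dominant human sequence.
        # Ties in frequency are broken by first occurrence, a simplification.
        if human_index == avian_major:
            hits.append((start, avian_index, human_index))
    return hits


# Toy data (hypothetical): a Q-to-L change at the last residue.
avian = ["AAAAAAAAAQ"] * 3 + ["AAAAAAAAAL"] * 2
human = ["AAAAAAAAAL"] * 4
print(find_a2h_positions(avian, human))
```

Because nonamers overlap, a single amino acid substitution is reported at up to nine consecutive start positions in this sketch; collapsing those into one residue-level substitution would be a further step.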
Three of the A2H substitutions (HA: Q235L (H7 numbering); PB1: I368V; and M2: S31N) were reported by Gao et al. to be present in the influenza A (H7N9) viruses associated with the first three human infections (A/Shanghai/1/2013, A/Shanghai/2/2013, and A/Anhui/1/2013) in China in early 2013, all of which were fatal. Nine matched sites among the 68 human adaptation signature sites identified from several subtypes (H1N1, H2N2, H3N2, H5N1, H9N2, H6N1) by Miotto et al. (2010). Additionally, experimental findings in the CDC weekly report noted that the two HA amino acid residues 186V and 226L/I in H3 numbering (177 and 217 in H7 numbering), together with PB1-368V, are likely to increase human receptor binding and enhance transmission to humans.
Possibly, only the 50 A2H substitutions present in all human H7N9 viruses in the 2014 dataset are essential for human adaptation. Notably, 17 of these 50 were first recorded in 2013. These 17 substitutions were particularly abundant in two proteins: the M1 matrix protein, which mediates nuclear export of viral RNA segments and is thought to initiate progeny virus assembly and budding, and NS1, which is associated with an increased translational rate of viral mRNAs and suppression of the host immune response. The data suggest that screening of animal influenza A viruses for the threat of crossing to humans should not be limited to the surface proteins.
Multiple avian species were the host origin of the 109 A2H substitutions associated with the 2013 human-adapted H7N9 viruses. While chickens accounted for the largest fraction of avian viruses with the 109 A2H substitutions, five other hosts (domestic duck, pigeon, wild pigeon, homing pigeon, and tree sparrow) each contained a few reported H7N9 viruses with all or nearly all of the 109 A2H substitutions. Remarkably, these hosts represent several unrelated avian families (Anatidae, Columbidae, and Passeridae), besides the chicken (Phasianidae). All H7N9 A2H substitutions from viruses of these five hosts were reported in 2013, and where data were available, the substitutions were not present in reported viruses of the same host prior to 2013, suggesting that adaptation to the chicken, pigeon, and tree sparrow accompanied the adaptation to humans. We hypothesize that the root cause for the genesis of the A2H substitutions in the chicken host in 2013 was also responsible for their distribution in other avian species. Unfortunately, information on the species evolution is limited by a lack of data, particularly for the chicken, as no sequence data of H7N9 viruses from chickens were available prior to 2013.
The internal genes of H7N9 are thought to be derived from avian H9N2 viruses, while the HA and NA genes are from unknown avian H7N?/H?N9 viruses of Eurasian origin. The majority of the H9N2 sequences before 2013 exhibited A2H substitutions, and this trend continued in the sequences from 2013 onward. This supports the notion that H9N2 is the origin of the internal genes, with subsequent changes possibly bringing about the additional substitutions.
Prior to 2013, from as early as 1988 to 2011, H7N9 viruses of a few avian hosts (ruddy turnstone, blue-winged teal, turkey, Eurasian teal/Anas crecca, guinea fowl, goose, and wild duck) exhibited a limited number of A2H substitutions (11 collectively; Figure 2). In the genesis of H7N9, domestic ducks have been proposed to act as key intermediate hosts, facilitating the generation of different subtype viruses and transmitting them to chickens. H7N9 viral sequence data from domestic ducks prior to 2013 were available only for 2008 (three HA and one NA; all from Mongolia), 2009 (11 full-length viral genome sequences, all from Jiangxi, China), 2010 (one HA, Mongolia), and 2011 (one HA, Gunma), none of which exhibited any of the A2H sites. The A2H sites were mapped in the available viral genomes of domestic ducks (two isolates, from Anhui and Zhejiang, China) only from 2013, the same year they were observed in chickens. Although seven (collectively) of the A2H substitutions were missing from one of the two domestic duck viral genomes of 2013 (Figure 2), all seven except two (S409N in PA and P212S in NS1; Figure 3) were also missing in more than one strain of chicken viruses, as well as human viruses. The two exceptions were missing in either chicken or human viruses.
Figure 2. Heat map depicting the distribution of the 109 identified avian-to-human (A2H) substitution sites (rows) across publicly reported, full-length avian and human influenza A (H7N9) virus strains (columns). The identified A2H amino acid (a.a.) substitutions are sorted according to the influenza A virus segments. Red represents the presence of the A2H a.a. substitution (human index), white the avian index, and grey strains that exhibited neither (i.e., other variants) or a gap at the respective position. Eurasian teal is referred to here by the scientific name Anas crecca. Note that for the strain A/Goose/Czech Republic/1848_K9/2009, the complete proteome sequence was taken from FluDB, while for the other strains, the PA-X sequence was from FluDB and the other proteins were from GISAID. Full-length strains that could not be ascertained by the accession were ignored.
Figure 3. Avian-to-human (A2H) substitutions identified in the proteins of influenza A (H7N9) viruses. The amino acid positions of the A2H substitutions are indicated in the circles, and those underlined are the 50 that remained unchanged in the recorded human H7N9 population. The circles shaded green are substitutions that occurred in the evolutionary path of A (H7N9) viruses, while those in yellow were first detected in 2013. The protein numbering is based on protein sequence alignment. Abbreviations: RdRp CS, RdRp catalytic subunit; HA, hemagglutinin; VS, virion surface; MB, membrane binding; RNPB, ribonucleoprotein binding; NLS, nuclear localization signal; SAMP (III), signal-anchor for type III membrane protein; IV, intravirion; THF, transmembrane helical fragments; RNABH, RNA-binding and homodimerization; CPSF4B, cleavage and polyadenylation specificity factor 4 binding; and NES, nuclear export signal.
The substitution T401A in the second sialic acid-binding site of the neuraminidase (NA) protein, which is an important factor in the haemagglutinin–neuraminidase receptor balance, is indicated to enhance catalytic activity, functionally mimicking the substitutions of avian-derived influenza A viruses that became pandemic in humans. This substitution was observed in all the full-length strains of human, chicken, wild pigeon, tree sparrow, pigeon, homing pigeon, and domestic duck (Figure 2). Phylogenetic analyses revealed that the substitution T401A occurred prior to those in hemagglutinin (HA), suggesting that the substitution may have facilitated the acquisition of altered HA receptor-binding properties and contributed to the spread of the novel H7N9 viruses, which still continue to pose a public health threat.
We speculate that H7N9 chicken viruses prior to 2013 did harbor a number of the 109 A2H substitutions, given that at least 12 other hosts did exhibit a few. The 109 A2H substitutions, however, were completely absent from reported 2008–2011 H7N9 viruses of domestic ducks, a species proposed as a key intermediate host in transmitting to chickens. Given that 2013 H7N9 viruses of domestic ducks closely mirrored the distribution of A2H substitutions in chicken viruses of the same year, it is likely that 2008–2011 H7N9 chicken viruses also closely mirrored the absence of A2H substitutions. It is quite possible that domestic ducks and chickens started exhibiting the A2H substitutions from 2011 onward, leading up to the emergence of the 2013 H7N9 strain. This may have particularly involved about one-half of the 109 A2H substitutions that were long-standing in the historical evolution of H7N9; only 17 of the A2H substitutions were first reported in 2013. Nevertheless, the available data indicate that several avian hosts now possess greater potential for human H7N9 infection if additional substitution(s) enhance the fitness and frequency of the A2H substitutions. These findings call for wider surveillance of the avian host species, particularly domestic ducks given their extensive farming.
The widely reported PB2 E627K substitution of H7N9 and other human influenza viruses, important for the enhancement of replication, is not reported herein as an A2H substitution because it did not conform to the common pattern of an avian major variant selectively adapted as the corresponding human index substitution. The E627K substitution is found in avian species only as a unique variant (incidence ~1%) of the tree sparrow, whereas it is the dominant sequence in human hosts (incidence ~68%), likely as a result of subsequent sequence changes of the infecting virus in humans rather than the avian host.
Despite the short evolutionary history of the human H7N9 viruses, there is rapid and continued fitness evolution of the virus in human hosts. In this study, over 200 human H7N9-specific substitutions, not present in the avian H7N9 viruses, were identified. Several were adjacent to or overlapping the positions of the A2H substitutions. In the absence of human-to-human transmission, there is little selective pressure for the proliferation of the human virus strains.
The evolution of the 109 substitutions was analyzed by comparing the 2014 datasets (avian and human) with the much larger 2017 datasets (avian and human). Only seven of the original A2H substitutions remained in the 2017 sequences, along with two that were newly identified. The absence of the other 102 substitutions does not mean that they were lost, but rather that the originally selected major variant substitutions of the avian viruses have further adapted in avian hosts and become widespread in the population as the index of the avian H7N9 sequences. Thus, in the 2017 dataset, many of the 2014 major substitutions had become the index in both avian and human viral strains, hence the lack of apparent selection between the two viral populations. This observation was not restricted to viral strains of chicken, which were predominantly sequenced and thus a potential source of bias, but extended to other hosts. The sub-clustering among the strains from 2013 onward indicates further evolution and possible adaptation into multiple lineages. These results highlight the need to stratify viral sequence data in a time-series fashion as a better strategy for identifying A2H substitutions and understanding transmission patterns.
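The time-series stratification suggested above can be sketched in a few lines. This is a minimal illustration, assuming each record carries a collection year and an aligned protein sequence. The function name `index_by_year`, the record layout, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def index_by_year(records, start, k=9):
    """Map each year to (index nonamer, incidence) at one alignment position.

    records: iterable of (year, aligned_sequence) pairs (an assumed layout).
    """
    by_year = defaultdict(list)
    for year, seq in records:
        by_year[year].append(seq[start:start + k])
    result = {}
    for year in sorted(by_year):
        nonamers = by_year[year]
        # The index is the most frequent nonamer among that year's sequences.
        nonamer, count = Counter(nonamers).most_common(1)[0]
        result[year] = (nonamer, count / len(nonamers))
    return result


# Toy avian records (hypothetical): a former variant becomes the index in 2013.
avian_records = [
    (2012, "AAAAAAAAQ"), (2012, "AAAAAAAAQ"),
    (2013, "AAAAAAAAL"), (2013, "AAAAAAAAL"),
    (2013, "AAAAAAAAQ"), (2013, "AAAAAAAAL"),
]
print(index_by_year(avian_records, start=0))
```

Tracking the index year by year, rather than pooling all sequences, makes visible the point at which an avian major variant overtakes the earlier index, which is the shift that hides A2H substitutions in the pooled 2017 comparison.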
In summary, the data indicate a remarkably rapid and continued A2H fitness evolution of the avian H7N9 viruses in avian hosts (chicken, domestic duck, pigeon, wild pigeon, homing pigeon, and sparrow), in particular the chicken. This correlates with the progressive increase in the number of people infected by the virus since 2013, with annual epidemics of human infections increasingly reported in China, which experienced its fifth (October 2016 to September 2017) and largest epidemic (766 infections), followed by a sixth epidemic. As essentially all chickens in China are now possibly hosts of the human H7N9 strain, the exposure of humans to chickens should be limited, with continued surveillance, as necessary steps to monitor, curtail and/or prevent further spread and the possible emergence of new lineages capable of human-to-human transmission.