Adaptive molecular evolution of virulence genes of avian influenza A virus subtype H5N1: An analysis of host radiation

Host radiation in a parasite is strongly influenced by the mutation rates of its virulence genes. We studied the molecular evolution of the virulence genes (HA, NS and PB2) of the avian influenza virus H5N1 in its passage from avian to human hosts. We used a site-specific comparison of synonymous (silent) and non-synonymous (amino acid altering) nucleotide substitutions for the three chosen genes in parasite populations from different hosts. Analyses were made using Maximum Likelihood (ML) genealogies for the null and alternate hypotheses, which were based on different gamma distributions of rates. The null hypothesis had a higher rate of substitution and was found by the Likelihood Ratio Test (LRT) to be more suitable for all the studied genes. The study showed the NS gene to have the fastest rate of evolution.


Background:
A common phenomenon in parasite evolution is the ability of parasites to expand their niche to new host species. This process of acquiring new host ranges is called a host change event or host radiation. The selective force on a particular parasite gene may change after host radiation for reasons such as adaptive alteration of protein functionality or immune-mediated selection by the host. A host change event comprises three stages: transmission to a new host species, replication within the new host, and transmission between individuals of the new host species. Because the last two steps are rate limiting, host radiation events are rarely successful. According to the general theory of 'ecological specialization', host-specific adaptations act as ecological barriers. [1]
These host change events, though rare, do occur and may be detrimental to the new host; prominent examples are the Spanish flu pandemic and the latest H5N1 outbreaks.
The avian influenza A virus is commonly found in most wild birds, especially migratory waterfowl and wild ducks, which act as its natural reservoirs and carriers. It is, however, fatally infectious to domestic birds such as chickens, ducks and turkeys. There are many subtypes of the virus, differing in the combination of subtypes of the HA (hemagglutinin) and NA (neuraminidase) surface proteins. Human infections were first reported in Hong Kong (1997), with the latest one being the H5N1 outbreak. [2]
Though the outbreaks were local and human-to-human transmission was rare, the highly mutable nature of the influenza virus raises concerns. This mutability is due to the lack of 'proofreading' mechanisms and of repair of the errors that occur during replication. Little or no immune protection in the human population, owing to the lack of prior infection, may lead to pandemic outbreaks. [3] Past influenza pandemics have led to high levels of illness, death, social disruption and economic loss. There were three major influenza A pandemics during the 20th century, namely (1) Spanish flu (H1N1) 1918-19, (2) Asian flu (H2N2) 1957-58, and (3) Hong Kong flu (H3N2) 1968-69, which caused thousands of deaths in the United States. Of the various subtypes, H5N1 is currently of greatest pandemic concern for the following reasons: rapid spread throughout poultry flocks in Asia; endemic outbreaks in eastern Asia; a highly rapid rate of mutation; a propensity to acquire genes from viruses infecting other animal species; and severe disease in humans with a high fatality rate of approximately 70%. [4]
Studies on the genetic basis revealed that three genes of the avian influenza virus are responsible for its virulence: HA, NS and PB2. The HA (hemagglutinin) gene is responsible for the high cleavability of the hemagglutinin glycoprotein, which acts like glue, binding to the sialic acid on cell receptors and hampering its function. The NS (non-structural) gene antagonizes the induction of interferon protein levels and leads to lethal viral infection [5], and the PB2 gene encodes an internal polymerase that influences the outcome of infection. [6, 7] Another study pointed to a mutation at position 627 in the PB2 gene. Identifying positions in the parasite genome that underlie the phenotypic differences between host-specific strains may give insights into the molecular basis of species-specific adaptation. The fundamental objective of our work is to study how selection acts on variants of the genes responsible for virulence, i.e. the HA, NS and PB2 genes, in different hosts comprising birds and humans. To do so, we use a site-specific comparison of synonymous (silent) and non-synonymous (amino acid altering) nucleotide substitutions in the parasite populations from different hosts. Methods for performing such analyses at a site-specific level have focused on amino acid conservation as an indication of protein function. The purpose of the work is to gain a better understanding of the evolutionary processes in the H5N1 avian influenza virus, which has undergone host radiation from birds to humans.
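The site-specific comparison described above can be illustrated with a minimal sketch that labels each codon site of two aligned coding sequences as identical, synonymous or non-synonymous. This is not the analysis pipeline of the study: the function name `classify_sites` and the two short sequences are hypothetical, and the sketch assumes gap-free, in-frame sequences and the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_sites(seq_a, seq_b):
    """Label each codon site as 'identical', 'synonymous' or 'nonsynonymous'."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    labels = []
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            labels.append("identical")
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            labels.append("synonymous")      # silent change
        else:
            labels.append("nonsynonymous")   # amino acid altering change
    return labels


# Hypothetical toy sequences (not real H5N1 data).
avian = "ATGGAAAAG"   # Met-Glu-Lys
human = "ATGGAGAGG"   # Met-Glu-Arg
labels = classify_sites(avian, human)
```

Counting the labels per site across many sequence pairs from the two host groups gives the raw material for a comparison of silent and amino acid altering changes; a codon model does this within a likelihood framework rather than by simple counting.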

Methodology:
This approach rests on the assumption of functional constraint, i.e. functionally important residues and sequences are under stronger selective constraints, which lower their evolutionary rates. Changes in evolution were investigated by developing a likelihood ratio test, based on a Markov model of codon substitution, for detecting significant rate shifts.
Our work is based on the model proposed by Goldman and Yang. [8] In this model, a Markov process is used to describe substitutions between codons, and transition/transversion rate bias and codon usage bias are allowed. Further, selective constraints at the protein level are accommodated using physicochemical distances between the amino acids coded for by the codons.
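As an illustration of how such a codon model combines its ingredients, the sketch below computes a single off-diagonal rate in the form in which the Goldman and Yang model is commonly summarized: zero for codons differing at more than one position, otherwise the target codon frequency times exp(-d/V), multiplied by kappa for transitions. The function name, the parameter names and all numerical values are assumptions made for illustration; the overall rate scaling factor and the exclusion of stop codons are omitted.

```python
import math

PURINES = {"A", "G"}


def is_transition(base_a, base_b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes (bases differ)."""
    return (base_a in PURINES) == (base_b in PURINES)


def codon_rate(codon_i, codon_j, pi_j, kappa, aa_distance, v):
    """Relative substitution rate from codon_i to codon_j.

    pi_j        : equilibrium frequency of the target codon (codon usage bias)
    kappa       : transition/transversion rate ratio
    aa_distance : physicochemical distance between the encoded amino acids
                  (0 for a synonymous change), supplied by the caller
    v           : parameter controlling sensitivity to aa_distance
    """
    diffs = [(a, b) for a, b in zip(codon_i, codon_j) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes occur instantaneously
    base_a, base_b = diffs[0]
    rate = pi_j * math.exp(-aa_distance / v)
    return kappa * rate if is_transition(base_a, base_b) else rate


# Illustrative values only; none are estimates from the study.
r_transition = codon_rate("GAG", "AAG", pi_j=0.02, kappa=2.0, aa_distance=56.0, v=100.0)
r_transversion = codon_rate("GAG", "GAT", pi_j=0.02, kappa=2.0, aa_distance=45.0, v=100.0)
r_double = codon_rate("GAG", "AAA", pi_j=0.02, kappa=2.0, aa_distance=56.0, v=100.0)
```

Passing the amino acid distance in as an argument keeps the sketch independent of any particular distance table.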
The utility of the model is illustrated on a data set of virulence gene sequences from the influenza A virus. The sample of coding sequences from homologous genes responsible for virulence was taken from influenza A virus strain H5N1, which infects two different host types, birds and humans. The 'codeml' program of the package was used to find the base frequencies and codon usage. All statistical analyses were done with the statistical software 'OriginPro 7.5 SRO' from Origin Lab Corp., USA.
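To make the notion of base frequencies by codon position concrete, here is a minimal sketch in plain Python. It does not reproduce the output of 'codeml'; the function names and toy sequences are hypothetical, and taking the standard deviation across sequences (as a population standard deviation) is an assumption about how such a spread might be summarized.

```python
from collections import Counter
from statistics import pstdev


def base_frequencies_by_position(seq):
    """Return three dicts with the A/C/G/T frequencies at codon positions 1, 2 and 3."""
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3 long")
    freqs = []
    for pos in range(3):
        bases = seq[pos::3]            # every third base, starting at this position
        counts = Counter(bases)
        freqs.append({b: counts[b] / len(bases) for b in "ACGT"})
    return freqs


def frequency_sd(seqs, pos, base):
    """Population SD of one base's frequency at one codon position across sequences."""
    values = [base_frequencies_by_position(s)[pos][base] for s in seqs]
    return pstdev(values)


# Hypothetical toy sequences (not real H5N1 data).
seqs = ["ATGGAAAAG", "ATGGAGAGG"]
freqs = base_frequencies_by_position(seqs[0])
sd_third_position_g = frequency_sd(seqs, pos=2, base="G")
```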

Discussion:
Analyses were performed using the genealogies estimated by maximum likelihood. Results of the analysis of both hypotheses for the three genes are shown in Table 1. Hypothesis H1 was approximated by a gamma distribution of rates among the sites, whereas hypothesis H0 was approximated by a gamma distribution plus a class of invariant sites. The test statistic U is the log likelihood ratio for the two models, where L1 and L0 are the log likelihood values for hypotheses H1 and H0, respectively. Because H1 is a special case of H0 (the hypotheses are nested), the likelihoods always obey the relationship L1 ≤ L0, so U is never negative. The minimum probability p of not rejecting H0 is given by U ≤ (1 − α) × 100%, where α is the probability of rejecting H0.
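A likelihood ratio test for two nested models of this kind can be sketched as follows. The sketch assumes the conventional form U = 2(L0 − L1) and a chi-square reference distribution with one degree of freedom, on the reasoning that the invariant-sites class adds one free parameter. Both choices are assumptions rather than statements taken from the study, and the log likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(l1, l0, alpha=0.05):
    """Compare nested models from their log likelihoods.

    l1 : log likelihood of the special-case model (gamma rates only)
    l0 : log likelihood of the more general model (gamma rates + invariant sites)
    Returns (U, p_value, special_case_rejected).
    """
    u = 2.0 * (l0 - l1)                   # assumed conventional scaling
    if u < 0:
        raise ValueError("nested models require l1 <= l0")
    # Chi-square survival function for 1 degree of freedom:
    # P(X > u) = erfc(sqrt(u / 2)).
    p_value = math.erfc(math.sqrt(u / 2.0))
    return u, p_value, p_value < alpha


# Hypothetical log likelihoods, for illustration only.
u, p_value, rejected = likelihood_ratio_test(l1=-1002.0, l0=-1000.0)
```

Because the proportion of invariant sites lies on the boundary of its range under the simpler model, the plain chi-square reference used here is only an approximation.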
The test showed that hypothesis H0 gives a better fit than hypothesis H1. Moreover, the H0 model, with its additional class of invariant sites, is responsible for a faster rate of evolution. Further, the probability of not rejecting hypothesis H0 is highest for the NS gene, followed by HA and PB2. We can thus infer that NS has the fastest rate of evolution and seems to be the most significant for molecular adaptation of the parasite. This was also confirmed by examining the trees generated for these genes by a maximum likelihood algorithm. The trees are shown in Fig. 1, Fig. 2 and Fig. 3 for the NS, HA and PB2 genes, respectively.
[Table 3: Values of standard deviation (SD) of base frequency for different codon positions and different bases]
The results also showed that the standard deviation for substitution of bases was highest for the NS gene among the three genes considered, which confirms that NS is the fastest-evolving gene, followed by HA and PB2. These results were further confirmed by simulation studies using Markov Chain Monte Carlo samples, based on a total of 19502 samples from 2 runs. Each run produced 10001 samples, of which 9751 were included (data not shown).
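The sample counts above are mutually consistent, as the short sketch below shows by pooling two runs after dropping the first samples of each. Treating the excluded samples as an initial burn-in is an assumption, since the text does not say which samples were left out; the placeholder chains and the function name are hypothetical.

```python
def pool_runs(runs, n_discard):
    """Drop the first n_discard samples of every run and pool the remainder."""
    pooled = []
    for chain in runs:
        pooled.extend(chain[n_discard:])
    return pooled


samples_per_run = 10001
kept_per_run = 9751
n_discard = samples_per_run - kept_per_run   # 250 samples excluded per run

# Placeholder chains standing in for real MCMC output.
runs = [list(range(samples_per_run)) for _ in range(2)]
pooled = pool_runs(runs, n_discard)
total_samples = len(pooled)
```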

Conclusion:
We found that during the molecular evolution of the avian influenza virus from birds to humans, the most important gene responsible for virulence in humans is NS, followed by HA and PB2. Our motivation was to study the evolution of a virus undergoing host radiation, but the study addresses the more general problem of describing the substitution process in two groups of related organisms. The codon-based approach we used compares the two different types of nucleotide substitution, thus making use of the full information in the nucleotide sequence, and also incorporates the known biological phenomenon of transition/transversion bias. However, the gamma parameter used in the study is a very crude indicator, and further indicators, such as shifts between biochemically different groups of amino acids, could improve the study. We have assumed that the rates of synonymous and non-synonymous substitution scale identically with changes in the mutation rate arising from alterations after a viral host change. This scaling can hold for sites where all amino acid substitutions are either neutral or strongly deleterious. However, for sites undergoing positive selection, this may not be the case if the process of fixation is limited by factors other than the availability of mutations. We believe that our study may prove useful for identifying candidate genes and codons for the molecular biological investigation of species-specific adaptation in viruses.