Design of a set of probes with high potential for influenza virus epidemiological surveillance

An Influenza Probe Set (IPS) consisting of 1,249 9-mer probes for genomic fingerprinting of closely and distantly related Influenza Virus strains was designed and tested in silico. The IPS was derived from alignments of Influenza genomes. The RNA segments of 5,133 influenza strains with diverse degrees of relatedness were concatenated and aligned. After alignment, 9-mer sites with high Shannon entropy were searched for. Additional criteria were then applied to select probes from these high-entropy sites: G+C content between 35 and 65%, absence of consecutive dimer or trimer repeats, a minimum of 2 differences between 9-mers, and Tm values between 34.5 and 36.5 °C. Virtual Hybridization was used to predict Genomic Fingerprints and to assess the capability of the IPS to discriminate between influenza and related strains. Distance scores between pairs of Influenza Genomic Fingerprints were calculated and used to estimate Taxonomic Trees. Visual examination of both Genomic Fingerprints and Taxonomic Trees suggests that the IPS is able to discriminate between distant and closely related Influenza strains. It is proposed that the IPS can be used to investigate, by virtual or experimental hybridization, any new and potentially virulent strain.


Background:
Influenza viruses belong to the Orthomyxoviridae family and possess segmented genomes consisting of seven or eight separate RNA molecules, each coding for one or more viral proteins. The viruses can exchange segments, leading to a diversity of reassortant strains. Together with the accumulation of point mutations, segment reassortment is the basis for the evolution and maintenance of diversity of these viruses. It gives them the ability to adapt rapidly to the pressure of the host immune system and leads to the continuous emergence of new virus variants that cause seasonal and pandemic outbreaks of influenza. Because of this ability, segmented viruses can exist in numerous genotypes and serotypes, presenting a challenge to the creation of protective vaccines and detection methods [1,2].
For these reasons, the early detection and diagnostic confirmation of influenza virus infections is fundamental for appropriate control of the disease. Several molecular biology techniques, most of them based on PCR amplification, have contributed to the diagnosis of the different types and subtypes of influenza virus. However, PCR techniques are frequently unable to detect new, potentially virulent strains. Other techniques, such as sequencing, can identify such strains precisely but are still not widely available for routine diagnosis [3,4].
The creation of a microarray is complicated when genomic structures are similar. Probe selection is further complicated when the number of known sequences is very large. When this happens, the probe selection strategy becomes critical [5]. There are several methods [6][7][8][9][10][11][12] for the selection of specific probes for influenza virus detection. Direct search for probes based on traditional computational methods is labor-intensive and often time-consuming. Shannon entropy (H) is a measure that has been used in bioinformatics to classify influenza viruses, to analyze the evolution of influenza [13], to facilitate the development of an anti-influenza vaccine [14], and to profile areas of high variation, revealing characteristic patterns for each subtype [15].
In the present approach we designed and tested in silico an Influenza Probe Set (IPS) consisting of 1,249 9-mer probes, extracted from the zones of maximum entropy in a sequence alignment of the full viral genomes of over

Methodology:
Shannon entropy is a measure of the lack of predictability of an element [19], such as a given base, at a particular position of an alignment. Highly variable columns in an alignment yield the highest entropy values.

SearchProbe
This program, developed in Java, calculates the Shannon entropy of aligned sequences. It finds the points of maximum entropy and then selects 9-mer sequences (the size can be modified by the user), using each point of maximum entropy as the center of a 9-mer.
The equation used by SearchProbe to calculate the Shannon entropy is:

$$H(n) = -\sum_{i} f(i, n) \log_2 f(i, n)$$

where H(n) is the entropy at position n, i represents a residue (in this case there are only four possible options: A, C, G and U), and f(i, n) is the frequency of residue i at position n. The information content at position n is then defined as the decrease in uncertainty, or entropy, at that position. In our particular case, SearchProbe seeks regions with maximum entropy values [18].
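The entropy-based search can be illustrated with a minimal Python sketch. It is not the Java SearchProbe program: the function names, the use of base-2 logarithms, the treatment of gaps (ignored) and the `top` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column; non-ACGU symbols are ignored."""
    residues = [c for c in column.upper() if c in "ACGU"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # H(n) = -sum_i f(i, n) * log2 f(i, n)
    return sum(-(k / total) * math.log2(k / total) for k in counts.values())

def entropy_profile(alignment):
    """Entropy at every position n of a list of equal-length aligned sequences."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]

def candidate_kmers(alignment, k=9, top=1):
    """Collect the k-mers centred on the `top` highest-entropy columns (hypothetical helper)."""
    profile = entropy_profile(alignment)
    half = k // 2
    # Only columns with a full window on both sides can serve as a k-mer centre.
    centres = sorted(range(half, len(profile) - half),
                     key=lambda n: profile[n], reverse=True)[:top]
    kmers = set()
    for n in centres:
        for seq in alignment:
            window = seq[n - half:n + half + 1].upper()
            if len(window) == k and set(window) <= set("ACGU"):
                kmers.add(window)
    return kmers
```

With four toy sequences that differ only at one column, that column reaches the maximum of 2 bits and each sequence contributes one 9-mer centred on it.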

CalcProbes
This Perl script refines the probe search using the 9-mer sequences provided by SearchProbe. These sequences are subjected to the following restrictions: i) select only sequences with 35-65% G+C (4 or 5 G or C bases per 9-mer), ii) eliminate 9-mers containing tandem repeats of 2 or 3 nucleotides, iii) select sequences with a minimum of 2 differences between them, and iv) choose 9-mer sequences with Tm values of 34 to 36 °C. Tm values were calculated with the thermodynamic Nearest-Neighbor (NN) model using SantaLucia parameters [19]. The final set of 1,249 9-mer probes selected by this procedure is the IPS (Influenza Probe Set).
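The restrictions above can be sketched in Python as follows. This is not the Perl CalcProbes script: the regular expression used for tandem repeats, the greedy order-dependent selection for the 2-difference rule, and the optional `tm` callback (a placeholder for a nearest-neighbor Tm calculation that is not implemented here) are assumptions.

```python
import re

# A 2- or 3-nucleotide unit immediately followed by a copy of itself (e.g. ACAC, ACGACG).
TANDEM = re.compile(r"(..)\1|(...)\2")

def gc_ok(probe, low=35.0, high=65.0):
    """True if the G+C percentage lies within [low, high]."""
    gc = probe.count("G") + probe.count("C")
    return low <= 100.0 * gc / len(probe) <= high

def has_tandem_repeat(probe):
    """True if the probe contains a consecutive dimer or trimer repeat."""
    return TANDEM.search(probe) is not None

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def filter_probes(candidates, min_diff=2, tm=None, tm_range=(34.0, 36.0)):
    """Apply restrictions i-iii, plus iv when a Tm function is supplied."""
    kept = []
    for p in candidates:
        if not gc_ok(p) or has_tandem_repeat(p):
            continue
        if tm is not None and not (tm_range[0] <= tm(p) <= tm_range[1]):
            continue
        # Greedy choice: keep a probe only if it differs from every kept probe.
        if all(hamming(p, q) >= min_diff for q in kept):
            kept.append(p)
    return kept
```

Because the selection is greedy, the result depends on the order of the candidates; the source does not say how CalcProbes resolves such ties.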

Virtual Hybridization (VH)
Virtual Hybridization is a computer program able to predict perfect and mismatched target/probe hybridizations under a selected Tm cutoff value. The stability of target/probe duplexes is calculated with the NN model. This program was used to determine all the hybridizations occurring between each Influenza virus genome, or control strain, and the IPS. The group of hybridization signals produced by each viral genome corresponds to its particular fingerprint [20].
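The idea of a virtual hybridization scan can be sketched as below. This is a deliberate simplification, not the VH program: it accepts a site purely by counting mismatches against the probe's reverse complement, whereas VH ranks duplexes by nearest-neighbor stability. The function names and the RNA alphabet are assumptions.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(seq):
    """Reverse complement of an RNA sequence written with A, C, G, U."""
    return seq.translate(COMPLEMENT)[::-1]

def hybridization_sites(genome, probe, max_mismatches=1):
    """Return (position, site, mismatches) for every window the probe could bind."""
    target = reverse_complement(probe)  # the perfect-match site on the genome
    k = len(probe)
    hits = []
    for pos in range(len(genome) - k + 1):
        site = genome[pos:pos + k]
        mismatches = sum(a != b for a, b in zip(site, target))
        if mismatches <= max_mismatches:
            hits.append((pos, site, mismatches))
    return hits

def fingerprint(genome, probes, max_mismatches=1):
    """Set of probe indices with at least one predicted hybridization site."""
    return {i for i, p in enumerate(probes)
            if hybridization_sites(genome, p, max_mismatches)}
```

In this sketch a fingerprint is simply the set of probes that produce a signal; a fuller version would also record the site positions and the duplex free energies.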

Genomic Fingerprinting Analysis with UFA software
Universal Fingerprinting Analysis (UFA) software transforms the genomic fingerprints produced by Virtual Hybridization, under any chosen stringency condition, into images. It also allows visual comparison of any selected pair of fingerprints, producing spots with specific colors for distinctive and for shared hybridization signals. In addition, this tool can calculate pairwise distances between genomic fingerprints. From a table of such distances, Taxonomic trees were built using the Neighbor-Joining method with the program MEGA 5 [21].
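A table of pairwise fingerprint distances can be sketched as follows. The source does not state which distance measure UFA uses, so the Jaccard distance below is an assumption chosen for illustration, as are the function names; the resulting table would then be passed to a Neighbor-Joining implementation.

```python
from itertools import combinations

def fingerprint_distance(fp_a, fp_b):
    """Jaccard distance between two fingerprints given as sets of probe ids (assumed measure)."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / len(union)

def distance_table(fingerprints):
    """Map each unordered pair of strain names to its fingerprint distance."""
    return {
        (a, b): fingerprint_distance(fingerprints[a], fingerprints[b])
        for a, b in combinations(sorted(fingerprints), 2)
    }
```

Identical fingerprints give a distance of 0 and fingerprints with no shared probe give 1.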

Distinction of Influenza strains with the IPS
Two types of analysis were performed: I) A Taxonomic tree, based on distances between IPS Genomic Virtual Hybridization fingerprints, was built to compare several types of Influenza and other viruses. II) Overlapped images were made from selected pairs of genomic fingerprints for strains with a low, medium, or high degree of relatedness. Influenza A/mallard duck/New York/170/1982(H1N2) and Influenza A/Mexico/InDRE4487/200 were used as references.

Results & Discussion:
In the first step, an average of 550,500 non-unique sequence probes were selected from the alignment. The probe sequences were then clustered to remove repeated ones and to keep only those with entropy above a convenient threshold (SearchProbe). CalcProbes then applied the design parameters explained in the Methodology. Finally, we performed a third selection by removing sequences containing the probes with the lowest entropy values and keeping probes with Tm values between 34.5 and 36.5 °C and free energy values between -9.00 and -13.5 kcal/mol.

Virtual Hybridization
A database of the target viral genomes used for the in silico experiments was created. The VH program conducts a rigorous analysis to find and track all the sites in each viral genome where the probe sequences can hybridize, taking into account the degree of complementarity between the probe and the recognized site in the target (allowing at least one mismatch) and the thermodynamic stability of the duplex. The generated information constitutes an in silico genomic fingerprint that lists the specific sites in each target DNA where hybridization occurred, the number and sequence of each probe that hybridized, the free energy value of each hybridization, and the sequence of the target site recognized by each probe. A free energy cutoff value of -9 kcal/mol for 9-mer probes was used.
It is clear that the two viruses compared, A/Mexico/InDRE4487/2009 (H1N1) and A/California/04/2009, are very similar, with only minor mutations, as expected for viruses from the same outbreak. However, the IPS genomic fingerprints show seven differences between them, with five probes specific for the A/Mexico/InDRE4487/2009 (H1N1) virus and two for A/California/04/2009. This is important for molecular studies of influenza because the IPS is sensitive enough to separate viruses even when they are very closely related; this will help in the management of influenza epidemiology without depending on prior sequencing.
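Counting shared and strain-specific signals between two fingerprints reduces to set operations, as in the sketch below. The probe ids are made up and chosen only to mirror the pattern reported above (five probes specific to one strain, two to the other); they are not the actual IPS probes, and the function name is hypothetical.

```python
def compare_fingerprints(fp_ref, fp_other):
    """Split two fingerprints (sets of probe ids) into shared and strain-specific signals."""
    return {
        "shared": fp_ref & fp_other,
        "only_ref": fp_ref - fp_other,
        "only_other": fp_other - fp_ref,
    }

# Made-up probe ids, used only to reproduce the 5 + 2 pattern of differences.
common = set(range(1, 11))
mexico = common | {101, 102, 103, 104, 105}
california = common | {201, 202}

result = compare_fingerprints(mexico, california)
n_differences = len(result["only_ref"]) + len(result["only_other"])
```

Here `n_differences` is the total number of probes that distinguish the two fingerprints.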

Conclusions:
Following the established parameters, the set of 1,249 highly specific probes (IPS) allowed correct typing and subtyping of influenza viruses, including human and animal strains as well as very similar strains. The IPS design, based on constructing probes from regions of the viral genome with maximum entropy, allows highly sensitive discrimination.
Through in silico hybridization, the performance of the IPS microarray was simulated, which allowed us to assess the likely behavior of the probes and to predict the genomic fingerprints of these viruses.

Figure 1: Taxonomic trees of 12 viral families including Paramyxoviridae, Orthomyxoviridae, Coronaviridae, Picornaviridae and Adenoviridae; Influenza A (H1N1, H1N2, H3N2), B and C, and two other orthomyxoviruses, Thogotovirus and Isavirus, are given (in red). (A) Fingerprinting Tree, (B) Alignment Tree. All the Influenza A virus subtypes were clustered into a single group.

Figure 2: A) Genomic fingerprints of different influenza viruses and other viral families, using Influenza A virus A/mallard duck/New York/170/1982(H1N2) as the reference organism (in red) and Infectious salmon anemia virus (Isavirus), Thogotovirus, Human respiratory syncytial virus (Paramyxoviridae), Human rhinovirus B (Picornaviridae), SARS coronavirus (Coronaviridae) and Human adenovirus D (Adenoviridae) (in green) to compare the fingerprints generated. Genomic fingerprints of different viral types of influenza virus, using Influenza A virus A/mallard duck/New York/170/1982(H1N2) as the reference organism (in red) and Influenza B virus B/Mexico/84/2000 and Influenza C virus C/Ann Arbor/1/50 (in green) to compare fingerprints. B) Genomic fingerprints of different viral types of influenza virus, using A/New York/18/2006/H1N1 as the reference organism (in red) and A/

5,000 viruses reported, considering almost all viral subtypes of Influenza A. Using Virtual Hybridization (VH) technology, in silico Genomic Fingerprints were generated, which in turn were compared to estimate a phylogeny based on the pairwise fingerprint distances. Other studies have used VH technology to create genomic fingerprints for the in silico classification of microorganisms such as Human Papillomaviruses [16] and bacterial genomes [17].

Prediction is based on experimentally supported thermodynamic models, which suggests that the IPS microarray would be a valuable Influenza diagnosis tool.

Table 1 :
Table of pairwise distance scores generated in VH

Table 2 :
Virus accession numbers, names and viral families