Identification and analysis of putative promoter motifs in Flavivirus genome.

The genus Flavivirus comprises medically significant pathogenic viruses that cause several infections in humans worldwide. Flavivirus genomes are approximately 10-11 kb long and encode both structural and non-structural regions. The non-structural region plays a fundamental role in the stability, regulation and life cycle of the virus. The complete genomes of 26 flaviviruses were used to identify promoter motifs through in silico approaches. Promoter sequences were detected in only 16 viruses; none were detected in the other 10. All in silico identified promoter motifs were verified against known experimental data. This analysis suggests that the presence of a promoter may play a crucial role in the pattern of gene expression, regulation networks, cell specificity and development. It may also be useful for designing efficient expression vectors and target-specific delivery systems in gene therapy.


Background:
The genus Flavivirus consists of a globally distributed group of arboviruses transmitted mainly by tick or mosquito vectors. The most significant flaviviruses are the mosquito-transmitted dengue virus, which causes hemorrhagic fever (HF), and Japanese encephalitis (JE) virus, which causes encephalitis in tropical and subtropical regions of the world. Around 50-100 million cases of dengue fever are reported annually in more than 80 countries where the mosquito vector Aedes aegypti is endemic, and approximately 500,000 patients suffer from dengue hemorrhagic fever and dengue shock syndrome. JE virus is the leading cause of arboviral encephalitis in Asia, accounting for 30,000 to 50,000 cases annually. St. Louis encephalitis virus causes sporadic epidemic encephalitis in the Americas. West Nile virus (WNV) has caused more than 9,000 cases in North America since 1999.
The Flavivirus genome is a single-stranded, positive-sense RNA of 10-11 kb that contains a single ORF and is the only viral mRNA produced during the virus replication cycle. Replication takes place in the perinuclear region of the cytoplasm of infected cells. Three structural (capsid, premembrane and envelope) and seven nonstructural (NS1, NS2a, NS2b, NS3, NS4a, NS4b and NS5) viral proteins are produced by proteolytic processing of the single polyprotein by viral and cellular proteases. The open reading frame of the Flavivirus genome is flanked by 5' and 3' untranslated regions (UTRs). The UTRs form complex RNA structures containing functional domains that are believed to play a role in virus translation, replication or assembly. These regions have generated considerable scientific interest, since genetic modifications within them are known to attenuate flaviviruses without altering their antigenic specificity, making them potential candidates for live attenuated vaccines [2].
A wide range of algorithms has been developed to assist the identification of promoters in genomic sequences as part of many gene prediction methods. A regulatory element utilizing the TATT-box has been reported in the genome of Epstein-Barr virus (EBV). The motif is present in the promoters of lytic cycle genes and resembles a crucial host genome motif (the TATA-box). The binding specificity of the eukaryotic protein that recognizes the TATA-box (TBP) was determined, and no specific preference for interaction with the TATT motif was found [3].

Analysis of Flavivirus genome
The size of each Flavivirus genome was analyzed with the aid of the GeneRunner, DNASTAR and ExPASy tools. The G+C content of each genome was also calculated (Table 1 under supplementary material).
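The two quantities reported in Table 1 can be illustrated with a minimal Python sketch. This is not the procedure performed by the tools named above; the function names `read_fasta` and `genome_stats` and the toy record are hypothetical and serve only to show how genome length and G+C percentage follow from a FASTA sequence.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def genome_stats(seq):
    """Return (length in bases, G+C percentage rounded to two decimals).

    Ambiguous symbols such as N count towards the length but are
    excluded from the G+C denominator (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGTU"}
    unambiguous = sum(counts.values())
    if unambiguous == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc_percent = 100.0 * (counts["G"] + counts["C"]) / unambiguous
    return len(seq), round(gc_percent, 2)


if __name__ == "__main__":
    # Toy record for illustration only; not a real Flavivirus genome.
    toy = ">toy_genome\nGGCC\nAATT\n"
    for name, sequence in read_fasta(toy).items():
        print(name, genome_stats(sequence))
```

Applied to a complete genome downloaded in FASTA format, `genome_stats` would return the genome length and G+C percentage in the same form as the values listed in Table 1.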

Transcription start site
Transcription factor (TF) sites are over-represented in promoter regions, so it is natural to seek a prediction program based on the density of putative TF sites. The PROMOTERSCAN program was used to identify putative promoters in the Flavivirus genomes. This program draws on three data sets: a TF database, a promoter database and a non-promoter set constructed from protein and RNA gene sequences. The density of all putative TF sites is calculated separately for promoter and non-promoter sequences, and the resulting scoring function is supplemented with a TATA matrix score [7].
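The density idea described above can be sketched as follows. This is a simplified illustration, not the PROMOTERSCAN algorithm: the three consensus motifs (TATA box, CCAAT box, GC box), the window width, the step and the threshold are assumptions chosen for the example, and the weighting against a non-promoter set and the TATA matrix score are omitted.

```python
import re

# Illustrative consensus motifs only; this is NOT the PROMOTERSCAN TF database.
TF_MOTIFS = ("TATAAA", "CCAAT", "GGGCGG")


def site_density(window, motifs=TF_MOTIFS):
    """Putative TF sites per 100 bases of a window (overlapping hits counted)."""
    if not window:
        return 0.0
    hits = sum(
        len(re.findall("(?=" + re.escape(motif) + ")", window))
        for motif in motifs
    )
    return 100.0 * hits / len(window)


def scan(seq, width=250, step=50, threshold=1.0):
    """Slide a window along seq and report (start, density) for dense windows.

    width, step and threshold are hypothetical parameters for this sketch.
    """
    seq = seq.upper()
    candidates = []
    last_start = max(len(seq) - width, 0)
    for start in range(0, last_start + 1, step):
        density = site_density(seq[start:start + width])
        if density >= threshold:
            candidates.append((start, density))
    return candidates
```

Windows returned by `scan` are only candidates. A real predictor would compare each density with the value expected from non-promoter sequence before calling a promoter.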

Results and discussion:
In the present study, the complete genome sequences of 26 flaviviruses were analyzed. Of these, 24 genomes were approximately 10 kb in size; only two, those of Kamiti River virus and tick-borne encephalitis virus, were approximately 11 kb. The smallest genome was that of Tamana bat virus (10,053 bases) and the largest that of Kamiti River virus (11,375 bases). The lowest G+C content was 38.43% (Tamana bat virus) and the highest was 54.85% (Louping ill virus). The genome sizes and G+C contents of the flaviviruses are given in Table 1 (under supplementary material). The establishment of persistent noncytopathic replication by replicon RNAs of a number of positive-strand RNA viruses usually leads to the generation of adaptive mutations in nonstructural genes. In hepatitis C virus, some of these adaptive mutations increase the ability of RNA replication to resist the antiviral action of alpha/beta interferon (IFN-alpha/beta), and in Sindbis virus they may also lead to more efficient IFN production [8].
Important putative promoters were identified in the complete genomes of 26 medically significant flaviviruses. In total, 22 different types of promoter were identified in the genomes (Table 2, see supplementary material). All identified promoter motifs were checked against existing experimentally obtained data. Sixteen viruses carry putative promoter sequences, while ten have no promoter motifs in their genomes. The putative promoters identified in each Flavivirus are given in Table 3 (under supplementary material). The flaviviruses that lack promoter sequences utilize the host machinery throughout replication and multiplication.
A number of experimentally confirmed reports are available on the identification and characterization of promoters in virus genomes. The K1 gene of Kaposi's sarcoma-associated herpesvirus (KSHV) encodes a 46-kDa transmembrane glycoprotein that possesses transforming properties, initiates signaling pathways in B cells and prevents apoptosis. Analysis of the K1 promoter demonstrated that purified Rta protein binds to it at three locations independently of other DNA-binding factors [9]. KSHV vIRF is a viral transcription factor that inhibits interferon signaling and transforms NIH 3T3 cells but does not bind interferon-stimulated response element (ISRE) DNA sequences [10].
Eukaryotic promoter-specific activator proteins (activators) stimulate transcription. An acidic activator can interact directly with the transcription factor TFIIB and increase its stable assembly into a preinitiation complex. A regulatory element utilizing the TATT-box has been reported in the genome of Epstein-Barr virus (EBV). The motif is present in the promoters of lytic cycle genes and resembles a crucial host genome motif (the TATA-box). The binding specificity of the eukaryotic protein that recognizes the TATA-box (TBP) was determined, and no specific preference for interaction with the TATT motif was found [3]. Consensus patterns in baculovirus sequences upstream of the translational initiation sites have been analyzed, and a web tool, Local Alignment Promoter Predictor (LAPP), has been developed for the prediction of baculovirus promoter sequences. Potential consensus sequences, i.e., TCATTGT, TCTTGTA, CTCGTAA, TCCATTT and TCATT plus TCGT in an approximately 30-bp spacing context, have been found in baculovirus promoter regions. These occur in addition to the well-characterized late and early promoter elements G/T/ATAAG and TATAA, the latter being accompanied about 30 bp downstream by a transcriptional initiation sequence, CAGT or CATT [4].
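The spacing relationship described for the early promoter element can be expressed as a short search routine. The following Python sketch is illustrative only and is not the LAPP tool. The function name is hypothetical, the tolerance of 5 bp around the 30-bp spacing is an assumption standing in for "about 30 bp", and the spacing is measured here from the end of TATAA to the start of the initiator.

```python
import re

INITIATORS = ("CAGT", "CATT")


def find_tataa_with_initiator(seq, spacing=30, tolerance=5):
    """Return (tataa_start, initiator_start) pairs, using 0-based positions.

    A pair is reported when CAGT or CATT starts between spacing - tolerance
    and spacing + tolerance bases after the end of a TATAA element. Both
    parameters are assumptions of this sketch.
    """
    seq = seq.upper()
    pairs = []
    # A lookahead is used so that overlapping TATAA occurrences are all found.
    for match in re.finditer(r"(?=TATAA)", seq):
        tataa_end = match.start() + 5
        first = tataa_end + spacing - tolerance
        last = tataa_end + spacing + tolerance
        for pos in range(first, last + 1):
            if seq[pos:pos + 4] in INITIATORS:
                pairs.append((match.start(), pos))
    return pairs
```

A hit from this routine shows only that the two elements occur at a compatible distance. It is not evidence of promoter activity.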
The adenovirus E1A gene and the bICP0 gene of bovine herpesvirus 1 (BHV-1) encode proteins that are potent activators of viral gene expression. Neither protein binds DNA specifically, and both interact with chromatin-remodeling enzymes. Given this functional similarity, E1A was tested to see whether it could stimulate BHV-1 productive infection. E1A consistently stimulates BHV-1 productive infection, but not as efficiently as bICP0. The ability of E1A to bind Rb family members plays a role in stimulating productive infection, suggesting that E2F family members activate productive infection. E2F-4, but not E2F-1, E2F-2 or E2F-5, activates productive infection with an efficiency similar to that of E1A [5].
Accurate prediction of transcription factor binding sites is needed to unravel the function and regulation of genes discovered in genome sequencing projects. To evaluate current computer prediction tools, a systematic study has been undertaken of the sequence-specific DNA binding of a transcription factor belonging to the CTF/NFI family [6]. White spot syndrome virus is a member of the virus family Nimaviridae and infects shrimp and other crustacean species. Its complete genome was analyzed in silico to identify conserved promoter motifs. The 5' upstream region contained a TATA box element similar to the Drosophila RNA polymerase II core promoter sequences, suggesting utilization of the cellular transcription machinery for generating early transcripts [10].

Conclusion:
Promoter motifs in the genomes of Flavivirus were identified in silico. These promoters play a vital role in the regulation of gene expression. Delineation of the promoter is fundamental for understanding gene expression patterns, regulation networks, cell specificity and development. It is also important for designing efficient expression vectors and target-specific delivery systems in gene therapy. These results might help in designing live attenuated vaccine candidates through site-directed mutagenesis in the promoter region. In the era of large-scale genomics, promoter prediction is crucial for gene discovery and annotation.
[11]. Adult T-cell leukemia (ATL) is a complex and multifaceted disease associated with human T-cell leukemia virus type 1 (HTLV-I) infection. Viral oncoprotein is considered a major contributor to cell cycle deregulation in HTLV-I transformed cells by either directly disrupting cellular factors or altering their transcription profile. Tax transactivates these cellular promoters by interacting with transcription factors such as CREB/ATF, NF-kappaB, and SRF [12]. The transcription factor TFIID, consisting of TATA-binding protein (TBP) and TBP-associated factors (TAFs), plays a central role in both positive and negative regulation of transcription. The TAF N-terminal domain (TAND) of TAF1 has been shown to interact with TBP and to modulate the interaction of TBP with the TATA box, which is required for transcriptional initiation and activation of TATA-promoter operated genes [13].
www.bioinformation.net Hypothesis. ISSN 0973-2063 (online) 0973-8894 (print). Bioinformation 3(4): 162-167 (2008). Bioinformation, an open access forum © 2008 Biomedical Informatics Publishing Group.
However, only a limited amount of data is available on promoter motifs in the genus Flavivirus. The present study was carried out to identify and analyze the putative promoter regions present in Flavivirus.
/www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html and the Universal Virus Database of the International Committee on Taxonomy of Viruses (ICTVdB) genome database cited at http://www.ncbi.nlm.nih.gov/ICTVdb/.
.ncbi.nlm.nih.gov/blast) against the complete training dataset, which is extracted from the GenBank database. All these identified promoters were verified and searched for homology in the database.

Table 3:
The promoter sequences identified in this study. Identification of putative promoter regions in the Flavivirus genome. ND: promoter sequence not detected in the Flavivirus.