Computational identification of composite regulatory sites in 16s-rRNA gene promoters of Mycobacterium species

The availability of completely sequenced genomes allows the use of computational techniques to investigate cis-acting sequences controlling transcription regulation associated with groups of functionally related genes. Theoretical analysis was performed to assign functions to regulatory systems. The identification of such sites is relevant for locating a promoter at the 5′ boundary of a gene. Such sites also allow the prediction of specific gene-expression patterns and of the response to disturbances in a known signaling pathway. Here, we describe the identification of composite transcription factor (TF) binding sites in the promoter regions of the 16s-rRNA gene for Mycobacterium species strains ICC47, ICC67, ICC43 and CMYL700. It is established that the ribosomal gene comprises sequences that are conserved during evolution, interspersed with divergent regions. Computational identification of known TF-binding sites was performed using the TFSITESCAN tool and the ooTFD database. The ICC67, ICC47, ICC43 and CMYL700 strains showed 12, 13, 9 and 15 known TF binding sites, respectively. Comparison between strains suggests that 9 predicted binding sites for known TFs are conserved among them. These data provide a basis for understanding promoter regulation in 16s-rRNA.


Background:
The availability of a number of fully sequenced genomes allows the use of computational techniques to investigate cis-acting elements controlling transcription regulation associated with groups of functionally related genes. The gene expression pattern is determined by the structure of the transcription regulatory regions, through specific combinations of TF binding sites. Several computational approaches to identify regulatory elements have been reported during the last decade. Specific TF binding site combinations were identified for muscle-specific promoters and for liver-enriched and yeast genes [1-4]. Recently, it has been shown that searching for specific combinations of two TF sites (composite elements) is an effective tool for predicting gene expression patterns of immune-cell specific genes. The antibiotic susceptibility of some mycobacterial species, namely M. kansasii, M. marinum, M. simiae and M. asiaticum, is low [10]. The main disease caused by M. kansasii is benign pulmonary disease in elderly white males, and M. marinum is the causative agent of swimming pool granuloma (skin ulcers) among swimmers. Therefore, it is important to understand the regulatory mechanisms in mycobacteria. Here, we describe the prevalence of TF sites in 16s rRNA using predictive tools.

Agarose gel electrophoresis
The presence and yield of the specific PCR product were analyzed by 3% agarose gel electrophoresis for 2 hours at 75 volts.

DNA extraction from agarose gel
Elution of DNA from the agarose gel was done by the 'Wizard' method.

Performing cycle sequencing PCR
Extracted DNA was used in cycle sequencing PCR for making single strand DNA. Cycle sequencing PCR was run for 25 cycles with denaturation at 94 °C for 30 seconds, annealing at 55 °C for 10 seconds and extension at 60 °C for 4 minutes.

Purification
The cycle sequencing PCR product was purified by adding 0.1 volume of 3 M sodium acetate (pH 4.5) and 2.5 volumes of absolute alcohol. The solutions were mixed and the tubes were left at room temperature for 15 minutes. The tubes were then centrifuged at 10000 g for 10 minutes. The supernatant was discarded and the pellet was washed with 70% ethanol (100 μl). Subsequently, the tubes were air-dried.
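The reagent volumes in this step scale with the volume of the PCR product. The following is a minimal arithmetic sketch, assuming that both ratios (0.1 and 2.5 volumes) are taken relative to the starting sample volume; the function name and the 20 μl example volume are illustrative and do not come from the protocol.

```python
def precipitation_volumes(sample_ul: float) -> dict:
    """Return reagent volumes (in microlitres) for precipitating a sample.

    Assumption: both ratios are relative to the starting sample volume.
    """
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    sodium_acetate = 0.1 * sample_ul  # 0.1 volume of 3 M sodium acetate (pH 4.5)
    ethanol = 2.5 * sample_ul         # 2.5 volumes of absolute alcohol
    return {
        "sodium_acetate_ul": sodium_acetate,
        "ethanol_ul": ethanol,
        "total_ul": sample_ul + sodium_acetate + ethanol,
    }


# Illustrative 20 µl reaction (hypothetical volume, not stated in the text)
volumes = precipitation_volumes(20.0)
for name, value in volumes.items():
    print(f"{name}: {value:.1f}")
```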

Sequencing region of 16s rRNA promoter
The sample was transferred into a fresh tube and closed with septa for loading onto the sample tray. The sample was run through performance optimized polymer (POP) and electrophoresed at 12.1 kV for 3 hours in 1x genetic analyzer buffer. Sequencing of the amplicon was carried out using the ABI Prism automated DNA sequencer (ABI Prism genetic analyzer 310, USA). The sequences generated by the program were compared to their respective wild type sequences using the DNA Star software.

Computational analysis
Recognition of TF composite units
A TF composite unit consists of a binding site for a known TF arranged with various flanking motifs and potential targets for additional transcription factors. Such TF composite units could serve as targets for complexes of different transcription factors synergistically regulating gene transcription. The method is designed around discriminating a single set of promoter sequences. The TF database consists of known TF binding sites, and weight matrices were developed for the target sites. These parameters were used as the training set in prediction and recognition.
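To illustrate how a weight matrix built from known sites can be used to recognize candidate sites, the following is a generic log-odds position weight matrix sketch. It is not the scoring scheme of TFSITESCAN or ooTFD, whose internals are not described here. The aligned example sites, the uniform background frequency of 0.25, the pseudocount and the score threshold are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned sites of equal length."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [s[pos].upper() for s in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for each window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


# Hypothetical aligned sites and promoter fragment, for illustration only
example_sites = ["CCAAT", "CCAAT", "CCAAT", "CCAAC"]
pwm = build_pwm(example_sites)
hits = scan("GGTACCAATGGCGCCAATTT", pwm, threshold=5.0)
for start, window, score in hits:
    print(start, window, score)
```

The threshold controls the trade-off between missed sites and false positives; in practice it would be calibrated against sequences with known sites rather than fixed by hand as it is here.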

Training dataset of 16s rRNA promoters
The training dataset of sequenced 16s rRNA promoters from mycobacterial species was arranged in FASTA format for further analysis.
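A dataset arranged in FASTA format can be loaded with a few lines of code. The sketch below is a minimal parser that assumes one header line starting with ">" per record; the strain labels are used as headers for illustration, and the sequences are short placeholders, not the sequenced promoters.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line.upper())
    # Join the wrapped sequence lines of each record into one string
    return {name: "".join(chunks) for name, chunks in records.items()}


# Placeholder sequences, not real promoter data
example = """>ICC67
ACGTTGCA
GGCCAATT
>ICC47
ttgacaGG
"""
promoters = parse_fasta(example)
for name, seq in promoters.items():
    print(name, len(seq), seq)
```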

Discussion:
In the non-coding upstream sequences of Mycobacterium fortuitum (strain ICC-67), the 16s rRNA promoter contains a total of 12 known binding sites analogous to those of known transcription factors, including two unknown sites. It is seen that 8 binding sites with known TFs are common in the 16s rRNA genes of M. fortuitum. The results are summarized in Table 1b (see supplementary material) and the common sites are listed in Table 1c (see supplementary material). The 16s rRNA promoter in Mycobacterium strain ICC-47 contains a total of 13 TF binding sites, with 12 known and 1 unknown TF. The promoter in ICC-43 contains a total of 9 binding sites along with known consecutive TFs. One binding site was predicted with no information on the transcription factor. The 16s rRNA gene of M. tuberculosis strain CMYL700 contains a total of 15 known binding sites, with 13 sites for known transcription factors. A comparative study of the TFs among the strains suggests conservation between them. It should be noted that these predictions need to be confirmed by appropriately designed experiments.
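The comparison between strains amounts to intersecting the sets of predicted sites. A minimal sketch of that comparison follows; the site labels (TF_A, TF_B and so on) are hypothetical placeholders and do not reproduce the predictions in Table 1b or Table 1c.

```python
# Hypothetical site labels per strain, for illustration only
predicted_sites = {
    "ICC67": {"TF_A", "TF_B", "TF_C", "TF_D"},
    "ICC47": {"TF_A", "TF_B", "TF_C", "TF_E"},
    "ICC43": {"TF_A", "TF_B", "TF_C"},
    "CMYL700": {"TF_A", "TF_B", "TF_C", "TF_D", "TF_F"},
}


def common_sites(sites_by_strain):
    """Return the sites predicted in every strain."""
    sets = list(sites_by_strain.values())
    return set.intersection(*sets) if sets else set()


def strain_specific(sites_by_strain):
    """Return, per strain, the sites predicted in no other strain."""
    result = {}
    for strain, sites in sites_by_strain.items():
        others = set().union(
            *(s for name, s in sites_by_strain.items() if name != strain)
        )
        result[strain] = sites - others
    return result


shared = common_sites(predicted_sites)
unique = strain_specific(predicted_sites)
print("common:", sorted(shared))
for strain, sites in unique.items():
    print(strain, "only:", sorted(sites))
```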
Multiple TF elements have been shown to interact with the upstream region of the 16s-rRNA promoter in Mycobacterium species. We found 8 known TF binding sites common to the dataset of mycobacterial 16s-rRNA promoter sequences used in this analysis. One known binding site was found ... These factors recognize each of the three CCAAT motifs present in the EIIL promoter at positions -72, -135 and -229. They also identify CCAAT elements in the rat albumin and herpes virus thymidine kinase promoters. A mutation is known to reduce thymidine kinase promoter activity in vivo and in vitro. This mutation abolishes binding of a factor termed CCAAT recognition factor (CRF), which is distinct from previously identified CCAAT factors. In addition, the upstream factor II (USFII) shares binding sites at position -110 with the EIIL promoter and the c-fos enhancer adjacent to the serum regulating element. The recognition site for USFII is also found in the c-fos promoter, the adenovirus early region EIV and the EIIa early promoters. An Sp1 recognition site has been identified at position -41, and the binding sites for Sp1, USFII and CRF are required for efficient EIIa-late promoter function. Finally, an additional factor recognizing the consensus element GGGGGGNT has been detected (see Table 1b and Table 1c in supplementary material).
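Consensus elements such as CCAAT and GGGGGGNT can be located in an upstream sequence by expanding IUPAC ambiguity codes (for example, N for any base) into a regular expression. The sketch below reports positions as negative coordinates, assuming that the last base of the supplied upstream fragment lies at position -1 relative to the start site; the fragment itself is a made-up example, not one of the promoters discussed.

```python
import re

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile a consensus into a lookahead regex so overlaps are found."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile(f"(?=({body}))")


def find_sites(upstream, consensus):
    """Return (position, match) pairs for each occurrence of the consensus.

    Assumption: the last base of `upstream` is at position -1.
    """
    upstream = upstream.upper()
    pattern = consensus_to_regex(consensus)
    return [
        (m.start() - len(upstream), m.group(1))
        for m in pattern.finditer(upstream)
    ]


# Hypothetical upstream fragment, for illustration only
upstream = "TTGGGGGGATCCAATGCCAATA"
g_rich = find_sites(upstream, "GGGGGGNT")
ccaat = find_sites(upstream, "CCAAT")
print(g_rich)
print(ccaat)
```

Only the forward strand is searched here; scanning the reverse complement as well would be a natural extension.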

Conclusion:
We described the regulatory elements in four promoters of the 16s rRNA gene with different known TFs in Mycobacterium species. Binding sites with known and unknown transcription factors reveal composite gene regulation of the 16s rRNA gene in different species of Mycobacteria, owing to the presence of multiple binding sites and transcription factor data. A total of 9 known TF binding sites were predicted to be common to the 4 promoters of the 16s rRNA gene studied. The details of each TF binding site are summarized in Table 1b (supplementary material) with consensus patterns of occurrence. A higher number of occurrences strongly supports the presence of binding sites that might have a role in gene regulation. Details of each TF binding site are summarized in Table 2 (see supplementary material). These data provide a skeleton for understanding the basis of transcription regulation in Mycobacteria. Nonetheless, these data require confirmation by appropriate experiments.

Acknowledgement:
The present work was supported by a joint venture of the laboratory facility at National Institute for Leprosy (ICMR) and other Mycobacterial Diseases, Agra, U.P., India and CET, IFTM, Moradabad, U.P., India. An institutional research promotion grant to the Department of Biotechnology, College of Engineering and Technology, Moradabad, U.P., India is also acknowledged. The authors are grateful to Prof. R. M. Dubey (Managing Director, CET, IFTM, Moradabad, U.P., India) for providing the necessary facilities and encouragement. The authors are also thankful to all faculty members of the Department of Biotechnology, College of Engineering and Technology, Moradabad, U.P., India, for their generous help and suggestions during the course of experimental work and manuscript preparation.
Lehman and Neumann established the generic name Mycobacterium [5]. The first member of this genus to be identified was the leprae bacillus, by Hansen [6]. Mycobacteria are gram-positive and are usually acid-fast bacilli (AFB). They are non-motile and do not form capsules, endospores or conidia. Some species do not grow in vitro. They are classified as slow or fast growing bacteria based on growth rate. This genus includes obligate parasites, opportunistic pathogens and saprophytes. They are invariably aerobic, slightly curved or straight rods measuring about 1-10 micrometers in size, and are occasionally seen as branched filaments. Mycobacteria have arabinose and glucose as their principal cell wall sugars. It is established that the ribosomal gene rich region of both prokaryotic and eukaryotic genomes comprises sequences that are conserved during evolution, interspersed among divergent regions. 16S rRNA based phylogenetic analyses have contributed to the systematic identification and classification of mycobacteria. The 16S rRNA gene, either by direct sequencing [7] or by using probes [8], is now widely used for rapid identification of mycobacteria. The clinical importance of several mycobacterial species has increased, especially since the human immunodeficiency virus (HIV) pandemic [9].
The TFSITESCAN tool [11] is maintained by the Institute for Transcriptional Informatics [12]. TFSITESCAN identifies potential transcription factor binding sites in a promoter sequence. The putative binding sites were derived from an object oriented transcription factors database named ooTFD, described elsewhere [13].

Table 1a:
Strain numbers and organism names used in this study are given.