Genome-wide expression analysis of genetic networks in Neurospora crassa

The products of five structural genes and two regulatory genes of the qa gene cluster of Neurospora crassa control the metabolism of quinic acid (QA) as a carbon source. A detailed genetic network model of this metabolic process has been reported. This investigation is designed to expand the current model of the QA reaction network. The ensemble method of network identification was used to model RNA profiling data on the qa gene cluster. Through microarray and cluster analysis, genome-wide identification of RNA transcripts associated with quinic acid metabolism in N. crassa is described and suggests a connection to other metabolic circuits. More than 100 genes whose products include carbon metabolism, protein degradation and modification, amino acid metabolism and ribosome synthesis appear to be connected to quinic acid metabolism. The core of the qa gene cluster network is validated with respect to RNA profiling data obtained from microarrays.


Background:
Advances in systems biology and the relative ease of manipulating eukaryotic microorganisms such as fungi have made it possible to elucidate a complex trait such as how a cell metabolizes a carbon source. The genetics of the metabolism of the carbon compound quinic acid in Neurospora crassa is one of the early paradigms for eukaryotic gene regulation with the support of over four decades of genetics and biochemistry.
[1] A genetic network ( Figure 1) has been reported for the quinic acid gene cluster of N. crassa. [2] The network involves 54 reactions and 38 macromolecules and small molecules and is based on a kinetics framework [3] and experimental data generated over a span of more than three decades. [4] In the diagram, circles, boxes and arrows depict, reactions, molecular species, and the flow of reactants and products for the reaction proceeding "forward". Reactions without outgoing arrows (lollipops), indicate decay reactions. Reactants include the seven genes of the qa gene cluster (qa-x, qa-2, qa-3, qa-4, qa-y, qa-1S, qa-1F) [1] with superscripts "0" or "1" indicating the presence or absence, respectively, of a bound transcriptional activator, encoded by the qa-1F gene. These genes are transcribed into RNA molecules (superscript r), which in turn are translated into protein products (in capital letters). Four out of seven proteins have been demonstrated by experimental means to participate in a known biochemical pathway at the bottom of the diagram. There are at least two cellular states for quinic acid, extracellular (depicted as QA e ) or intracellular quinic acid (QA). One of the genes, qa-y, is thought to encode a carrier QA-Y, which is hypothesized to transport the quinic acid across the cell membrane. There is one hypothesized protein-protein interaction in the model between the repressor, QA-1S, and the transcriptional activator, QA-1F. Quinic acid in the cell is hypothesized to disrupt the bound complex of QA-1S/QA-1F to favor induction by QA-1F. [1] Since the aro and qa gene clusters are biochemically coupled, the network outlined here is very likely incomplete. For example, the qa-2 and aro-9 genes have the same biochemical function, both being dehydroquinases, which can functionally substitute for each other.
[5] Further, the aro cluster is known to encode products in the shikimate pathway, which in yeast is reported to be under cross-pathway or general control.
[6] This implies that the aro and possibly qa gene clusters should be hypothesized to be under GCN4 control.
[7] Here we employ the ensemble modeling method [2] to fit RNA profiling data generated from microarray experiments to identify the qa cluster. Through the use of average linkage analysis [8] we have examined the relationship of qa responsive genes to all of the genes in the genome. This should provide the basis for a refined quinic acid genetic network model.
After shifting the cultures for the appropriate time, cells were placed at -70 o C prior to RNA purification. Total RNA was isolated using the High Pure RNA isolation kit (Roche, Inc.).
The 12K oligonucleotide arrays were designed by Combimatrix, Inc. Arrays were constructed with Version 3 of the Neurospora crassa genome sequence Total RNA (750 ng) was subjected to one round of amplification using the MessageAmp aRNA Amplification Kit (Ambion, Inc.), which uses an "Eberwine type" amplification.
Biotin-11-UTP and CTP (Enzo Life Sciences, Inc.) were incorporated during an in vitro transcription reaction. Five micrograms of aRNA were fragmented, then a total of 10 pM each biotinylated spikein positive control oligonucleotide (λ-phage) were added with hybridization solution and, the mixture, hybridized according to the manufacturer's protocol rev 2.03 (http://www.combimatrix.com). Hybridization was carried out at 45 o C for 24 hours using a 25 % formamide based solution. Washing of chips was done according to manufacturer's protocol rev 2.03. Strepavidin Alexa Fluor® 647 conjugate (Invitrogen) was employed at a final concentration of 1.0 µg/ml to visualize hybridized target. Laser confocal scanning was performed on a GSI Lumonics ScanArray 5000 (Perkin Elmer, Inc.) using a laser power and a PMT gain setting adjusted less than 10% between arrays.
The median foreground (FG) counts were used on all 12K features. From each median foreground count on an oligonucleotide array a background subtraction was computed using the 5 th percentile of the negative control oligonucleotide features. Median foreground counts were normalized within arrays by multiplying each feature's median FG count on the particular chip x (median FG across all chips in the cycle) / (median FG count on the particular chip). A MIPS functional classification was assigned to each feature on a chip. Putative QA-1F-binding sites were identified with the GCG program pattern (Accelrys, Inc.) operating on the 1000 nt upstream of each identified gene [9] from the file neurospora_crassa_#10BD2C.fasta from the Broad Institute Web site. The offset used was 1 and overhang 0. Patterns searched are given in. [13] The ensemble method of circuit identification was used [2]. An ensemble method of circuit identification was used to fit genetic networks of the structure in Figure 1 to RNA profiling data on genes in the qa cluster. With a sweep being a visit once on average to each parameter in the model (i.e., initial species concentrations and rate constants in Figure 1), 40,000 sweeps were used to equilibrate a Markov Chain in the parameter space.
Transitions betweens points in the parameter space were governed by the Metropolis acceptance probability r = min (1,Q(θ')/Q(θ)), where θ' is the proposed update in the parameter space and θ, the current state of the Markov Chain.
Since the acceptance probability affects the efficiency of the Metropolis algorithm [14], a 50:50 mixture of global and local updating rules were used in such a way as to keep the acceptance probability between .33 and .67 by adjusting the step width of the Markov Chain. A local move involved updating only one element of θ; a global move involved updating all elements οf θ. For each proposed update, the LSODES method of solving ordinary differential equations was used to solve the kinetic equations specified by Figure  1. [15] The chain was initialized with a best fitting ensemble using the data in Battogtokh et al.

Results and Discussion:
The microarray experiment was designed to determine what genes of N. crassa respond significantly to a shift from sucrose to quinic acid as the sole carbon source. The fungus was grown in glucose (2%) for 48 hours (hrs) (columns 1 to 13, Figure 2), and then compared with cultures shifted from sucrose (2%) to quinic acid (columns 14 to 21, Figure 2). The transcriptional profile at 0 hrs, 0.5 hrs, 1 hrs, 1.5 hrs, 2 hrs, 4 hrs, 6 hrs, 8 hrs (columns 14 to 21) after shift from sucrose to quinic acid can be seen in Figure 2. The bright green (-3) and bright red (+3) is gene expression level on a decadic log scale. The microarray data were produced from Combimatrix chips probed with biotin labeled aRNA as the target. The microarray chip contained 11,000 N. crassa genes (45 genes used as positive controls) and 633 negative control oligonucleotide sequences including those derived from plant, bacterial, bacteriophage, and N. crassa rDNA sequences. The empirical false positive and false negative rates are 0.16 and 0.38, respectively. The data were normalized within the array and background subtracted using the average spot intensity of the negative controls on the chip, logging, and clustering with UPGMA using Euclidean distance between mRNA profiles of different genes (i.e., average linkage).  (Figure 2) indicate that protein degradation, ribosome biogenesis and amino acid metabolism are turned off during the shift from glucose to quinic acid. Protein degradation functions which are switched off in the presence of quinic acid include 20S proteasome, ubiquitin-protein ligase and acid proteinase. A number of putative mannosyl transferase enzymes which function in protein glycosylation are turned off and epoxide hydrolase, aldose epimerase and alcohol dehydrogenase enzymes which function in carbohydrate and energy metabolism are switched on after the change in carbon source. The gene which encodes a probable fructose 1, 6 bisphosphatase was turned on during the shift from glucose to quinic acid, suggesting that the gluconeogenesis pathway may become essential in the metabolic shift in carbon metabolism.
However, the gene encoding phosphoenolpyruvate carboxykinase which supplies the gluconeogenesis sequence with metabolites was not detectable under these conditions (either because it did not have a QA-1F binding site or because its RNA level did not change significantly under shift from sucrose to quinic acid). Induction of the genes encoding two distinct formate dehydrogenases were induced on quinic acid as the carbon source but the genes encoding isocitrate lyase and malate synthase, key enzymes of the glyoxylate cycle, were not detected.
Protein modification genes which encode products that function as kinases and phosphatases are turned on with quinic acid as a sole carbon source. It is worth mentioning that various kinase and phosphatase proteins have been clearly emphasized as key factors in metabolic processes such as regulatory cascades. However, the exact role that the quinic acid responsive genes play in metabolism remains to be determined.

Figure 2:
Clustering of the RNA profiles of 895 genes with a QA-1F binding site. Only genes with a QA-1F binding site responding to a shift from 1.5% sucrose to 0.3% quinic acid (QA) in their mRNA profiles from strain OR-74A are displayed. The first 13 time points were control cultures on 2% glucose from the bdA strain. The last 8 times points were OR74A cultures shifted from sucrose to quinic acid. Times of the last 8 time points after shift from sucrose to quinic acid were 0, 0.5 hrs, 1 hrs, 1.5 hrs, 2 hrs, 4 hrs, 6 hrs, and 8 hrs. After background subtraction, normalization within arrays relative to grand median of each chip, logging, the mRNA profiles were clustered with UPGMA using Euclidean distance between mRNA profiles of different genes (i.e., average linkage).
[8] The bright green is -3 and the bright red is +3 is expression level on a decadic log scale. Data below arose from 21 chips probed with a biotin labeled aRNA. The qa cluster genes and cluster centers of MIPS functional categories appear in the right margin of this microarray experiment. Genes in Figure 2 are represented at least 5 times on each chip. The ~895 genes were selected by t-test comparing the mean of the first 13 time points on glucose (2%) with the mean of the last 8 time points on quinic acid with those having t 19 > 2.093 (α=0.05) displayed Thus, the results suggest that numerous gene products are somehow related to quinic acid metabolism. Among the 895 genes with QA-1F binding sites are 105 genes that are clustered into the four functional categories indicated above and 412 genes (46 %) with unknown function. As expected, the quinic acid gene cluster was turned on by quinic acid. The remaining genes show a scattered pattern in the MIPS classification scheme. The development of a series of new computational biology tools allows the identification of genetic networks from RNA profiling data. [2] One approach known as the ensemble method for identifying regulatory circuits was applied to the qa gene cluster (see methodology) with results displayed in Figure 3. The dots are values measured by the microarray experiment and the space between the broken lines (+/-2 standard errors) in red are predictions by the fitted model ensemble. All seven of the qa gene mRNA levels are predicted well by fitting the microarray data to the model ensemble with topology of that in Figure 1, and in almost all cases the areas between the broken lines (in red) cover the experimental data. The final minimum chisquared obtained was 5212 with 486 data points and with the model having 38 species, 67 rate constants, and 3 scale factors (converting observations from Northerns/microarrays to model units).