Order-restricted inference for ordered gene expression (ORIOGEN) data under heteroscedastic variances.

This article extends the order restricted inference approach for time-course or dose-response gene expression microarray data, introduced by Peddada and colleagues (2003) for the case when gene expression is heteroscedastic over time or dose. The new methodology uses an iterative algorithm to estimate mean expression at various times/doses when mean expression is subject to pre-defined patterns or profiles, known as order-restrictions. Simulation studies reveal that the resulting bootstrap-based methodology for gene selection maintains the false positive rate at the nominal level while competing well with ORIOGEN in terms of power. The proposed methodology is illustrated using a breast cancer cell-line data analyzed by Peddada and colleagues (2003).


Background:
Increasingly, researchers are interested in understanding changes in gene expression when an animal/tissue/cell line is exposed to a chemical/treatment over time and/or dose. For instance, researchers in the U.S. National Toxicology Program are conducting numerous gene expression studies to evaluate toxicity of a variety of chemicals on various tissues/organs in rodents using dose-response studies. There are a variety of reasons for conducting a doseresponse/time-course gene expression study. Sometimes a researcher may be interested in understanding the changes in gene expression at a specific time/dose relative to the control. In other situations, a researcher may be interested in understanding the time-course pattern (or profile) of gene expression. Accordingly, statistical methodology for the analysis of time-course/dose-response gene expression data has been an area of active research in recent years. Although the methodology described in this paper is equally applicable to both time-course and dose-response studies, for simplicity of exposition we shall only discuss time-course studies. However, the same methodology may be applied to dose-response studies. Further, this work is motivated by experiments where independent samples are obtained at different time points, unlike repeated measures or longitudinal studies.
Depending upon the application, one may use a variety of available statistical methods for analysis. For example, if the objective is to identify genes that have significantly different expression values between two specific doses or a dose and control, then one may use statistical procedures such as SAM ( Type I error rates when the variances are homoscedastic. Throughout this paper the terms "Type I error" and "power" refer to the standard false positive and true positive rates for a given test. They are not adjusted for multiple testing. Recently, several authors (c. f. [3, 15]) have discussed methods for analyzing gene expression data that control for false discovery rates. An important development in this field is the work of Datta and Datta.
[3] They develop an empirical Bayes methodology for screening P-values so that the overall sensitivity of multiple testing is increased with a modest increase in false discovery rates.
Most procedures described above, are based on the assumption that for each gene, the expression values are homoscedastic (i.e., have equal variance) across times. In practice this assumption may not be true. Heteroscedasticity (i.e., unequal variances over time) may arise for a number of reasons. For instance, variability in gene expression could depend upon the mean expression value, or dose and/or duration of exposure. A potential consequence of heteroscedasticity is an increased false positive (and false discovery) rate and decreased power. Hence it is important to adjust for heteroscedasticity while analyzing gene expression data.
In section 2 we provide a step by step description of the new methodology for selecting statistically significant genes and clustering genes with similar time-course profiles. As in [12, 13, 14], all profiles are described by mathematical inequalities between the unknown parameters. We also compare the performance of the new procedure with ORIOGEN in terms of Type I error and power using a small simulation study. In section 3 we illustrate the proposed methodology using a data set described in Lobenhofer et al., [11] which was previously analyzed in. [13] Concluding remarks are provided in section 5 and in the Appendix we sketch the details of the proposed estimation and testing procedures.
Throughout this paper we use the terms "profiles", "patterns" and "order-restrictions" synonymously. Similarly, we use the terms "dose-response" and "timecourse" interchangeably.

Methodology:
For a given gene g, g = 1, 2, …, G, let gij y denote its expression in the

Algorithm 1:
Step 1 (Profiles specification) As in [13,14], the researcher pre-selects all the time-course profiles of interest in the study in terms of k sets of inequalities between the mean expressions. Thus, the desired parameter Two common profiles of interest are: Step 2 (Profile fitting) For each gene g we fit the observed expression values gij y against each profile p Θ in the alternative hypothesis using the estimation procedure described in Appendix A1.
For each fitted profile p Θ , we compute a goodness-of-fit statistic as described in Appendix A1 and select the profile with the largest goodness-of-fit statistic.
Step 3 (Bootstrap significance) We evaluate the statistical significance of the largest goodness-of-fit statistic obtained in Step 2 using the bootstrap methodology. Since the data are heteroscedastic, the bootstrap methodology used in [13] is not appropriate; instead we use the bootstrap procedure described in Appendix A2. To keep the false positive and false discovery rates small, we advise the user to test the significance of each gene at a very small level of significance. Further, since the level of significance is small, we run a large number of bootstraps.
Genes with a P-value less than the pre-selected level of significance are selected as the significant genes. All significant genes with the same selected profile are clustered together.
We compared the performance of the above methodology with ORIOGEN using a small simulation study. The goal  The funnel shaped heteroscedastic patterns considered above can be viewed as an "extreme" pattern in the sense that we expect this variance pattern to have greater impact on the false positive rate of test procedures based on homoscedastic variances than if the variance pattern has, for instance, an umbrella-shaped order restriction. We recognize that this is a small simulation study, but it conveys the drawbacks of procedures which do not account for heteroscedasticity and demonstrates that the modification proposed in this paper performs well. It is also important to note that the amount of variation in the data considered in patterns (6) and (7) are very extreme compared to the differences among the means and hence in this case neither of the methods is expected to have good power.
The results of our simulation study, based on 1000 bootstrap samples at a level of significance of 0.05, are reported in Table 1. Patterns (1), (2) and (3) provide the Type I errors of the two procedures, whereas patterns (4), (5), (6) and (7) provide the power of the procedure. As seen from   (1) mean expression is non-decreasing with time, (2) mean expression is non-increasing with time, (3,4,5,6) mean expression has an umbrella shape with peaks 4, 12, 24, and 36 hours and (7,8,9,10) mean expression has an inverted umbrella with troughs at 4, 12, 24, and 36 hours. Before implementing the new procedure, we applied Hartley's test for heteroscedasticity of variances. The Pvalues for the Hartley's test statistic was computed by bootstrapping the residuals since the null distribution of the Hartley's test is sensitive to normality assumption and gene expression data are not necessarily normally distributed. Using the usual level of significance of 0.05, we found that 367 genes out of 1900 were heteroscedastic. At 0.10 level of significance, this number jumps up to 610 genes. Thus there appears to be some amount of heteroscedasticity in the data which motivates us to apply the new methodology on this data.
According to ORIOGEN, which assumes homoscedasticity of variances, 197 out of 1900 genes were statistically significant at a level of significance α= 0.005. When we reanalyzed the data using the new methodology ORIOGEN-Hetero, we found 140 out of 1900 genes were significant at a level of significance α= 0.005. Of these 140, 115 were also selected by ORIOGEN. These common genes are listed in the attached spreadsheet. Thus 82 genes were selected only by ORIOGEN while 35 were selected only by ORIOGEN-Hetero. The discrepancy between these two procedures is possibly due to the amount of heteroscedasticity present in the data.

Conclusions and Discussion:
In this article we extended the order restricted inference procedure ORIOGEN of A simulation study reported in this paper reveals that the new methodology performs well in controlling the Type I errors and hence is expected to perform well in controlling the overall false discovery rates when the gene expression data are subject to unequal variances across time. Further, our modest simulation study suggests that the new method improves the power of the test as well when the variances are heteroscedastic. However, as seen in our simulation study, when the variances are homoscedastic, the new method may lose power relative to ORIOGEN. One way to get around this problem is to perform a test procedure such as Hartley's test for homoscedasticity of variances. Since Hartley's test is not robust against non-normality and gene expression data are not necessarily normally distributed, Pvalues for the Hartley's test may be determined by bootstrapping appropriate residuals. If the null hypothesis of homoscedasticity of variances is not rejected at some pre-specified level of significance of α, then one may implement ORIOGEN for such genes. For genes where the null hypothesis of homoscedasticity of variances is rejected by Hartley's test, then in such cases one may use the new method proposed in this paper. Such a pre-testing strategy might increase the power while protecting the Type I error and false discovery rates.
The resampling procedure used in ORIOGEN and ORIOGEN-Hetero does not allow for dependence in the samples across time as typically observed in a repeated measure study design. Estimation and testing for order restrictions under repeated measures design is a nontrivial generalization of the method described here. In an ongoing project we are generalizing ORIOGEN to allow for repeated measures data.

Acknowledgement:
The authors thank David Umbach, Grace Kissling, the reviewer and the editor for their careful reading of this manuscript and for numerous suggestions which improved the presentation of the manuscript substantially. The second author's research was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences.

Appendix:
Throughout the Appendix we shall use the notations introduced in the main text. Step 1 (initial estimates): Let Step 2 ( For normally distributed data, with p Θ satisfying the simple order restriction, Shi and Jiang [17] discussed the convergence of the above algorithm. Although in this paper we do not discuss the convergence of the above algorithm for the general linear inequality restrictions, extensive simulation studies, using an umbrella order restriction, suggest that the above algorithm converges rapidly. On average it took less than 10 iterations in the simulations we performed. Step 4 (Computation of goodness-of-fit statistic): For