Beyond Bioinformatics |
|||||||||||||||||||||||||||
|
Title |
|
An efficient method for statistical significance calculation of transcription factor binding sites
|
|||||||||||||||||||||||||||
|
Authors |
Ziliang Qian 1,2,$; Lingyi Lu 1,2,$; Liu Qi 3,*; Yixue Li 1,3,4,*
|
||||||||||||||||||||||||||||
|
Affiliation |
1 Bioinformatics Center, Key Laboratory of Molecular System Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China; 2 Graduate School of the Chinese Academy of Sciences, 19 Yuquan Road, Beijing 100039, China; 3 School of Life Science and Biotechnology, Shanghai Jiao Tong University; 4 Shanghai Center for Bioinformatics Technology, 100 Qinzhou Road, Shanghai 200235, China
|
||||||||||||||||||||||||||||
|
|
liuqi@sibs.ac.cn; yxli@sibs.ac.cn; * Corresponding author
|
||||||||||||||||||||||||||||
|
Article Type |
Prediction Model
|
||||||||||||||||||||||||||||
|
Date |
received December 13, 2007; accepted December 31, 2007; published online December 30, 2007
|
||||||||||||||||||||||||||||
| Abstract |
Various statistical models have been developed to describe the DNA binding preferences of transcription factors, allowing putative transcription factor binding sites (TFBS) to be identified from the scores these models assign. The statistical significance of these scores, usually expressed as a p-value, plays a critical role in identification. We developed an efficient algorithm for the precise calculation of this statistical significance, substantially improving efficiency by reducing the time complexity from exponential to linear. We also extended the algorithm to a wide range of models, from the commonly used position weight matrix models to more complicated Bayesian network models. Further, we calculated p-values for all transcription factor DNA binding sites recorded in the JASPAR database and, based on these, investigated previously unexamined properties of p-values as a whole, such as the p-value distribution of different models and the variation of p-values under different scoring schemes. We hope that our algorithm and the results of our computational experiments will offer an improved solution for assessing the statistical significance of transcription factor binding sites. The software implementing our method can be downloaded from http://pcal.biosino.org/pCal.html.
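To make the idea of an exact p-value for a position weight matrix score concrete, the following is a minimal sketch of a generic score-distribution dynamic program. It is not the authors' pCal implementation, whose details are not given here. It assumes integer (discretized) matrix scores and an independent, identically distributed background model; the function names `pwm_pvalue` and `brute_force_pvalue` and the toy matrix are hypothetical.

```python
from collections import defaultdict
from itertools import product


def pwm_pvalue(pwm, background, threshold):
    """Return P(score >= threshold) for a random site under the background.

    pwm        -- list of dicts (one per position) mapping base -> integer score
    background -- dict mapping base -> probability (assumed i.i.d. per position)
    """
    dist = {0: 1.0}  # score -> probability after zero positions
    for column in pwm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for base, s in column.items():
                nxt[score + s] += prob * background[base]
        dist = nxt
    return sum(p for s, p in dist.items() if s >= threshold)


def brute_force_pvalue(pwm, background, threshold):
    """Reference calculation that enumerates every possible site (4^L of them)."""
    total = 0.0
    for seq in product(background, repeat=len(pwm)):
        score = sum(col[b] for col, b in zip(pwm, seq))
        if score >= threshold:
            p = 1.0
            for b in seq:
                p *= background[b]
            total += p
    return total


# Hypothetical 3-position matrix with integer scores (illustration only).
toy_pwm = [
    {"A": 2, "C": -1, "G": -1, "T": 0},
    {"A": -1, "C": 3, "G": 0, "T": -1},
    {"A": 0, "C": -1, "G": 2, "T": -1},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

p_dp = pwm_pvalue(toy_pwm, uniform, threshold=3)
p_bf = brute_force_pvalue(toy_pwm, uniform, threshold=3)
```

In this sketch the dynamic program visits each matrix column once and tracks only the distinct partial scores, so its work grows with the motif length times the number of distinct score values, whereas the brute-force reference grows as 4^L. The two functions should agree on any input, which gives a simple way to check the faster one on short motifs.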
|
||||||||||||||||||||||||||||
| Keywords |
transcription factor DNA binding sites; Bayesian network
|
||||||||||||||||||||||||||||
|
Citation |
Qian, et al., Bioinformation 2(5): 169-174 (2007)
|
||||||||||||||||||||||||||||
|
Edited by |
A. M. Khan, T. W. Tan & S. Ranganathan
|
||||||||||||||||||||||||||||
|
ISSN |
0973-2063
|
||||||||||||||||||||||||||||
|
Publisher |
Biomedical Informatics Publishing Group
|
||||||||||||||||||||||||||||
|
Copyright |
Publisher
|
||||||||||||||||||||||||||||
|
Copyright Transfer Agreement |
The authors of published articles in Bioinformation automatically transfer the copyright to the publisher upon formal acceptance. However, the authors reserve the right to use the information contained in the article for non-commercial purposes.
|
||||||||||||||||||||||||||||
|
License |
This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
|
||||||||||||||||||||||||||||