CFP: a web-server for constructing sequence-based protein conformational flexibility profiles.

Many proteins contain conformationally flexible segments that undergo significant changes in backbone conformation or lack a well-defined conformation altogether. We previously developed the generalized local propensity (GLP), a quantitative sequence-based measure of protein backbone flexibility. In this paper, we present the CFP (Conformational Flexibility Profile) web-server, which constructs the GLP flexibility profile for a user-submitted sequence and uses this profile to identify segments with high backbone flexibility. The statistical significance of a flexible sequence segment is assessed with the discrete scan statistic, based on the density of flexible residues observed in that segment.


AVAILABILITY
CFP is publicly available at http://cfp.rit.albany.edu.


Background:
Many proteins contain conformationally flexible segments. These segments undergo significant changes in backbone conformation or are completely disordered (lack a well-defined structure) [1-3]. A quantitative representation of the conformational flexibility of the protein backbone is important for many applications. We previously developed the generalized local propensity (GLP), a quantitative sequence-based measure of backbone flexibility [4]. The GLP can be used to construct sequence-based protein flexibility profiles, and it provides an objective numeric threshold for defining conformationally flexible segments [5]. For a given sequence position k, the GLP value glp(k) measures the width of the context-dependent distribution of backbone conformations accessible to this position (see [4,5] for details). If glp(k) ≥ 1, sequence position k is considered conformationally flexible.
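As a minimal illustration of the flexibility criterion, the sketch below marks positions whose GLP value is at least 1. The GLP values are made up for the example; computing glp(k) from a sequence follows references [4,5] and is not reproduced here, and the function name `flexible_positions` is hypothetical.

```python
from typing import List

# Cutoff from the text: glp(k) >= 1 marks position k as conformationally flexible.
FLEXIBILITY_CUTOFF = 1.0


def flexible_positions(glp: List[float], cutoff: float = FLEXIBILITY_CUTOFF) -> List[bool]:
    """Return one flag per sequence position: True if its GLP value meets the cutoff."""
    return [value >= cutoff for value in glp]


if __name__ == "__main__":
    # Hypothetical GLP values for a five-residue stretch (not real CFP output).
    profile = [0.4, 0.9, 1.0, 1.3, 0.7]
    print(flexible_positions(profile))  # [False, False, True, True, False]
```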
Here, we present the CFP (Conformational Flexibility Profile) web server, which constructs the GLP flexibility profile for a user-submitted sequence and uses this profile to identify segments with high conformational flexibility. The steps implemented in CFP are briefly outlined below:
1. The GLP flexibility profile is constructed for the query sequence and smoothed using a sliding window of size W1.
2. Consecutive positions whose smoothed GLP is above a threshold T1 are merged into seed flexible segments.
3. Each seed flexible segment is extended by adding extension windows of size W2 until its average GLP drops below an extension threshold T2. An extension window is added only if its own average GLP is above a threshold T3. This extension procedure is similar to that used in the SEG program [6].
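The outlined steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the CFP implementation: smoothing uses equal weights with the window truncated at the sequence ends; extension operates on the smoothed profile; and an extension window is added on either side only if it fits entirely inside the sequence, its mean exceeds T3, and the extended segment's mean stays at or above T2. The function names and all parameter values in the usage example are hypothetical, not CFP defaults.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive, 0-based (start, end)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def smooth(profile: List[float], w1: int) -> List[float]:
    """Equal-weight sliding average over w1 positions, truncated at the sequence ends."""
    left = (w1 - 1) // 2
    right = w1 - 1 - left
    n = len(profile)
    return [mean(profile[max(0, i - left):min(n, i + right + 1)]) for i in range(n)]


def seed_segments(smoothed: List[float], t1: float) -> List[Segment]:
    """Merge runs of consecutive positions with smoothed GLP above t1 into seeds."""
    seeds: List[Segment] = []
    start = None
    for i, value in enumerate(smoothed):
        if value > t1 and start is None:
            start = i
        elif value <= t1 and start is not None:
            seeds.append((start, i - 1))
            start = None
    if start is not None:
        seeds.append((start, len(smoothed) - 1))
    return seeds


def extend(profile: List[float], seed: Segment, w2: int, t2: float, t3: float) -> Segment:
    """Grow a seed by whole windows of w2 positions on either side (SEG-like extension)."""
    start, end = seed
    n = len(profile)
    grown = True
    while grown:
        grown = False
        # Left side: window must fit, have mean > t3, and keep the segment mean >= t2.
        if start - w2 >= 0:
            window = profile[start - w2:start]
            if mean(window) > t3 and mean(profile[start - w2:end + 1]) >= t2:
                start -= w2
                grown = True
        # Right side: same rules for the window just past the segment end.
        if end + w2 < n:
            window = profile[end + 1:end + 1 + w2]
            if mean(window) > t3 and mean(profile[start:end + 1 + w2]) >= t2:
                end += w2
                grown = True
    return start, end


def flexible_segments(glp: List[float], w1: int, t1: float,
                      w2: int, t2: float, t3: float) -> List[Segment]:
    """Smooth the profile, find seeds above t1, then extend each seed."""
    smoothed = smooth(glp, w1)
    return [extend(smoothed, s, w2, t2, t3) for s in seed_segments(smoothed, t1)]


if __name__ == "__main__":
    glp = [0.2, 0.3, 1.2, 1.4, 1.3, 0.9, 0.8, 0.2, 0.1]  # made-up GLP values
    print(flexible_segments(glp, w1=1, t1=1.0, w2=2, t2=1.0, t3=0.5))  # [(2, 6)]
```

In the example, the seed (2, 4) gains the window at positions 5-6 (mean 0.85 > T3, segment mean about 1.12 ≥ T2), while the windows at 0-1 and 7-8 are rejected because their means fall below T3.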
The extended flexible segments are reported in the final table. If the number of flexible residues observed in a final flexible segment is unusually high (p-value < 0.05), the segment is marked as statistically significant. The significance of the number of flexible residues is estimated using the discrete scan statistic. This statistical procedure is the same as the one we previously implemented in the BIAS software to identify statistically significant clusters of user-specified amino acid types [7,8]. The web-server is publicly available at http://cfp.rit.albany.edu.
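The sketch below illustrates the idea of scoring a segment by its count of flexible residues. It uses a swapped-in technique, not the BIAS/CFP procedure [7,8]: the discrete scan statistic p-value is approximated by Monte Carlo permutation. The flexible/non-flexible labels of the whole sequence are shuffled, and the sketch counts how often some window of the segment's length contains at least as many flexible residues as observed. Labeling a residue as flexible when glp(k) ≥ 1, and the names `max_window_count` and `scan_pvalue`, are assumptions for illustration.

```python
import random
from typing import List


def max_window_count(flags: List[int], w: int) -> int:
    """Largest number of flexible residues (1s) in any window of w consecutive positions."""
    count = sum(flags[:w])
    best = count
    for i in range(w, len(flags)):
        count += flags[i] - flags[i - w]  # slide the window one position to the right
        best = max(best, count)
    return best


def scan_pvalue(flags: List[int], start: int, end: int,
                n_sim: int = 2000, seed: int = 0) -> float:
    """Monte Carlo p-value of the scan statistic for segment [start, end] (inclusive, 0-based).

    Null hypothesis (assumed): the observed flexible residues are placed at random
    along the sequence; shuffling the labels keeps their total number fixed.
    """
    w = end - start + 1
    observed = sum(flags[start:end + 1])
    rng = random.Random(seed)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if max_window_count(shuffled, w) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Made-up labels: 10 flexible residues packed into positions 40..49 of 100.
    flags = [0] * 40 + [1] * 10 + [0] * 50
    print(scan_pvalue(flags, 40, 49) < 0.05)  # True
```

Note that the scan statistic asks whether any window of this length could be this dense by chance, not just the reported one. This makes it more conservative than a simple binomial test on the segment alone.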

Input:
The only mandatory input is the query protein sequence. All other input fields have default values that advanced users can modify if desired. These input fields are described below. Instructions for each field, and general information about the methodology and the output format, can be found by clicking the corresponding help hyperlink on the input page.

Smoothing window size:
The size of the sliding window (W1) used to smooth the raw GLP profile. Larger values of W1 tend to reveal long flexible segments and mask short ones; smaller values tend to reveal short segments.

GLP threshold for seed segments:
The threshold T1 used to identify seed flexible segments. Contiguous sequence positions whose smoothed GLP values are above this threshold are merged into a seed flexible segment.

Extension threshold:
Each seed flexible segment is extended on both sides until its average GLP drops below this threshold (T2).

Extension window threshold:
The ends of a seed flexible segment are extended only if the average GLP of the extension window is above this threshold (T3).

Extension window size:
The size of the extension window (W2).

Hat-shaped local smoother:
Positions in the center of the smoothing window contribute more to the smoothed GLP score than positions at the ends of the window.

Equal weights smoother:
The smoothed GLP score is the unweighted average computed over all positions in the window.
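The two smoothing options can be contrasted with a weighted sliding average. The exact shape of the hat-shaped smoother is not specified here, so the triangular weights below (weight 1 at the window ends, rising linearly toward the center) are an assumption, as is truncating the window at the sequence ends. The names `window_weights` and `weighted_smooth` are hypothetical.

```python
from typing import List


def window_weights(w: int, hat: bool) -> List[float]:
    """Weights for a window of w positions: triangular ("hat") or uniform (equal weights)."""
    if not hat:
        return [1.0] * w
    center = (w - 1) / 2
    # Assumed triangular hat: weight 1 at the ends, rising linearly toward the center.
    return [center + 1 - abs(i - center) for i in range(w)]


def weighted_smooth(profile: List[float], w: int, hat: bool = True) -> List[float]:
    """Weighted average of the positions in a window centered on each position."""
    weights = window_weights(w, hat)
    left = (w - 1) // 2
    n = len(profile)
    smoothed = []
    for i in range(n):
        num = den = 0.0
        for j, weight in enumerate(weights):
            k = i - left + j  # sequence position covered by weight j
            if 0 <= k < n:  # positions beyond the sequence ends are skipped
                num += weight * profile[k]
                den += weight
        smoothed.append(num / den)
    return smoothed


if __name__ == "__main__":
    spike = [0.0, 0.0, 3.0, 0.0, 0.0]
    print(weighted_smooth(spike, 3, hat=True))   # [0.0, 0.75, 1.5, 0.75, 0.0]
    print(weighted_smooth(spike, 3, hat=False))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

On a single high-GLP position, the hat-shaped smoother keeps a sharper peak at the center, while the equal-weights smoother spreads the signal evenly across the window.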

Minimum seed segment:
Seed segments shorter than this threshold are not extended.

Maximum separation between merged segments:
Flexible segments separated by this number of positions or fewer are merged into one.
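The last two options can be sketched as simple operations on (start, end) segments. It is not stated at which stage of the pipeline CFP applies them, or whether seeds that are too short are still reported. This sketch therefore only selects the seeds eligible for extension and merges nearby segments. The function names are hypothetical, and the separation is assumed to count the positions lying strictly between two segments.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive, 0-based (start, end)


def seeds_to_extend(seeds: List[Segment], min_len: int) -> List[Segment]:
    """Keep only seeds at least min_len positions long; shorter seeds are not extended."""
    return [(s, e) for s, e in seeds if e - s + 1 >= min_len]


def merge_close(segments: List[Segment], max_gap: int) -> List[Segment]:
    """Merge segments separated by max_gap or fewer positions (overlaps are merged too)."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        # Gap = number of positions strictly between the previous segment and this one.
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    print(seeds_to_extend([(0, 2), (5, 14)], min_len=5))        # [(5, 14)]
    print(merge_close([(0, 4), (7, 10), (20, 25)], max_gap=2))  # [(0, 10), (20, 25)]
```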