Computational analysis of non-coding RNAs in Alzheimer's disease

Recent studies have shown that long noncoding RNAs (lncRNAs) are crucial factors in neurodegenerative diseases and potential next-generation therapeutic targets. Advanced computational methods for the analysis of noncoding RNAs mainly address the prediction of RNA and miRNA structures. Problems concerning representations of specific biological structures, such as secondary structures, are either NP-complete or of high computational complexity. Numerous algorithms and techniques for enumerating biological structures, mostly of exponential complexity, have been developed to date. While the lncRNAs BACE1-AS, NATRad18, 17A and hnRNP Q have been found to be associated with Alzheimer's disease (AD), this study computationally identifies and discusses the significance of the best-known β-turn-forming residues shared by the corresponding proteins, as a potentially crucial factor in the regulation of folding, aggregation and other intermolecular interactions.

Additionally, the heterogeneous nuclear ribonucleoprotein (hnRNP) family, which includes hnRNP Q, helps control the maturation of newly formed heterogeneous nuclear RNAs (hnRNAs/pre-mRNAs) into messenger RNAs (mRNAs), stabilizes mRNAs during their cellular transport and controls their translation [13], thereby affecting dendritic development [14][15][16][17][18][19]. Recent studies also reveal a role in AD for NAT-Rad18, the natural antisense transcript of the postreplication repair protein RAD18, which affects the DNA repair system, leading to apoptosis and neurodegeneration [7].

352 ©Biomedical Informatics (2019)

Problems concerning representations of certain biological structures, such as secondary structures, are either NP-complete or of high computational complexity. The incompleteness of the corresponding theories gives rise to a hybrid problem, in which data mining, statistical analysis, biological interpretation and computational techniques must interact at different phases to produce a solution. Numerous algorithms and techniques for computing the successive terms of the sequences that enumerate biological structures, mostly of exponential complexity, have been constructed through bijections with alternative representations such as energy models, plane trees and Motzkin numbers, non-crossing set partitions, Motzkin paths and Dyck paths [21]. In contrast to protein folding programs, which predict the tertiary structure, most currently available RNA folding algorithms concentrate on the secondary structure of the RNA.
The first reason for this difference is pragmatic. Current RNA secondary-structure prediction algorithms run in polynomial time, O(n³), where n is the sequence length, which is fast enough to allow genome-wide analysis on current off-the-shelf computers. Considering the tertiary structure, however, leads to super-polynomial runtime, which impedes any large-scale application [22]. The second reason is related to the kinetics of RNA folding: secondary structures form first, producing a set of loops and helices that, once formed, interact to yield the tertiary structure. As a consequence, the determination of the tertiary structure depends strongly on the secondary structure [23]. Still, knowledge of the secondary structure alone can be misleading, as two similar tertiary structures can have different secondary structures [20].
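To make the O(n³) bound concrete, the sketch below uses the Nussinov base-pair maximization recurrence, a simpler relative of the energy-minimization algorithms referred to above; it is a swapped-in illustration, not the method of [22] or of any specific folding tool. The allowed pair set (Watson-Crick plus the G-U wobble pair) and the minimum hairpin loop of three unpaired bases (`min_loop`) are assumptions chosen for illustration.

```python
# Allowed pairs: Watson-Crick plus the G-U wobble pair (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize the number of nested base pairs; return (count, dot-bracket)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of nested pairs in seq[i..j] (0-based, inclusive).
    # Cells with j - i <= min_loop stay 0 because no hairpin fits inside them.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):  # O(n) span lengths
        for i in range(n - span):  # O(n) start positions
            j = i + span
            best = dp[i + 1][j]  # case: base i left unpaired
            for k in range(i + min_loop + 1, j + 1):  # O(n) partners -> O(n^3)
                if (seq[i], seq[k]) in PAIRS:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, dp[i + 1][k - 1] + 1 + right)
            dp[i][j] = best

    structure = ["."] * n

    def traceback(i: int, j: int) -> None:
        # Recursion depth grows with n; adequate for short illustrative inputs.
        if j - i <= min_loop:
            return
        if dp[i][j] == dp[i + 1][j]:
            traceback(i + 1, j)
            return
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = dp[k + 1][j] if k < j else 0
                if dp[i][j] == dp[i + 1][k - 1] + 1 + right:
                    structure[i], structure[k] = "(", ")"
                    traceback(i + 1, k - 1)
                    traceback(k + 1, j)
                    return

    traceback(0, n - 1)
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Filling O(n²) table cells, each of which scans O(n) candidate partners k, is what produces the cubic runtime. Energy-based folders keep a similar table structure but score loops thermodynamically instead of simply counting pairs.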

Materials and Methods:
Recent studies have already correlated specific lncRNAs with AD pathologies and lesions in brain regions such as the middle temporal gyrus, the prefrontal cortex, the striatum, the cerebellum and the hippocampus, as well as with other CNS-related disorders.

Results:
Multiple sequence alignment files were first created with the ClustalOmega software and imported into the ESPript 3.0 software to display and analyze the corresponding secondary structures (Figure 1) [29]. In the ESPript output, the secondary and primary structures are displayed in separate rows: dots represent gaps, α stands for alpha helix, β for beta strand, TT for strict β-turns and TTT for strict α-turns, while alpha helices are drawn as squiggles and β-strands as arrows in the multiple alignment representation (Figure 2 in supplementary material available with authors). This layout was used to identify similarities and patterns between the proteins. (65) there is a decrease in hydrophobicity and a simultaneous increase in the antigenicity of BACE1 (Figure 3). In the corresponding aligned positions (66, 67), both hydrophobicity and antigenicity decrease. Furthermore, strict β-turns occur in both proteins at positions (64, 65), while positions (61-64) of BACE1 have the same levels of hydrophobicity and antigenicity. The computational analysis shows that β-turns appear to be part of the surface of globular proteins (spheroproteins) and that their residues are hydrophilic [30]. Therefore, hydrophobicity appears to be reduced in regions with β-turns, which affects the folding of each protein and changes the direction of the polypeptide chain (Table 2). In this study, regions with this property were found in the β-turns shared by BACE1 and GABABR2: at positions (64, 65) of BACE1 and the corresponding aligned positions of GABABR2, the two proteins have identically aligned β-turn secondary structures. In both turns, hydrophobicity decreases from a stable level, which confirms the statements above concerning hydrophobicity. In the same region, the hydrophobicity of BACE1 and GABABR2 switches from positive to negative (0.06 to -0.22 and 0.14 to -0.28, respectively).
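A hydrophobicity profile of the kind discussed above is usually a sliding-window average over a per-residue scale. The sketch below assumes the Kyte-Doolittle scale and a window of 5 residues; the scale and window used by the software in this study are not stated here, so its numbers will not necessarily reproduce the values quoted above. The helper `positive_to_negative` is a hypothetical name for a function that reports where the profile switches from positive to negative, the pattern described for the BACE1 and GABABR2 β-turns.

```python
# Kyte-Doolittle hydropathy scale (assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(seq: str, window: int = 5) -> list[tuple[int, float]]:
    """Return (1-based centre position, mean hydropathy) for each full window.

    Symbols outside the 20 standard amino acids (e.g. gaps) raise KeyError.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    profile = []
    for centre in range(half, len(values) - half):
        segment = values[centre - half:centre + half + 1]
        profile.append((centre + 1, sum(segment) / window))
    return profile


def positive_to_negative(profile: list[tuple[int, float]]) -> list[int]:
    """Positions whose score is negative while the previous score was positive."""
    return [
        pos
        for (_, prev), (pos, cur) in zip(profile, profile[1:])
        if prev > 0 and cur < 0
    ]


if __name__ == "__main__":
    demo = hydrophobicity_profile("AAAAADDDDD", window=3)
    print(positive_to_negative(demo))  # [6]
```

The toy sequence switches from hydrophobic alanines to hydrophilic aspartic acids, so the window-averaged score changes sign at the boundary, analogous to the drop in hydrophobicity observed at the aligned β-turns.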
Furthermore, in certain β-turns BACE1 contains aspartic acid, while GABABR2 contains lysine and glutamic acid, all of which are hydrophilic residues. Although the β-turns generally consist of different residues, they appear to affect protein folding in the same way. Several studies since the 1970s have underlined the exceptional role of β-turns, which account for approximately 30% of all protein residues [31-33]. These secondary structures are strongly related to protein folding mechanisms, depending mainly on their topology, functionality and stability. Depending on their classification, β-turns can act as folding initiation sites, and in some cases the substantial destabilization of locally encoded protein features can lead to misfolding [30].

(ii) If $(s_i, s_j) \in S$ and $(s_i, s_l) \in S$, then $j = l$. (iii) If $(s_i, s_j) \in S$ and $(s_k, s_l) \in S$ and $i < k$, then $l < j$ or $j < k$.

Depending on how the RNA molecules are used, specific representations are more or less useful. The bracket notation is a text-based representation in which the structure is reflected in a string of dots and brackets: dots denote unpaired bases and a matching pair of brackets indicates a base pair. A more convenient representation, which expands in all directions in a plane and is thus closer to the spatial structure, is the squiggle plot. It is the most prominent plot for describing the approximate spatial structure of RNA. Ordered base pairs are drawn as two bases connected either by a straight line or, for the so-called G-U wobble base pair, by a circle. From a more theoretical perspective, representations of RNA secondary structure as trees or as arc-annotated sequences are well accepted.
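The bracket notation and conditions (ii) and (iii) above can be checked mechanically. The sketch below is a minimal illustration, not code from this study: the hypothetical `parse_dot_bracket` converts a dot-bracket string into 1-based pairs (i, j) using a stack, and the hypothetical `is_secondary_structure` tests a list of pairs against the two conditions. Condition (ii) is implemented in its symmetric form (every base has at most one partner), which is an assumption about the intended definition; the missing condition (i) is not checked.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Convert dot-bracket notation into sorted 1-based pairs (i, j), i < j."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_secondary_structure(pairs: list[tuple[int, int]]) -> bool:
    """Check conditions (ii) and (iii) for pairs (i, j) with i < j."""
    partner: dict[int, int] = {}
    for i, j in pairs:
        if not i < j:
            return False
        for a, b in ((i, j), (j, i)):
            # (ii): a base may not pair with two different partners.
            if a in partner and partner[a] != b:
                return False
            partner[a] = b
    for i, j in pairs:
        for k, l in pairs:
            # (iii): a later pair must be nested inside (l < j) or disjoint (j < k).
            if i < k and not (l < j or j < k):
                return False
    return True


if __name__ == "__main__":
    print(parse_dot_bracket("((..))"))  # [(1, 6), (2, 5)]
    print(is_secondary_structure([(1, 5), (3, 8)]))  # False (crossing pairs)
```

Because a stack always closes the most recently opened bracket, any well-formed dot-bracket string yields pairs that satisfy (iii); crossing pairs (pseudoknots) cannot be written with a single bracket type.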
Schmitt et al. computed the total number of RNA secondary structures of a given length with a fixed number of ordered base pairs, under the assumption that all ordered base pairs can occur, by establishing a one-to-one correspondence between secondary structures and trees [37]. In recent years, tree representations of RNA secondary structures have appeared in the literature, and algorithms on trees have been applied successfully. For example, the full tree representation [38] associates ordered base pairs with internal nodes and unpaired bases with leaves. In a more detailed representation, each interior node is flanked by left-most and right-most children, which correspond to the 5' and 3' nucleotides of the ordered base pair, respectively. In a Shapiro-Zhang tree, the different loops and stacked regions are represented explicitly with special labels [39]. Arc-annotated sequences represent the sequence as a straight line, with arcs indicating base pairs. A similar representation draws the sequence on a circle, with arcs plotted as curved lines inside the circle. The mountain plot is useful for large RNAs: plateaus represent unpaired regions, and the height of the mountains is determined by the number of ordered base pairs in which each part of the sequence is embedded. Specifically, the mountain plot maps the secondary structure onto a 2-dimensional graph in which the x-axis represents the position k along the RNA sequence and the y-axis corresponds to the number of ordered base pairs that enclose nucleotide k. The dot plot maps the structure onto a matrix in which a dot at position (i, j) represents the ordered base pair $(s_i, s_j)$. The secondary structure of an RNA molecule is the collection of ordered base pairs that occur in its 3D structure. When the 5'-end of one nucleotide is linked to the 3'-end of another, a p-bond (phosphodiester bond) is formed, and the sequence of p-bonds defines the backbone of the molecule.
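The mountain plot and dot plot described above can both be derived directly from the bracket notation. In the sketch below, which assumes a well-formed dot-bracket string, the height at position k counts the pairs (i, j) with i < k < j, one reading of "enclose" (published conventions differ on whether the paired bases themselves are counted). The dot plot is returned as a 0/1 upper-triangular matrix with 0-based indices; both function names are hypothetical.

```python
def mountain_heights(structure: str) -> list[int]:
    """Height m(k) = number of base pairs (i, j) with i < k < j, for each k."""
    heights = []
    depth = 0  # pairs opened before the current position and not yet closed
    for symbol in structure:
        if symbol == "(":
            heights.append(depth)  # the pair opened here does not enclose k
            depth += 1
        elif symbol == ")":
            depth -= 1  # the pair closed here does not enclose k
            heights.append(depth)
        else:  # '.' marks an unpaired base
            heights.append(depth)
    return heights


def dot_plot(structure: str) -> list[list[int]]:
    """n x n matrix with a 1 at (i, j) for every base pair i < j (0-based)."""
    n = len(structure)
    matrix = [[0] * n for _ in range(n)]
    stack: list[int] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            i = stack.pop()  # assumes the brackets are balanced
            matrix[i][j] = 1
    return matrix


if __name__ == "__main__":
    print(mountain_heights("((..))..(.)"))  # [0, 1, 2, 2, 1, 0, 0, 0, 0, 1, 0]
```

In the example, the flat runs at heights 2 and 0 are the plateaus corresponding to the unpaired regions, while each helix appears as a rising and a falling slope.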
if there is an h-bond connecting bases 1 and n, then for given integers $n \geq 2$, $l \geq 0$ there are $S^{(l)}(n-2)$ secondary structures of size n and rank l, which also establishes a bijection between the set $Z^{(l)}(n)$ of all closed secondary structures and the set $T^{(l)}(n)$ of all plane trees with exactly n leaves.
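The enumeration results above can be explored numerically under a simplified model. The sketch below assumes that any two bases may pair and that no minimum hairpin length is imposed, and it counts structures by their number of base pairs l, which is not necessarily the rank parameter used in [37]. Under these assumptions, reading a dot-bracket string as a Motzkin path ('(' up, ')' down, '.' flat) gives a bijection with Motzkin paths, so the totals are Motzkin numbers, and the count for l pairs has the closed form C(n, 2l)·C_l, where C_l is the l-th Catalan number (choose the 2l paired positions, then one of the C_l non-crossing matchings on them). The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_structures(n: int, l: int) -> int:
    """Number of non-crossing structures on n bases with exactly l base pairs."""
    if l < 0 or 2 * l > n:
        return 0
    if n == 0:
        return 1  # only the empty structure (here l == 0)
    # Case 1: base 1 is unpaired; the remaining n - 1 bases hold all l pairs.
    total = count_structures(n - 1, l)
    # Case 2: base 1 pairs with base k, splitting the rest into the k - 2 bases
    # inside that pair and the n - k bases after it (crossings are impossible).
    for k in range(2, n + 1):
        inside, outside = k - 2, n - k
        for a in range(l):
            total += count_structures(inside, a) * count_structures(outside, l - 1 - a)
    return total


def closed_form(n: int, l: int) -> int:
    """C(n, 2l) * Catalan(l): choose the paired positions, then match them."""
    if l < 0 or 2 * l > n:
        return 0
    return comb(n, 2 * l) * comb(2 * l, l) // (l + 1)


def motzkin(n: int) -> int:
    """Total number of structures on n bases, summed over all pair counts."""
    return sum(count_structures(n, l) for l in range(n // 2 + 1))


if __name__ == "__main__":
    print([motzkin(n) for n in range(8)])  # [1, 1, 2, 4, 9, 21, 51, 127]
```

The recursion mirrors the decomposition used in such enumeration proofs: fixing the partner of the first base splits a structure into two independent smaller structures. Adding a minimum hairpin length would only change the range of k.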
A constraint satisfaction formulation has also been used for the RNA prediction problem, including genetic mapping.

Conclusion:
While specific lncRNAs have already been correlated with certain AD lesions, this study presents a new computational analysis of the proteins BACE1, Rad18, GABABR2 and hnRNPQ. Using the QIAGEN CLC Main Workbench, the ClustalOmega software and the ESPript 3.0 software, a detailed analysis of the secondary structures of the sequences 6EJ3, 4F12, 4UX8 and 2Y43 was carried out. The computational analysis identified common properties at aligned positions with high similarity scores: identical secondary-structure matches, increased hydrophilicity and negative antigenicity. These findings suggest that the proteins under consideration may share common functionality in regions that regulate folding and aggregation and prevent the binding of immune factors. They also highlight the significance of the best-known β-turn-forming residues, which participate in ligand binding, molecular recognition, protein-protein and protein-nucleic acid interactions, and the modulation of protein functions and intermolecular interactions, in proteins commonly linked to AD development or progression.