From EST to structure models for functional inference of APP, BACE1, PSEN1, PSEN2 genes

Successive oxidative stress and biochemical changes result in neuronal death and the growth of neuritic plaques in Alzheimer's disease (AD). Therefore, it is of interest to analyze the amyloid-beta precursor protein (APP), beta-secretase 1 (BACE1) and presenilin (PSEN1 and PSEN2) genes from brain tissue to gain insights, because the development of potential inhibitors for these targets is of significance. In this study, 2898 (APP), 539 (BACE1), 786 (PSEN1) and 314 (PSEN2) EST sequences were analyzed. Contig sequences were identified for APP (contigs 1-4), BACE1 (contigs 5-7), PSEN1 (contigs 8, 9, 10, 11) and PSEN2 (contigs 13, 14); no protein sequence was obtained for PSEN1 contig 10 and PSEN2 contig 13. APP contig 3, the only contig without translation errors, was further analyzed using molecular modeling and docking, which showed binding with curcumin (the principal curcuminoid of turmeric) with an interaction energy of -7.3 kcal/mol, supporting its further consideration as a potential inhibitor.

©Biomedical Informatics (2019)

... PSEN2 genes and new discovery for the development of novel therapeutic approaches for the treatment of AD.

Methodology: Retrieval of EST sequences and assembly:
For in silico analysis, ESTs of the human AD-associated genes APP, BACE1, PSEN1 and PSEN2 were retrieved from the UniGene database, and only those originating from brain tissue were retained. The 5' ESTs were considered, because ESTs generated from the 3' end are the most error-prone owing to the low base-call quality at the start of sequence reads. The 5' EST sequences were assembled into contigs using the CAP3 contig assembly server [8]. Each gene's EST set was submitted to CAP3 as a FASTA-formatted text file with default parameters, and the results were returned as separate output files (contigs, single sequences, assembly details and sequence file). The contig sequence data set was selected for further analysis because it is useful for ascertaining gene function.
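As a minimal sketch of the EST selection step, the Python below parses a FASTA download, keeps brain-tissue 5' reads and writes FASTA text for upload to an assembler such as CAP3. It assumes a hypothetical header convention in which the tissue name and a 5' tag appear in each header; real UniGene/GenBank headers vary, and the functions `read_fasta`, `select_5prime_brain` and `to_fasta` are illustrative, not part of UniGene or CAP3.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (FASTA header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def select_5prime_brain(records: Iterable[Record],
                        tissue: str = "brain",
                        end_tag: str = "5'") -> List[Record]:
    """Keep ESTs whose header mentions the tissue and the 5' read tag.

    Hypothetical header convention: adapt these substring tests to the
    actual headers of the downloaded EST records.
    """
    return [(h, s) for h, s in records
            if tissue.lower() in h.lower() and end_tag in h]


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text, wrapping sequences at `width` characters."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    demo = [">EST1 APP brain 5' read", "ACGTACGT", "ACGT",
            ">EST2 APP liver 5' read", "TTTT",
            ">EST3 APP brain 3' read", "GGGG"]
    # Only EST1 is both from brain and a 5' read.
    print(to_fasta(select_5prime_brain(read_fasta(demo))), end="")
```

The filter is deliberately a plain substring test so that it can be swapped for whatever field actually encodes tissue and read direction in the source records.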

Database similarity search:
The contig sequences obtained by clustering were used for similarity searches with nucleotide BLAST (BLASTN) and BLASTX (translated nucleotide query against a protein database). Each contig sequence was also aligned to the human genome using BLAT (BLAST-Like Alignment Tool) [9] to assist genome mapping and gene discovery, with the following parameters: genome, human; assembly, Dec. 2013 (GRCh38/hg38); query type, translated DNA; sort output, score; output type, hyperlink.
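The web BLAT interface returns hyperlinked results, but command-line BLAT writes tab-separated PSL output by default. Assuming PSL input, a sketch like the following picks the top-scoring chromosome for each contig. The score is a simplified, UCSC-style PSL score (matches plus half the repeat matches, minus mismatches and insertion counts); UCSC's own scoring also adjusts for protein alignments, which this sketch does not. `psl_score` and `best_hits` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Tuple


def psl_score(fields: List[str]) -> int:
    """Simplified score for one PSL row (assumption: nucleotide-scale counts).

    score = matches + repMatches // 2 - misMatches - qNumInsert - tNumInsert
    """
    matches, mismatches, rep_matches = (int(x) for x in fields[0:3])
    q_num_insert, t_num_insert = int(fields[4]), int(fields[6])
    return matches + rep_matches // 2 - mismatches - q_num_insert - t_num_insert


def best_hits(psl_lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map each query (contig) to the target (chromosome) of its top-scoring hit."""
    best: Dict[str, Tuple[str, int]] = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # PSL data rows have 21 tab-separated columns and start with an integer;
        # this skips the psLayout header block.
        if len(fields) != 21 or not fields[0].isdigit():
            continue
        query, target = fields[9], fields[13]  # qName, tName columns
        score = psl_score(fields)
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return best
```

Reading the file with `best_hits(open("contigs.psl"))` (a hypothetical file name) would give one chromosome per contig, which is the kind of summary reported in Table 2.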

Conceptual translation of ESTs and functional annotation:
ESTScan is a program that identifies coding regions in DNA sequences, which may be incomplete at the N- or C-terminus, and translates them into amino acid sequences. Each contig sequence was translated with ESTScan2 [10]. The resulting amino acid sequences were compared by multiple sequence alignment in CLC Genomics Workbench to select sequences for functional annotation, which was carried out on the translated protein sequences with InterProScan 5.0 [11].
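ESTScan2 uses a hidden Markov model and corrects frameshifts, so it is not reproduced here. As a simpler illustration of conceptual translation, and of the error screen applied to the translated contigs (rejecting sequences that contain X or an internal stop codon), the sketch below translates a sequence in all six reading frames with the standard genetic code and flags problematic translations. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

# NCBI standard genetic code (translation table 1); codons enumerated in
# TCAG x TCAG x TCAG order, matching the order of the amino-acid string.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate one forward frame; ambiguous codons become 'X', stops '*'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def six_frame(seq: str) -> List[str]:
    """Conceptual translation in all six frames (3 forward, 3 reverse)."""
    rc = reverse_complement(seq)
    return [translate(seq, f) for f in range(3)] + [translate(rc, f) for f in range(3)]


def has_translation_error(protein: str) -> bool:
    """True if the protein contains 'X' or an internal stop (a trailing stop is allowed)."""
    return "X" in protein or "*" in protein.rstrip("*")


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))                # MA*
    print(has_translation_error("MA*"))          # False
    print(has_translation_error("MXA"))          # True
```

Only translations for which `has_translation_error` returns False would pass this kind of screen, mirroring the selection of a single error-free contig.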

Molecular modelling of hypothetical protein:
Structural annotation of the hypothetical APP amino acid sequence was performed by building a 3D structure with Modeller v9.13 [12]. The hypothetical protein sequence was aligned with BLASTP against the Protein Data Bank (PDB) to select appropriate templates, which were chosen based on their alignment with the hypothetical protein query sequence. These templates were used to build the 3D structure by homology modelling. The modelled structure was energy-minimized in Swiss-PdbViewer (GROMOS96 force field), and the predicted structure was visualized in PyMOL. A docking grid was generated around the modelled APP protein, and the inhibitory compounds used for docking were screened by virtual screening. Glide score was used as the scoring function to rank the poses of each inhibitory compound, and docking validation was used to identify the best complex among the docked complexes.

Table 1. ESTs of the four gene entries originating from brain tissue that were used for further analysis, listing the mRNA and EST entries.

EST clustering and assembly:
The brain-tissue ESTs of each gene were retrieved. The 5′ ESTs were analyzed, because ESTs generated from the 3′ end are the most error-prone owing to the low base-call quality at the start of sequence reads. In total, 988 ESTs from the four gene entries, together with their resulting contigs, are listed in Table 5 (Supplementary Material at the bottom of the article). These tissue-based ESTs were subjected to cluster analysis with the CAP3 server, which yielded 14 contigs for the four genes; these contigs were analyzed further.

Database similarity searches:
Querying these contigs with BLAT against the human genome showed that the APP contigs match chromosome 21 well, while the BACE1, PSEN1 and PSEN2 contigs match chromosomes 11, 14 and 1, respectively (Table 2). Conceptual translation of the 14 contig sequences with ESTScan2 yielded 12 protein sequences for APP, BACE1, PSEN1 and PSEN2; no protein sequences were obtained for the remaining two contigs (contig 10 and contig 13). A multiple sequence alignment of these 12 protein sequences showed that only the APP contig 3 sequence was free of translation errors; the remaining 11 protein sequences were excluded because they contained erroneous readings (X, which denotes an undetermined amino acid or a stop codon), as shown in Figure 1 (CLC Genomics Workbench 7.6). The APP protein sequence of contig 3 is 751 amino acids long, with a molecular weight of 84818.77 Da, and was designated the hypothetical protein for further annotation.
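A molecular weight such as the 84818.77 Da reported for contig 3 is the sum of the average residue masses plus one water molecule. The sketch below computes that quantity from approximate ExPASy-style average residue masses; exact values differ slightly between tools, and because the contig 3 sequence is not reproduced here, the function is not shown reproducing the reported figure. `average_molecular_weight` is a hypothetical helper name.

```python
# Approximate average residue masses in daltons (ExPASy-style values).
# A polypeptide's average mass = sum of residue masses + one water molecule.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528


def average_molecular_weight(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    unknown = set(protein) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        # Sequences with 'X' or internal stops cannot be weighed reliably.
        raise ValueError(f"cannot weigh residues {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER


if __name__ == "__main__":
    print(round(average_molecular_weight("G"), 2))  # glycine, about 75.07 Da
```

Raising on unknown residues ties in with the translation screen: a sequence containing X has no well-defined mass.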

Conceptual translation of ESTs:
The APP protein sequence obtained from 5′ ESTs of brain tissue belongs to the amyloid precursor protein (APP) and beta-APP protein families, with distinct N-terminal and C-terminal regions. Aβ peptides of 36-43 amino acids, derived from APP, make up the major part of the amyloid plaques found in the brains of AD patients. Aβ molecules can aggregate to form oligomers, and the resulting amyloid plaques are toxic to nerve cells [18]. The N-terminal region of APP is a member of the heparin-binding class of growth factor-like domains (GFLDs) and may itself have a growth factor function in neuronal development.

Note (docking results table): a hyphen (-) denotes no interaction between protein and ligand; the row for donepezil contains only hyphens. The highlighted compound, curcumin, shows the best Glide score, the most hydrogen bonds and the best interaction with mutated residues among the compounds.

Molecular modelling of hypothetical protein:
The 3D structure of the hypothetical human APP protein was predicted using MODELLER v9.13. The program generated ten 3D models, which were evaluated based on the percentage of residues in favored regions, and the best model (model 3) was selected (Figure 2A). Ramachandran plot validation showed that >96% of the residues lie in the most favored and additional allowed regions, and the modeled protein structure was found to be stable. Verify3D evaluates protein structures using 3D profiles, analyzing the compatibility of an atomic model (3D) with its own amino acid sequence (1D); each residue is assigned a structural class, and scores range from -1 to +1. For the modeled APP protein, Verify3D scores ranged from -1.0 to 0.7 (Figure 2B). The validation results showed that the stereochemical properties and geometric arrangement of the atoms of the protein were stable. The root-mean-square deviation (RMSD) of the modeled APP structure was 0.439 Å relative to the existing crystal structures PDB IDs 3KTM and 3NYL (2.70 Å and 2.80 Å, respectively), and the model had an energy value of -30227.773 kJ/mol.

GC-MS analysis of the essential oil extracted from C. asiatica leaves identified compounds such as Thujopsene, α-Thujene, Eucalyptol, 3-Nonen-2-one, β-Linalool, L-Camphor, trans-Borneol, α-Terpineol, Cis-Geraniol, Isobornyl acetate, 7-Tetradecene, β-Elemene, β-Gurjunene, γ-Elemene, Isocaryophyllene, Aromadendrene, β-Farnesene, β-Acoradiene, β-Selinene, α-Selinene, α-Chamigrene, α-Panasinsen, (-)-Spathulenol, Viridiflorol, Valeranone, Isoaromadendrene epoxide, Aristolene epoxide and 1-Naphthalenol. This plant has been reported to prevent cognitive deficits and is of interest for the treatment of AD. GC-MS analysis of the essential oil extracted from C. paniculatus seeds identified compounds such as Palmitic acid, Erucic acid, γ-Muurolene and Cubenol.
The seed oil has been studied as a nervine tonic and is used in the treatment of various neurological disorders [29]. We evaluated synthetic and medicinal plant-based compounds against the modeled APP protein by molecular docking to identify the best inhibitor for AD.
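Two of the structure-validation quantities used for the APP model, backbone torsion angles (the φ/ψ pairs plotted in a Ramachandran plot) and RMSD, can be sketched in a few lines of Python. This is a minimal illustration, not the validation software used in the study: `rmsd` assumes the two coordinate sets are already superposed and paired atom by atom, and performs no optimal fitting (such as the Kabsch algorithm). All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees (between -180 and 180).

    Backbone angles for residue i:
      phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
      psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
    """
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def rmsd(a: Sequence[Vec], b: Sequence[Vec]) -> float:
    """RMSD between paired coordinates that are ALREADY superposed (no fitting)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(_dot(_sub(p, q), _sub(p, q)) for p, q in zip(a, b))
    return math.sqrt(total / len(a))


if __name__ == "__main__":
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
    print(rmsd([(0, 0, 0)], [(3, 4, 0)]))                        # 5.0
```

Counting how many (φ, ψ) pairs fall inside favored regions would additionally need region boundaries, which validation tools define from reference data and which are not reproduced here.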
APP is a transmembrane protein of unknown function that is constitutively cleaved into peptides during cell metabolism. The amyloidogenic 40- or 42-amino-acid Aβ peptide is released after cleavage by β-secretase and γ-secretase. Familial Alzheimer's disease (FAD) mutations have been identified in the APP, PSEN1 and PSEN2 genes, which are essential for the generation of Aβ peptides [30]. Reported APP mutations include A673V [31] and V717I [32]. Figure 3 shows the interaction of the modeled APP protein with curcumin, which had the lowest Glide score (-7.3 kcal/mol) and formed more hydrogen bonds (ARG566, VAL673) than the other compounds. In the docking study, of the 11 medicinal plant compounds, only six medicinal plants, namely P. ginseng (Ginsenoside Rb1), C. longa Linn (Curcumin), C. asiatica (Aristolene epoxide, Valeranone), B. monnieri (Phytol acetate), B. monnieri (Dimethoxane) and C. paniculatus (Erucic acid), together with the synthetic compounds (Rivastigmine, Tacrine, Galantamine), showed proper interaction, but only ginsenoside Rb1 and curcumin docked with the mutated residues (Table 4).

Tang and Taghibiglou 2017 [33] reported that curcumin is more effective than current AD treatments. Alcigir et al. [34] reported positive results for curcumin as a natural therapy against neuronal impairment in newborn rodent pups. Abdolahi et al. [35] considered curcumin a novel, promising therapy for migraine prevention. From the molecular interaction study, we conclude that the natural compound curcumin interacts better with the modeled APP protein than the synthetic compounds, the other screened natural compounds and the approved AD drugs. We therefore suggest curcumin as an alternative lead compound for Alzheimer's disease research.
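The comparison of docked compounds can be sketched as a simple ranking, assuming that more negative Glide scores are better, that ties are broken by hydrogen-bond count, and that contacts with positions 673 and 717 (from the A673V and V717I mutations) are flagged. The `DockingResult` record and `rank_docking` function are hypothetical, and the sketch assumes residue labels in the model use the same numbering as the reported mutations. The curcumin values come from the text; donepezil is entered without a score to mirror the hyphens (no interaction) in the docking table.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class DockingResult:
    compound: str
    glide_score: Optional[float]  # kcal/mol; None when no interaction was recorded
    hbond_residues: List[str] = field(default_factory=list)  # e.g. "ARG566"


def residue_number(label: str) -> int:
    """Extract the residue number from a label such as 'VAL673'."""
    return int("".join(ch for ch in label if ch.isdigit()))


def rank_docking(results: Sequence[DockingResult],
                 mutated: Tuple[int, ...] = (673, 717)):
    """Rank docked compounds: lowest score first, then more hydrogen bonds.

    Returns (compound, score, n_hbonds, touches_mutated_residue) tuples.
    """
    docked = [r for r in results if r.glide_score is not None]
    docked.sort(key=lambda r: (r.glide_score, -len(r.hbond_residues)))
    return [(r.compound, r.glide_score, len(r.hbond_residues),
             any(residue_number(res) in mutated for res in r.hbond_residues))
            for r in docked]


if __name__ == "__main__":
    results = [
        DockingResult("curcumin", -7.3, ["ARG566", "VAL673"]),
        DockingResult("donepezil", None),
    ]
    print(rank_docking(results))  # [('curcumin', -7.3, 2, True)]
```

Sorting on a tuple key keeps the primary criterion (score) and the tie-breaker (hydrogen bonds) explicit; other tie-breakers would need their own justification.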

Conclusion:
EST analysis of the four genes associated with AD produced 14 contig sequences. APP contig 3, the only contig without translation errors, was annotated using functional and structural data. APP was further analyzed by molecular modeling and docking with the natural compound curcumin, which showed the best Glide score (-7.3 kcal/mol) at the mutated residues, unlike the synthetic and other natural compounds. Hence, to avoid the side effects of synthetic drugs, the natural compound curcumin is suggested as a candidate for the treatment of AD.