In silico physicochemical characterization and topology analysis of Respiratory burst oxidase homolog (Rboh) proteins from Arabidopsis and rice

NADPH oxidase (NOX) is a key enzyme involved in the production of apoplastic superoxide (O2-), a type of reactive oxygen species (ROS). Plant NOXes are homologs of the catalytic subunit of mammalian NADPH oxidase and are known as respiratory burst oxidase homologs (Rbohs). A number of studies have reported their diverse functions in combating various stresses and in plant growth and development. In the present study, a total of 19 Rboh proteins (10 from Arabidopsis thaliana and 9 from Oryza sativa Japonica) were analyzed. We employed in silico approaches to compute their physicochemical properties (molecular weight, isoelectric point, total number of negatively and positively charged residues, extinction coefficient, half-life, instability index, aliphatic index, grand average of hydropathicity and amino acid percentage composition). We observed considerable variability in these parameters among the Rbohs, which may account for their functional diversification. Topological analysis, subcellular localization prediction and signal peptide detection were also performed. To the best of our knowledge, this is the first report on in silico physicochemical characterization, topology analysis, subcellular localization and signal peptide detection of Rboh proteins from two model plants. The study elucidates the variations in key properties among Rboh proteins, which may be responsible for their functional multiplicity.


Background:
The accelerated generation of reactive oxygen species (ROS) such as superoxide (O2-), singlet oxygen (1O2) and hydrogen peroxide (H2O2) has been implicated as one of the earliest hallmarks of the plant stress response. The major source of ROS production in plants is NADPH oxidase, which is localized to the plasma membrane and transfers electrons from cytosolic NADPH/NADH to apoplastic oxygen, leading to ROS formation. It is the homolog of the mammalian NADPH oxidase catalytic subunit known as gp91phox [1]. In contrast to animals, plant NADPH oxidase consists of two main structural elements: respiratory burst oxidase homolog (Rboh) and Rop (Rho-like protein; a Rac homolog of plants). OsRbohA was the first plant NADPH oxidase identified in Oryza sativa [2], and plant NADPH oxidases now encompass several Rbohs in dicots, monocots and lower plants [1]. Rboh proteins consist of two Ca2+-binding EF-hand motifs in the N-terminal region, six transmembrane helices, and FAD- and NADPH-binding domains in the C-terminal region. The recently available crystal structure of the OsRbohB N-terminal region (amino acid residues 138-313) has highlighted the presence of two additional EF-hand-like motifs (EF-like 1 and EF-like 2) [3]. Rbohs perform versatile functions in plant growth, development, and responses to abiotic and biotic stresses. The functioning of Rbohs requires interaction with various regulatory components, which include Ca2+, calcium-dependent protein kinases (CDPKs), Ca2+/CaM-dependent protein kinase (CCaMK), Rop, extracellular ATP (eATP), phospholipase Dα1 (PLDα1) and its lipid product phosphatidic acid (PA), mitogen-activated protein kinase (MAPK), Nt14-3-3h/omega1 (a member of the 14-3-3 protein family) and nitric oxide [1]. As evident from various studies, ROS production by Rbohs is associated with numerous signalling pathways related to stress, morphogenesis and development; however, how this ROS wave is decoded downstream to produce a particular response is still to be elucidated. The study of various physicochemical parameters may provide insight into their functional diversity.
Besides the array of experimental techniques available, various in silico approaches and online tools provide enormous opportunities for the characterization and analysis of gene and protein sequences [4,5]. These tools offer researchers a cost-effective and faster way to understand genes and proteins, which helps in designing laboratory experiments. Recently, we conducted a phylogenetic analysis of Rbohs within the plant kingdom, together with ortholog identification and mutation and disorder prediction [6]. Further, an in silico analysis of cis-elements, CpG islands and tandem repeats in the upstream regions of Rbohs from Arabidopsis thaliana and Oryza sativa japonica was carried out to gain insights into their versatile functions [7]. In addition, some non-homology-based approaches, such as the analysis of physicochemical parameters, subcellular localization and signal peptide prediction, may also provide useful insights into the functional diversity of proteins [5]. Several physicochemical properties of a protein can be computed, such as isoelectric point, molecular weight, number of negatively and positively charged amino acid residues, instability index, aliphatic index and grand average of hydropathicity (GRAVY). Various experimental studies have indicated the expression of Rbohs in plants, including a few from Arabidopsis thaliana and Oryza sativa. Ten Rbohs from A. thaliana and nine from O. sativa have been reported, but information regarding their biological roles in response to various abiotic (cold, drought, osmotic, salt, heat and light) and biotic (pathogens and herbivores) stresses is still incomplete [1]. To the best of our knowledge, no study has yet been documented on the physicochemical characterization and topology analysis of Rbohs.

Methodology:
Sequence retrieval: Accession numbers of the protein sequences for Arabidopsis and rice Rbohs were retrieved from a recent study from our lab [1]. A total of 19 sequences (10 from Arabidopsis and 9 from rice) were downloaded from UniProt (http://www.uniprot.org/) in FASTA format and used for further analysis.
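The text does not describe how the downloaded FASTA files were handled, so the following is only a minimal sketch of one way to read FASTA-formatted text into a dictionary before computing any property. The function name `parse_fasta` and the two toy records are hypothetical; the records are not real Rboh sequences.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # drop the leading '>'
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; join them and normalise case.
            records[header] += line.upper()
    return records


# Hypothetical toy input (NOT real Rboh sequences).
example_text = """>demo|RbohX
MKLV
AAGG
>demo|RbohY
mstw
"""
records = parse_fasta(example_text)
```

Each value in `records` is a single unwrapped sequence string, which is the form assumed by the other sketches in this document.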

Physicochemical properties:
The physicochemical properties were computed for 19

Statistical analysis:
T-tests were performed using SigmaStat 3.5 software.

Results:
In the present study, physicochemical properties, subcellular localization, signal peptides and topology were analyzed for 19 Rboh protein sequences, 10 from Arabidopsis and 9 from rice. The protein names and accession numbers are shown in Table 1.

Physicochemical properties:
Physicochemical properties were calculated for 19 Rboh proteins (Table 2). The properties include length, molecular weight, isoelectric point (pI), total number of negatively and positively charged residues, extinction coefficient, instability index (II), aliphatic index (AI) and grand average of hydropathicity (GRAVY). Among Arabidopsis Rbohs, AtRbohB was the shortest with 843 amino acids, while AtRbohE was the longest with 952 amino acids. The computed pI was above 7 for all 10 AtRbohs; the lowest (8.71) and highest (9.48) values were obtained for AtRbohI and AtRbohJ, respectively. Positively charged amino acids outnumbered negatively charged ones in all AtRbohs. Extinction coefficients (ECs) were determined at 280 nm on the assumption that all pairs of Cys residues form cystines. They ranged from 143295 to 164600 M-1 cm-1, where the lowest value corresponds to two Rbohs, AtRbohC and AtRbohG, and the highest to AtRbohF. The instability index (II) for AtRbohs ranged from 38.32 (AtRbohD) to 48.99 (AtRbohI). The aliphatic index (AI) for AtRbohs was also computed and was found to vary from 83.88 (AtRbohH) to 89.37 (AtRbohB). The GRAVY score ranged from -0.16 (AtRbohB) to -0.241 (AtRbohD). Further, the percentage composition of the 20 amino acids among the 10 AtRbohs was determined (Table 3), as was their distribution across different types of amino acids (Figure 1).
Among rice Rbohs, OsRbohA was the shortest protein with 743 amino acids, while OsRbohF was the longest with 1033 amino acids (Table 2). The computed pI was above 7 for all 9 OsRbohs; the lowest (8.98) and highest (9.84) values were obtained for OsRbohA and OsRbohF, respectively. Positively charged amino acids outnumbered negatively charged ones in all OsRbohs. Extinction coefficients ranged from 117855 to 165170 M-1 cm-1, where the lowest value corresponds to OsRbohE and the highest to OsRbohG. The instability index (II) for OsRbohs ranged from 39.76 (OsRbohB) to 52.79 (OsRbohF); the next-highest values were observed for OsRbohC (50.23) and OsRbohG (49.34). The AI for OsRbohs was found to vary from 77.51 (OsRbohF) to 93.2 (OsRbohA). The GRAVY score ranged from -0.087 (OsRbohA) to -0.286 (OsRbohF). Further, the amino acid percentage composition among the 9 OsRbohs (Table 3) and their distribution across different types of amino acids (Figure 2) were determined.
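Two of the properties reported above for both species depend only on amino acid counts, so they are easy to illustrate. The sketch below assumes the widely used per-residue coefficients for absorbance at 280 nm (Tyr 1490, Trp 5500 and cystine 125 M-1 cm-1) and the commonly used aliphatic index weights (2.9 for Val, 3.9 for Ile and Leu); the text does not state which tool or coefficients were actually used, so these are assumptions. Function names and the toy sequence are hypothetical.

```python
def extinction_coefficient(seq: str, cys_as_cystine: bool = True) -> int:
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Assumed coefficients: Tyr 1490, Trp 5500, cystine 125.
    With cys_as_cystine=True, all pairs of Cys are assumed to form cystines.
    """
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    n_cystine = seq.count("C") // 2 if cys_as_cystine else 0
    return n_tyr * 1490 + n_trp * 5500 + n_cystine * 125


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu.

    Assumed weights: AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)).
    """
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical toy sequence (NOT a real Rboh sequence).
toy = "AVILYWCC"
ec_cystine = extinction_coefficient(toy)                        # Cys pairs counted
ec_reduced = extinction_coefficient(toy, cys_as_cystine=False)  # Cys ignored
ai = aliphatic_index(toy)
```

Because Tyr and Trp carry much larger coefficients than cystine, the Cys assumption changes the estimate only slightly, in line with the low cysteine content noted for the Rbohs.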
To identify significant differences in amino acid composition between the two species, t-tests were applied. They revealed significant differences between AtRbohs and OsRbohs in the percentage of non-polar (alanine: A, glycine: G, isoleucine: I and proline: P), polar (asparagine: N) and positively charged (arginine: R, lysine: K) amino acids (Figure 3a). The magnitude and direction of the significant differences in amino acid percentage composition between the two species are represented by their t-test values in Figure 3b. The height of each bar indicates the relative difference in the sample means, and its direction (up or down) shows which plant species contains the higher percentage of that amino acid. Positive t-test values indicate a higher percentage of that amino acid in AtRbohs, whereas negative values correspond to a higher percentage in OsRbohs.
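The sign convention described above can be illustrated with a small sketch. The t-tests in this study were run in SigmaStat 3.5, and the text does not state which variant was used; the code below computes Welch's unequal-variance t statistic as an assumption, with the first group playing the role of AtRbohs so that a positive value means a higher percentage in that group. The sequences are hypothetical toy strings, not real Rboh sequences.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def percent(seq: str, aa: str) -> float:
    """Percentage of residue `aa` in sequence `seq`."""
    return 100.0 * seq.count(aa) / len(seq)


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples.

    Positive when mean(a) > mean(b). Requires at least two values per
    group and a non-zero pooled standard error.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical toy groups (NOT real Rboh sequences).
group_at = ["AAGL", "AGLL", "AALL"]   # alanine: 50 %, 25 %, 50 %
group_os = ["AGGL", "GGLL", "GLLL"]   # alanine: 25 %,  0 %,  0 %

ala_at = [percent(s, "A") for s in group_at]
ala_os = [percent(s, "A") for s in group_os]
t_ala = welch_t(ala_at, ala_os)       # positive: more alanine in group_at
```

Only the statistic is computed here; obtaining a p-value would additionally require the t distribution with the appropriate degrees of freedom.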
The estimated half-life for all Rbohs except OsRbohA (18 proteins) was found to be 30 hours (mammalian reticulocytes, in vitro), >20 hours (yeast, in vivo) and >10 hours (Escherichia coli, in vivo). For OsRbohA, it was 4.4 hours (mammalian reticulocytes, in vitro), >20 hours (yeast, in vivo) and >10 hours (E. coli, in vivo). Subcellular localization prediction indicated that all 19 Rbohs are plasma membrane associated, and no signal peptide was detected in any of them.

Topological analysis:
Individual Arabidopsis and rice Rboh proteins were predicted to contain 4 to 7 transmembrane domains (TMDs) based on the TMHMM, Phobius, HMMTOP and WHAT programs. However, more accurate results can be obtained when aligned homologous sequences are used. Hence, multiple sequence alignments were generated for the 10 AtRboh and 9 OsRboh proteins (S1 File) to produce average hydropathy, amphipathicity and similarity plots (Figure 4 a & b). Hydropathy refers to the extent of hydrophobicity or hydrophilicity of amino acids, while amphipathicity describes the presence of both hydrophobic and hydrophilic character in a protein. Six conserved peaks of hydrophobicity correlate with six peaks of similarity, which correspond to six TMDs among AtRbohs and OsRbohs. All these peaks displayed a moderate level of amphipathicity. The peaks of amphipathicity in the loops between TMDs exceeded the amphipathicities of the six TMDs within the 19 Rbohs. Among OsRbohs, a large insertion in TMD-III of OsRbohF showed low similarity. To show that TMD-III is well conserved, an average hydropathy, amphipathicity and similarity plot was constructed after removing OsRbohF (Figure 4c). In addition, one further peak of hydrophobicity, similarity and amphipathicity was observed within AtRbohs and OsRbohs. The results also showed that the N-terminal region had the least similarity among AtRbohs and OsRbohs, and it appeared more hydrophilic than the C-terminal region. There was no clear peak of amphipathicity corresponding to the N-terminal region among AtRbohs and OsRbohs.

Abbreviations: M. wt., pI, (-) R, (+) R, (ε, 280), II, AI and GRAVY denote molecular weight, isoelectric point, total number of negatively charged residues, total number of positively charged residues, extinction coefficient at 280 nm, instability index, aliphatic index and grand average of hydropathicity, respectively.

Discussion:
In the present work, we focused on the in silico physicochemical characterization of 19 Rboh proteins (10 from A. thaliana and 9 from O. sativa Japonica), their topological analysis, subcellular localization and signal peptide detection. The most fundamental characteristics of protein sequences are length and size (molecular weight). In our study, more variation in protein length and molecular weight was observed in rice Rbohs than in Arabidopsis Rbohs. The isoelectric point (pI) and charge are also important parameters for solubility, subcellular localization and interaction. The pI denotes the pH value at which the protein carries no net charge, that is, the negative and positive charges are equal. The calculated pI was >7 for all 19 Rbohs, which indicates their basic nature. The basic nature and large size of these transmembrane proteins are consistent with previous reports describing membrane proteins as heavier and more basic than non-membrane proteins in bacteria, archaea and eukaryotes [20,21]. These observations are also in agreement with the view that the membrane bilayer is negatively charged and that basic amino acids of these proteins form favourable electrostatic interactions with it, which promote their stability in the membrane. In addition, transmembrane proteins are evolving rapidly to adjust to the external environment so that they can interact with an extensive range of partners. The pI value will also be useful for developing a buffer system for the purification of a protein by isoelectric focusing. In addition to pI, the instability index (II) provides an estimate of the stability of the protein in vitro and in vivo. A protein with an instability index <40 is predicted to be stable, whereas a value >40 indicates an unstable protein [22]. The lowest instability index, observed for AtRbohD, indicated its stability and hence its ability to play multiple roles in plant development and under biotic and abiotic stress conditions [1]. Similarly, other well-studied Rbohs found to possess an instability index below 40 were AtRbohB and OsRbohB, with 38.55 and 39.76, respectively. AtRbohB is involved in seed germination and after-ripening [23], while OsRbohB is the only plant Rboh that has been crystallized [3] and is also involved in the immune response [24].
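The definition of pI used in this discussion (the pH at which positive and negative charges balance) can be turned into a short numerical sketch. The code below sums Henderson-Hasselbalch partial charges and finds the zero crossing by bisection. The pKa values are illustrative assumptions only; prediction tools use different pKa sets, so this sketch is not expected to reproduce the values in Table 2. Counting Asp + Glu as negatively charged and Arg + Lys as positively charged is likewise an assumption about how those totals are defined. All names and toy sequences are hypothetical.

```python
from typing import Tuple

# Illustrative side-chain pKa values (assumed; tools differ on these).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERMINUS_PKA = 8.6
C_TERMINUS_PKA = 3.6


def charged_residue_counts(seq: str) -> Tuple[int, int]:
    """Return (negative, positive) counts, assuming Asp+Glu and Arg+Lys."""
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


def net_charge(seq: str, ph: float) -> float:
    """Net charge of the peptide at a given pH (Henderson-Hasselbalch)."""
    positive = 1.0 / (1.0 + 10 ** (ph - N_TERMINUS_PKA))
    negative = 1.0 / (1.0 + 10 ** (C_TERMINUS_PKA - ph))
    for aa, pka in POSITIVE_PKA.items():
        positive += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    for aa, pka in NEGATIVE_PKA.items():
        negative += seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return positive - negative


def isoelectric_point(seq: str, tolerance: float = 1e-4) -> float:
    """Find the pH where net charge is zero by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid    # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical toy peptides (NOT real Rboh sequences).
basic_pi = isoelectric_point("KKKKAG")
acidic_pi = isoelectric_point("DDDDAG")
```

Bisection is sufficient here because the net charge decreases monotonically as pH increases, so there is exactly one zero crossing in the interval.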
Another measure of protein stability is the aliphatic index (AI); an increase in its value is reported to enhance the thermostability of globular proteins [25]. AI refers to the relative volume occupied by the aliphatic side chains of the following amino acids: alanine (A), isoleucine (I), leucine (L) and valine (V). The lowest AI, that of OsRbohF, is indicative of its low thermal stability and hence of a more flexible structure compared with other Rbohs. The high AI of OsRbohA, AtRbohB, OsRbohC, AtRbohG, AtRbohJ and AtRbohE suggested that these Rbohs might be stable under a wide range of temperature conditions. Further analysis of amino acid percentage composition revealed leucine to be the most abundant amino acid among AtRbohs and OsRbohs. This observation is consistent with an earlier report documenting the high occurrence of leucine in membrane proteins [21]. Our pattern of amino acid frequencies also correlates with that of the earlier report on membrane proteins [21]. In addition, the extinction coefficient of Rbohs was computed at 280 nm. The calculated ECs of Rbohs indicated a high content of tyrosine (Y) and tryptophan (W) but not of cysteine (C), which was present in very low amounts in all Rbohs. This indicated that UV spectral methods could not be employed to analyze Rbohs. However, the obtained EC values will aid in the study of protein-protein and protein-ligand interactions [26]. As with stability and protein concentration, it is also critical to evaluate the hydrophobic or hydrophilic character and topology of the protein. For this purpose, GRAVY scoring and topology analysis were performed. The GRAVY score is the sum of the hydropathy values of all amino acids in the protein divided by the number of residues in the protein. It lies in the range from -2 to +2, where a positive value represents a hydrophobic protein and a negative value a hydrophilic protein [27]. It is also an indicator of whether a protein would be observed on 2-D gels, as proteins with GRAVY scores >0.4 lie outside the solubility range and are hence difficult to detect [28]. In the case of Rbohs, the GRAVY score exhibited a very narrow range (-0.087 to -0.286), with the slightly negative values indicating low hydrophobicity and hence good solubility. This may be due to the presence of a hydrophilic N-terminal region and six hydrophobic TMDs, which is in agreement with our topological analysis. These lines of evidence are also consistent with earlier studies reporting hydrophilic proteins with TMDs [29, 30] as well as six TMDs in Rbohs [31,32]. In addition to the 6 TMDs, topological analysis also revealed a separate hydrophobic peak, which corresponds to the conserved glycine-rich motif (GXGXG) from the NADPH-binding domain of Rbohs. The glycine-rich motif has been reported in substrate binding, where the substrate could be ATP or S-adenosyl-L-methionine (SAM) in histidine kinases and SAM-dependent methyltransferases, respectively [33,34]. Another kind of glycine-rich motif (GXXXG) is documented in transmembrane α-helices and helps stabilize the oligomerization of membrane proteins [35].
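The GRAVY definition given above, together with the idea of hydrophobic peaks marking candidate TMDs, can be sketched in a few lines. The code assumes the Kyte-Doolittle hydropathy scale, which is the scale conventionally used for GRAVY; the text does not name the scale explicitly. The sliding-window profile works on a single sequence and is only a simplified stand-in for the alignment-averaged hydropathy plots described in the topological analysis. The window length of 19 and all function names are assumptions made for illustration.

```python
from typing import List

# Kyte-Doolittle hydropathy values (assumed scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Sum of hydropathy values divided by the number of residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full window along a single sequence.

    Returns an empty list if the sequence is shorter than the window.
    Sustained high values suggest a candidate membrane-spanning segment.
    """
    if window > len(seq):
        return []
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


# Hypothetical toy sequence (NOT a real Rboh sequence).
toy_gravy = gravy("AR")
toy_profile = hydropathy_profile("AAARRR", window=3)
```

A protein can have a negative overall GRAVY score and still show several strongly positive windows in its profile, which is the situation described above for Rbohs with a hydrophilic N-terminal region and six hydrophobic TMDs.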

Conclusion:
The current study sheds light on the variations in vital properties such as molecular weight, isoelectric point, total number of negatively and positively charged residues, extinction coefficient, instability index, aliphatic index and grand average of hydropathicity among Rboh proteins, which may be responsible for their functional multiplicity. Insights from the evaluation of their hydrophobic or hydrophilic character and topology are also reported.