Re-annotation for hypothetical protein CA803_03125 of Methicillin-Resistant Staphylococcus aureus strain SO-1977 isolated from Sudan

This study describes the detection and functional inference of the hypothetical protein CA803_03125 from Staphylococcus aureus SO-1977. Computational methods were used to study this protein based on sequence similarity and the presence of known protein domains. The BLASTp search revealed significant similarity between the hypothetical protein (CA803_03125) and the ADP-ribose hydrolase proteins of four S. aureus strains (MW2, MRSA252, COL, and N315). The phylogenetic tree revealed a close relationship between the hypothetical protein and the proteins of the MW2 and COL strains. Physicochemical characterization showed that all proteins were stable, soluble, hydrophilic and acidic. The Macro domain was present in all proteins. Moreover, the proteins were highly similar in their primary, secondary and tertiary organization. The protein CA803_03125 (SO-1977) corresponds to the already known and well-characterized ADP-ribose hydrolase; we therefore recommend that its NCBI record be updated and submitted under the name ADP-ribose hydrolase.


Background:
Methicillin-resistant Staphylococcus aureus (MRSA) is any strain of the bacterium Staphylococcus aureus that has developed resistance to most of the available antibiotics. In recent decades, epidemiological studies have shown a marked increase in its endemic and epidemic spread, and its control has become an important concern worldwide [1]. In Sudan, studies have also shown a high MRSA incidence rate in hospitals [2]. In recent years, hundreds of bacterial genomes have become available, and their annotation is of interest [3]. However, the functions of many of the encoded proteins are still unknown. For this reason, there is an increasing demand for the annotation of the functions of uncharacterized proteins, called "hypothetical proteins" [4]. A hypothetical protein (HP) is a protein that is predicted to be expressed from an open reading frame, but for which there is no experimental evidence of translation [5]. About half of the proteins in most genomes are candidates for HPs. This group is of utmost importance for completing genomic and proteomic information. Detection of new HPs offers not only new structures but also new functions [6]. Many protein domains have unknown functions; however, these domains participate in the metabolic pathways of organisms and can cause adverse effects. Several approaches have been developed with the aid of various computational tools to predict protein function, using information derived from sequence similarity, phylogenetic analysis, conserved domains, motifs and 3D structure [7]. In this study, an extensive in silico analysis was carried out to explain the functional properties of the hypothetical protein CA803_03125 (accession number: OXL90457) of methicillin-resistant S. aureus strain SO-1977 using available protein structural and functional analysis tools.

©Biomedical Informatics (2019) 161

Physicochemical parameters:
Various physical and chemical parameters of all proteins were computed using the ProtParam server. ProtParam was used to determine the following parameters: molecular weight (M. Wt), isoelectric point (pI), amino acid composition, charge (positive or negative), atomic composition, extinction coefficient (EC), estimated half-life, instability index (II), aliphatic index (AI) and grand average of hydropathicity (GRAVY). The amino acid and atomic compositions are self-explanatory [11].
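
As an illustration of how two of these parameters are defined, the following minimal Python sketch computes GRAVY (the mean Kyte-Doolittle hydropathy value over all residues) and the aliphatic index (mole percent of Ala plus 2.9 times that of Val plus 3.9 times that of Ile and Leu). It is not the ProtParam implementation; the function names and the toy peptide are hypothetical and were chosen only for this example.

```python
# Kyte-Doolittle hydropathy values, the scale on which GRAVY is conventionally based.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: sum of hydropathy values / length.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy_peptide = "MKVLAAGIE"  # hypothetical toy peptide, NOT the CA803_03125 sequence
    print("GRAVY:", round(gravy(toy_peptide), 3))
    print("Aliphatic index:", round(aliphatic_index(toy_peptide), 2))
```

A negative GRAVY value indicates a predominantly hydrophilic sequence, and a higher aliphatic index indicates a larger relative volume of aliphatic side chains.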

Prediction of functional sites:
The target (hypothetical) protein sequence and the compared protein sequences were screened for domains using the Pfam (protein families) database [12]. Transmembrane domains were then predicted using the SOSUI server [13], which distinguishes membrane proteins from soluble proteins based on their amino acid sequences and predicts the transmembrane helices of the former.

Secondary structure prediction:
The secondary structures of the proteins were estimated using SOPMA.
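
To show how a per-residue prediction translates into the percentage of each structural element, the following minimal Python sketch summarizes a string of predicted states. It assumes a four-state, one-letter-per-residue encoding (h = α-helix, e = extended strand, t = β-turn, c = random coil); this encoding, the function name and the toy state string are assumptions made for this example and do not reproduce the server output for the proteins studied here.

```python
from collections import Counter

# Assumed one-letter state codes (an assumption of this sketch).
STATE_NAMES = {
    "h": "alpha helix",
    "e": "extended strand",
    "t": "beta turn",
    "c": "random coil",
}


def secondary_structure_composition(states):
    """Return the percentage of residues assigned to each structural state."""
    states = states.lower()
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    total = len(states)
    return {
        name: 100.0 * counts.get(symbol, 0) / total
        for symbol, name in STATE_NAMES.items()
    }


if __name__ == "__main__":
    toy_states = "cchhhhhhhhcceeeettcc"  # hypothetical prediction, for illustration only
    for name, percent in secondary_structure_composition(toy_states).items():
        print(f"{name}: {percent:.1f}%")
```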

Homology Modelling and model validation:
The 3D models of the proteins were constructed using the SWISS-MODEL server [15]. Molecular graphics and analyses were performed with the UCSF Chimera package [16].

Results and discussion:
The protein BLAST search revealed that HP CA803_03125 was similar to the ADP-ribose hydrolase proteins of other S. aureus strains (Table 1). Furthermore, only a small number of variations were detected in the multiple sequence alignment (MSA) (Figure 1). The phylogenetic tree showed that the protein sequences with accession numbers Q8NYB7.1 and Q5HIW9.1 (NCBI Taxonomy IDs: 196620 and 93062, respectively) were the closest to HP CA803_03125 (Figure 2). According to the physicochemical analysis, HP CA803_03125 shares the same physicochemical properties as the ADP-ribose hydrolases. All the proteins appear to be mildly acidic. The instability index values for all strains were lower than 40, indicating that all the proteins are stable [17]. All strains showed high aliphatic indices, which suggests that the proteins are stable over a wide temperature range. The GRAVY values are negative, which indicates the hydrophilic and soluble nature of the proteins [18]. The various parameters are listed in Tables 3 and 4. The Pfam search identified a MACRO domain shared by HP CA803_03125 and the ADP-ribose hydrolase proteins from the other strains (Table 5). Macro domains are ancient, highly evolutionarily conserved domains that are widely distributed throughout all kingdoms of life. The 'macro fold' is roughly 25 kDa in size and is composed of a mixed α-β fold with similarity to the P loop-containing nucleotide triphosphate hydrolases. Macro domains function as binding modules for metabolites of NAD+, including poly(ADP-ribose) (PAR), which is synthesized by PAR polymerases (PARPs) (Figure 3) [19]. Likewise, all proteins were classified as soluble proteins, with hydrophobicity values ranging from -0.292857 to -0.254753. The secondary structure information may give insights into the higher order structure and functional annotation of the protein [20].
The secondary structures of HP CA803_03125 and the ADP-ribose hydrolase proteins were found to be the same. The α-helices were dominant, followed by random coils, extended strands (β-sheets) and β-turns in all proteins (Table 6). Furthermore, homology models of all proteins were built on a single template (PDB ID 5kiv), which was the most similar to all of them, with similarity ranging from 97.74% to 100%. In addition, the GMQE (Global Model Quality Estimation) scores were 0.97 for all proteins, which is very close to 1; a higher value indicates higher reliability (Table 7).

Figure 4: 3D structure of the template protein (PDB ID: 5kiv). Golden color shows alpha helices, red color shows extended strands, black color shows random coils and beta turns, green color represents the N terminus and pink color represents the C terminus.
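
The notions of pairwise similarity and of variable alignment positions used in this section can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences of equal length with "-" as the gap character and defines identity as the fraction of identical residues over columns where neither sequence has a gap; other tools may use a different denominator. The function names and the toy alignment are hypothetical and do not correspond to the proteins analyzed here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences.

    Columns in which either sequence has a gap are ignored
    (one of several common conventions; an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def variable_columns(alignment):
    """Return 1-based positions of alignment columns that are not fully conserved."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have equal length")
    return [position
            for position, column in enumerate(zip(*alignment), start=1)
            if len(set(column)) > 1]


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy_alignment = ["MKVLAGE", "MKILAGE", "MKVLAGD"]
    print(percent_identity(toy_alignment[0], toy_alignment[1]))
    print(variable_columns(toy_alignment))
```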

Conclusion:
In this study, the hypothetical protein (OXL90457) from SO-1977 was predicted to be an ADP-ribose hydrolase.
Bacterial MACRO domains are known to influence processes that are crucial for the survival and virulence of bacteria in the host environment. Therefore, the MACRO domain merits further investigation in order to understand host-pathogen interactions and to explore novel therapeutic routes. The record of the hypothetical protein (OXL90457) in the NCBI database should be updated so that it is listed under the name ADP-ribose hydrolase.