Prediction-driven functional annotation of hypothetical proteins in the major facilitator superfamily of S. aureus NCTC 8325

Antibiotic-resistant Staphylococcus aureus strains cause several life-threatening infections. New drug treatment options are needed but are slow to develop, partly because approximately 50% of the S. aureus genome is annotated as hypothetical. The goal of this study is to aid in the annotation of the S. aureus NCTC 8325 genome by identifying hypothetical proteins related to the Major Facilitator Superfamily (MFS). The MFS is a broad protein group whose members include drug efflux mechanisms that cause resistance. To do this, the sequences of three E. coli MFS proteins with X-ray crystal structures were searched with PSI-BLAST against the S. aureus NCTC 8325 genome to identify homologs. The eleven hypothetical protein homologs identified were then searched with BLASTP against the NCBI non-redundant database to find homologs specific to each hypothetical protein. ExPASy characterized the physicochemical features, CDD-BLAST and Pfam identified domains, and the SOSUI server predicted the transmembrane helices of each hypothetical protein. Based on size (300–700 amino acids), number of transmembrane helices (>7), CD06174 and MFS domains in CDD-BLAST and Pfam, respectively, and close relation to well-defined homologs, SAOUHSC_00058, SAOUHSC_00078, SAOUHSC_00952, SAOUHSC_02435, SAOUHSC_02752, and ABD31642.1 are likely members of the MFS. Further multiple sequence alignment and phylogenetic analyses show SAOUHSC_00058 to be a quinolone resistance protein (NorB), SAOUHSC_00078 a siderophore biosynthesis protein (SbnD), SAOUHSC_00952 a glycolipid permease (LtaA), SAOUHSC_02435 a macrolide MFS transporter, SAOUHSC_02752 a chloramphenicol resistance protein (DHA1 family), and ABD31642.1 a Bcr/CflA family drug resistance efflux transporter. These findings improve the annotation of the existing genome and identify proteins related to antibiotic resistance in S. aureus NCTC 8325.


Background:
Staphylococcus aureus is an opportunistic pathogen responsible for a wide variety of infections, including superficial skin and surgical wound infections, toxic shock syndrome, and bacteremia [1]. Most are nosocomial infections, though community-acquired (CA) methicillin-resistant Staphylococcus aureus (MRSA) infections are increasing, particularly among immunocompromised patients. Internalized infections also cause heart and lung diseases such as endocarditis and necrotizing pneumonia. These diseases now occur in younger community populations rather than remaining solely hospital-acquired. Deaths from S. aureus heart and lung infections have been reported [2]. In 2011, the Centers for Disease Control and Prevention (CDC) estimated 80,000 invasive MRSA infections and 11,285 related deaths in the United States [3]. These deaths are primarily due to MRSA strains that are resistant to the following drugs [4]:
- macrolides;
- monovalent cationic antimicrobials;
- quinolones;
- bivalent quaternary ammonium compounds;
- tetracycline;
- all beta-lactam antibiotics, including penicillin, amoxicillin, methicillin, and oxacillin.
Drug resistance is conferred by inactivation of antibiotics, reduced cellular permeability, alteration of antibiotic target sites, and bacterial efflux pumps [5]. Several multidrug efflux genes on the S. aureus chromosome, such as NorA, NorB, and NorC, confer resistance to quinolones and other antibiotics [6,7]. Disturbingly, the variety of drug-resistant S. aureus strains has increased in recent years, the most notable being vancomycin-resistant S. aureus (VRSA). VRSA usually develops in MRSA patients treated with vancomycin, the frontline treatment for MRSA. True VRSA remains rare; most vancomycin-nonsusceptible S. aureus are vancomycin-intermediate, meaning that large amounts of vancomycin still kill the organisms. Even so, this trend presents a new challenge in combating S. aureus infections.
These superbugs are generally sensitive to intravenous medications such as quinupristin-dalfopristin. These require slow infusion in a large fluid volume, which makes them impractical for treating CA-MRSA patients in an outpatient setting.
Quinupristin-dalfopristin can also cause disabling myopathy as a side effect. CA-MRSA has spread globally in just the past 20 years, and antibiotic resistance continues to increase; new treatment options are therefore needed [2]. [13]. This is likely the case with S. aureus NCTC 8325, whose genome was published in 2006 [14].
Approximately half of all S. aureus NCTC 8325 protein sequences are currently annotated as hypothetical proteins. In addition, 25% of all membrane transport proteins belong to the MFS and are therefore potentially related to antibiotic resistance. Together, these make the genome a promising source of new drug targets [15]. Because proper annotation of hypothetical proteins can reveal new therapeutic targets, there is high demand for characterizing them. This study therefore uses in silico techniques to identify and characterize hypothetical proteins in S. aureus NCTC 8325 that are related to the MFS. Figure 1 illustrates the overall experimental design.

Physicochemical Characterization
To characterize the proteins, the ExPASy ProtParam server computed several physicochemical properties [16]:
- the number of amino acids;
- the molecular weight;
- the total number of negatively charged residues (aspartic acid plus glutamic acid);
- the total number of positively charged residues (arginine plus lysine).
The theoretical isoelectric point, the pH at which a molecule carries no net electrical charge, is computed from the protein's charged residues. The program also calculates the extinction coefficient, the amount of light the protein absorbs at a wavelength of 280 nm, which is helpful for protein purification procedures [17]. The instability index estimates a protein's stability in a test tube under physiological conditions [18]. The aliphatic index is the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine) in a protein [19]. The grand average hydropathy (GRAVY) measures a molecule's hydrophobicity. It is the sum of the hydropathy values of all amino acids in the protein divided by the number of residues [20]. The SOSUI server also assesses a protein's hydrophobicity, though via solubility computations, and further predicts potential transmembrane regions [21].
Magenta names are representative sequences, colored red where alpha-helix secondary structure is predicted. Black names belong to the same alignment group as the magenta name above them, indicating a strong relationship between the two. Consensus_aa, consensus amino acid sequence; Consensus_ss, consensus predicted secondary structure; h, consensus predicted secondary structure alpha-helix.
To determine which hypothetical proteins belong to the MFS, they were compared with established MFS proteins through four analyses: homolog identification, physicochemical characterization, transmembrane helix enumeration, and domain identification. Table 4 lists the top BLASTP homolog for each hypothetical protein regardless of origin.
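Several of the ProtParam quantities described above follow simple published formulas. As a rough illustration, the Python sketch below computes the following:
- GRAVY on the Kyte-Doolittle scale;
- the aliphatic index (Ikai's formula);
- the negative and positive charged-residue counts;
- the 280 nm extinction coefficient, assuming all Cys pairs form cystines.

It is a minimal sketch, not ProtParam itself. It assumes sequences contain only the 20 standard one-letter residue codes, and it omits the instability index and isoelectric point, which need lookup tables and pK values. In place of SOSUI, which uses its own method, it counts hydrophobic stretches with a classic Kyte-Doolittle sliding window (19 residues, threshold 1.6). All function names and the toy sequence are hypothetical.

```python
"""Sketch of selected ProtParam-style indices (not the ProtParam implementation).
Assumes sequences use only the 20 standard one-letter amino acid codes."""

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """GRAVY: sum of hydropathy values divided by the number of residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai (1980): X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    def mole_pct(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


def charged_residues(seq: str) -> tuple[int, int]:
    """Return (negative, positive) counts: (Asp + Glu, Arg + Lys)."""
    return seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K")


def extinction_coefficient(seq: str) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1), assuming every pair
    of Cys forms a cystine: 5500 per Trp, 1490 per Tyr, 125 per cystine."""
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * (seq.count("C") // 2)


def hydrophobic_segments(seq: str, window: int = 19, threshold: float = 1.6) -> int:
    """Count runs of consecutive windows whose mean hydropathy exceeds the
    threshold; a crude stand-in for transmembrane prediction, not SOSUI."""
    count, inside = 0, False
    for start in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[start:start + window]) / window
        above = mean > threshold
        if above and not inside:
            count += 1  # entering a new hydrophobic stretch
        inside = above
    return count


if __name__ == "__main__":
    toy = "MKKLLLLVVAILGAFLLLAIVGSKDE"  # invented toy sequence
    print(f"length={len(toy)} GRAVY={gravy(toy):.3f} AI={aliphatic_index(toy):.1f}")
    print("neg/pos:", charged_residues(toy), "EC:", extinction_coefficient(toy))
    print("hydrophobic segments:", hydrophobic_segments(toy))
```

The residue coefficients and Ikai weights are fixed constants, so each index reduces to counting residues in the sequence.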
A hypothetical protein can be assigned to the MFS with more confidence when its homologs have well-defined functions. SAOUHSC_02307, SAOUHSC_02309, and ABD31816.1 hit several general membrane proteins. SAOUHSC_02620 and SAOUHSC_02866 matched several hypothetical proteins as well as general membrane proteins. These results suggest that these five hypothetical proteins may not belong to the MFS.
Figure 3: Alignment of EmrD homologs by PROMALS3D. Magenta names are representative sequences, colored red where alpha-helix secondary structure is predicted. Black names belong to the same alignment group as the magenta name above them, indicating a strong relationship between the two. Consensus_aa, consensus amino acid sequence; Consensus_ss, consensus predicted secondary structure; h, consensus predicted secondary structure alpha-helix.
# AA, number of amino acids; MW, molecular weight; pI, theoretical isoelectric point; # neg, total number of negatively charged residues (Asp + Glu); # pos, total number of positively charged residues (Arg + Lys); EC, extinction coefficient assuming all pairs of Cys residues form cystines; II, instability index; AI, aliphatic index; GRAVY, grand average hydropathy.
Figure 4: Representative phylogenetic tree of proteins produced with PHYLIP package programs, showing that six hypothetical proteins belong evolutionarily to the major facilitator superfamily (MFS). SAOUHSC_00058, SAOUHSC_02435, SAOUHSC_02752, and ABD31642.1 are related to drug efflux proteins. SAOUHSC_00078 is closely related to a siderophore biosynthesis protein, and SAOUHSC_00952 is confirmed to be a glycolipid permease.
Finally, potential MFS hypothetical proteins should have domains similar to those found in GlpT, LacY, and EmrD.
Table 7 lists the CDD-BLAST results and Table 8 the Pfam results. Taken together, these data indicate that six hypothetical proteins are likely MFS proteins and should undergo evolutionary analyses: SAOUHSC_00058, SAOUHSC_00078, SAOUHSC_00952, SAOUHSC_02435, SAOUHSC_02752, and ABD31642.1. They meet all of the study's criteria:
- size of 300–700 amino acids;
- well-defined BLAST homologs (no generalized membrane or hypothetical proteins);
- more than seven transmembrane regions;
- CD06174 and MFS domains from CDD-BLAST and Pfam, respectively.
All six were identified by either the GlpT or the EmrD query, so LacY and its homologs in Table 2 were removed from further study. To evaluate how closely related these hypothetical proteins are, the proteins of interest underwent multiple sequence alignment and phylogenetic tree construction. Two multiple sequence alignments were built, one from the defined homologs in Table 1 and one from those in Table 3, each combined with the hypothetical proteins meeting the study's criteria. These are displayed in Figures 2 and 3, respectively. The alignments also included top BLASTP hits from Table 4 not already listed in Tables 1 and 3. Both alignments found more than 10 alpha helices in the consensus sequences, as expected for MFS members. Each hypothetical protein aligned best with its top BLASTP hit from Table 4, and SAOUHSC_00952 also aligned closely with the MFS transporter. To visualize how closely related these proteins are, phylogenetic trees were constructed. Because the trees were all similar, a representative tree of all the proteins and their homologs is shown in Figure 4. The same NorA multidrug efflux MFS transporter appeared in all three E. coli PSI-BLAST searches. GlpT (1PW4) and EmrD (3GFP), however, identified two separate NorB quinolone resistance proteins. SAOUHSC_00058 clustered between the two NorB proteins. As expected, the phylogeny agreed with the multiple sequence alignments.
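The selection criteria above amount to a conjunction of filters. The Python sketch below encodes them under several assumptions, all of them illustrative choices rather than the study's actual procedure:
- the hypothetical `Candidate` record and its field names;
- a keyword test that treats "hypothetical protein" or "membrane protein" in the top BLASTP description as generic;
- a check that a Pfam hit name begins with "MFS" (for example MFS_1).

The example records are invented placeholders, not results from this study.

```python
"""Sketch of the MFS candidate filter; field names and heuristics are assumptions."""
from dataclasses import dataclass, field

# Assumed keywords marking a BLAST top hit as generic rather than well defined.
GENERIC_TERMS = ("hypothetical protein", "membrane protein")


@dataclass
class Candidate:
    name: str
    length: int                 # number of amino acids (e.g. from ProtParam)
    tm_helices: int             # predicted transmembrane helices (e.g. from SOSUI)
    cdd_hits: set[str] = field(default_factory=set)   # CDD accessions
    pfam_hits: set[str] = field(default_factory=set)  # Pfam family names
    top_hit: str = ""           # description of the top BLASTP homolog


def has_defined_homolog(description: str) -> bool:
    """True unless the description is empty or contains a generic term."""
    desc = description.lower()
    return bool(desc) and not any(term in desc for term in GENERIC_TERMS)


def is_likely_mfs(c: Candidate) -> bool:
    """All criteria must hold: size, TM count, CDD and Pfam domains, homolog quality."""
    return (
        300 <= c.length <= 700
        and c.tm_helices > 7
        and "cd06174" in {h.lower() for h in c.cdd_hits}
        and any(h.upper().startswith("MFS") for h in c.pfam_hits)
        and has_defined_homolog(c.top_hit)
    )


# Invented placeholder records for illustration only (not data from this study).
EXAMPLES = [
    Candidate("example_1", 410, 12, {"cd06174"}, {"MFS_1"}, "quinolone resistance protein NorB"),
    Candidate("example_2", 420, 11, {"cd06174"}, {"MFS_1"}, "membrane protein"),
    Candidate("example_3", 250, 6, set(), set(), "hypothetical protein"),
]

selected = [c.name for c in EXAMPLES if is_likely_mfs(c)]

if __name__ == "__main__":
    print(selected)
```

The keyword heuristic is deliberately crude. A description such as "multidrug efflux membrane protein" would be wrongly rejected, so in practice the homolog judgment remains a manual step.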
Each hypothetical protein was closely related to its top BLASTP hit from Table 4, and SAOUHSC_00952 was also closely related to the MFS transporter. The established proteins in this phylogenetic tree are arranged similarly to those in Floyd's illustration [4].
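Figure 4 was produced with PHYLIP programs. As a simplified illustration of distance-based tree building from an alignment, the sketch below does two things:
- computes gap-aware p-distances, the fraction of differing residues over columns where neither sequence has a gap;
- clusters those distances with UPGMA and returns a Newick topology.

This is not the study's workflow. PHYLIP's protein distance program (protdist) typically applies substitution-model corrections rather than raw p-distances. Its neighbor program offers both neighbor-joining and UPGMA, and which option was used is not stated here. Branch lengths are omitted. The function names and aligned toy sequences are hypothetical.

```python
"""Sketch: gap-aware p-distances plus UPGMA clustering (topology only)."""
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over aligned columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned: dict[str, str]) -> str:
    """Cluster aligned sequences by UPGMA and return the tree as a Newick string."""
    sizes = {name: 1 for name in aligned}  # cluster label (Newick text) -> leaf count
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        sa, sb = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        del dist[pair]
        for c in sizes:  # size-weighted average distance to the new cluster
            dac = dist.pop(frozenset((a, c)))
            dbc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (sa * dac + sb * dbc) / (sa + sb)
        sizes[merged] = sa + sb
    return next(iter(sizes)) + ";"


# Invented, pre-aligned toy sequences (not data from this study).
ALIGNED = {
    "s1": "MKTAYIAK",
    "s2": "MKTAYIAR",
    "s3": "MKSVYLGR",
    "s4": "MQSVFLGR",
}

if __name__ == "__main__":
    print(upgma(ALIGNED))
```

UPGMA assumes a roughly constant evolutionary rate. Neighbor-joining relaxes that assumption, which is one reason it is often preferred for divergent protein families.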