Structure-based functional annotation of a MYND-less lysine methyltransferase in Candida albicans

Candida albicans is an opportunistic pathogenic yeast that is distributed worldwide and is classified in the most critical group of fungal pathogens. It is a common member of the microbiota of healthy individuals but can cause superficial and invasive infections in immunocompromised individuals. Post-translational modification of proteins by lysine methylation is a major regulator of eukaryotic transcription and of pathways controlling several cellular processes. SMYD proteins form a subfamily of lysine methyltransferases containing a SET (Su(var)3-9, Enhancer-of-zeste and Trithorax) domain and a MYND (Myeloid, Nervy, and DEAF-1) domain; they transfer methyl groups from methyl donors onto lysine residues in histones (H3 and H4) and non-histone proteins. The SET domain is the catalytic methyltransferase domain, while the MYND domain participates in both protein and DNA interactions. The best-studied SMYD proteins are the five human and two Saccharomyces cerevisiae members, which include both histone and non-histone protein lysine methyltransferases. However, SET lysine methyltransferases, including the SMYD subfamily, remain poorly understood in the pathogenic fungus Candida albicans. Using bioinformatics tools, we characterized the SMYD domain-containing proteins of this important pathogen. We report an atypical SMYD member (CaO19.3863) as a new lysine methyltransferase that may be a target for antifungal therapy.


Background:
SMYD proteins are a family of lysine methyltransferases characterized by a SET and a MYND domain. S-Adenosylmethionine (SAM) is the methyl donor bound by the SET domain of many lysine N-methyltransferases [1]. The MYND domain is specific to SMYD proteins and is responsible for recognition of the target lysine residue. The structure of the SET and MYND domains is highly conserved among SMYD isoforms, and they are arranged in a characteristic fold that is essential for enzymatic activity. In particular, the SET domain is composed of an alpha/beta barrel stabilized by a conserved zinc ion, while the MYND domain is composed of a helical bundle stabilized by two conserved cysteine residues [1,2]. The enzymatic activity of SMYD proteins is mediated by the formation of a catalytic triad comprising the SET domain, the MYND domain, and a lysine residue on the target protein. The SET domain binds SAM and transfers the methyl group to the lysine residue, while the MYND domain facilitates recognition and positioning of the target lysine residue [1]. In addition, the MYND domain contains a conserved hydrophobic patch that is thought to interact with the methylated lysine residue. Moreover, basic residues in the MYND domain contribute to DNA binding, and the MYND domain of yeast Set5 (fungal SMYD family) mediates its chromatin association and gene repression near telomeres [3]. The C-terminal domain is also involved in protein-protein interactions, localization, and control of methyltransferase activity in some SMYD members [3].

The five-membered human SMYD family of protein lysine methyltransferases (SMYD1-5) has established roles in muscle, immune, blood, heart, and vascular endothelial physiology, in host-pathogen interaction, and in the pathophysiology of multiple cancers [4,5]. Their ability to methylate specific lysine residues on histones (K4, K36, and K37 of histone H3 and K5 of histone H4) and non-histone proteins (HSP90, Rb, p53, MAP3K2) is crucial for regulating gene expression and protein function in cells [4-6]. Overexpression or deregulation of the human SMYD enzymes has been linked to the emergence and progression of cancer, making them promising targets in cancer therapy [5-7]. The structurally similar SMYD members of Saccharomyces cerevisiae (Set5 and Set6) have a domain arrangement similar to that of their human counterparts. Set5 is involved in genome stability and in regulation of gene expression near telomeres, mediated by its methylation of histone H4 at K5, K8, and K12 [3].

In the related opportunistic yeast Candida albicans, the SMYD family of KMTs is relatively uncharacterized; Candida albicans SET6 expression is known to be regulated by the transcription factor Hap43 and by biofilm and catheter conditions [8]. In this paper, the uncharacterized SMYD family proteins of this yeast were subjected to in silico functional and structural characterization. Using sequence and structure analysis tools, 3D models were used to identify residues involved in ligand binding. We uncover and characterize a novel SMYD member, conserved in Candida clade organisms, which is structurally conserved with known SMYD members but completely lacks the MYND zinc finger spanning the SET domain. This protein can be a target for biochemical, genetic, and other experimental analyses to ascertain its function, and for future antifungal therapy studies.

Functional analysis of SMYD Proteins
UniProtKB was used to download well-studied SMYD domain proteins from human, mouse, Arabidopsis, Toxoplasma, and yeast. These SMYD domain proteins were then functionally annotated with InterProScan [11] by superfamily, family, domain, fold, and motif to identify conserved domains. We also used the HHpred tool for structural annotation using HMM-HMM searches [12].

Secondary structure and other protein features prediction
The PredictProtein web server was used to predict features such as protein-protein and protein-DNA binding sites, disorder, and metal binding sites [26]. PSIPRED 3.2 was employed for neural-network-based secondary structure prediction, and CysPRED [27] was employed to detect disulfide bonds. Protein interactions were identified from physical and functional associations using STRING 11.5 [28]. The LambdaPP pipeline [29] was used to annotate gene ontology (GO), binding residues, secondary structure, and variant effect scores.
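Per-residue secondary structure predictions are usually summarized as the percentage of residues in each state. The following is a minimal sketch of that summary step, assuming the prediction is available as a plain string with one state symbol per residue; the function name `ss_composition` and the example string are hypothetical and are not output from the tools named above.

```python
from collections import Counter


def ss_composition(states: str) -> dict:
    """Return the percentage of residues assigned to each state symbol."""
    if not states:
        raise ValueError("empty secondary-structure string")
    counts = Counter(states)
    total = len(states)
    # Percentage of the full sequence length for every symbol that occurs.
    return {state: 100.0 * n / total for state, n in counts.items()}


# Hypothetical 10-residue prediction string, not CaSMYD data.
example = "HHHHEECCCT"
for state, pct in sorted(ss_composition(example).items()):
    print(f"{state}: {pct:.2f}%")
```

The same function works for three-state or eight-state alphabets because it only counts whichever symbols appear in the string.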

FoldSeek was used for comparative structural analysis [33]. Molviewer was used for visualizations [34]. GalaxySite, the docking tool of GalaxyWEB [35], was used for ligand prediction. The Computed Atlas of Surface Topography of proteins (CASTp) was used for active site determination [36]. LambdaPP and PredictProtein were used to predict catalytic, SAM, polypeptide, and metal binding sites.
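Structural comparisons of this kind are commonly summarized with a TM-score, which ranges from 0 to 1 and approaches 1 for near-identical structures. The sketch below evaluates the standard TM-score sum for one fixed superposition, assuming the distances between aligned residue pairs are already known. It does not perform the alignment search that a structural aligner carries out, and the function name `tm_score` and the lower bound on d0 are assumptions made for illustration.

```python
def tm_score(distances, target_length):
    """TM-score of a single, fixed superposition.

    distances: distances (in angstroms) between aligned residue pairs.
    target_length: number of residues in the reference (target) protein.
    """
    if target_length <= 15:
        raise ValueError("the d0 formula needs target_length > 15")
    # Length-dependent distance scale of the TM-score definition.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Assumed lower bound so that very short targets do not give d0 <= 0.
    d0 = max(d0, 0.5)
    # Each aligned pair contributes between 0 and 1; unaligned residues add 0.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical example: 80 aligned pairs at 2 angstroms in a 100-residue target.
print(round(tm_score([2.0] * 80, 100), 3))
```

Because the sum is normalized by the target length, a short, well-superposed region inside a long protein still yields a low score, which is one way a "low TM-score" can arise for diverged homologs.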

Results and discussion:
The SET domain (Su(var)3-9, Enhancer-of-zeste, Trithorax) containing lysine methyltransferases (KMTs) of the Candida albicans proteome were identified by a sensitive hmmsearch using the HMM profile for the SET domain. This search strategy gave eight distinct SET domain-containing KMTs (Table 1). We extended the sequence conservation studies with MEME analysis, which indicated highly divergent SMYD domains (Table 1). SMYD domain architecture was found in several proteins, including CaSMYD, but with the lowest p-value (Figure 1).

The molecular weight of CaSMYD was predicted to be approximately 73330.87 Da, and it is predicted to be a stable protein with an instability index of 39.90, like CaSet6, whereas CaSet5 is predicted to be unstable (instability index 50.67) (Table 3). A negative GRAVY value (-0.053) indicates that the protein is more water soluble than CaSet5 and CaSet6. The globular protein CaSMYD has a higher aliphatic index (101.94), indicating greater thermostability than CaSet5 and CaSet6. The InterPro, SMART, CATH, and CDD databases identified SET domain profiles, but no other domain profiles matched in the Candida proteins. In addition, the HHpred HMM-HMM profile search gave human HMT Smyd1 as the top hit for CaSMYD, whereas the CaSet5 and CaSet6 profile-profile searches gave human Smyd2 and Smyd3 as top hits (Table 3). The Candida albicans SMYD KMT family thus comprises the canonical CaSet5 and CaSet6 and the non-canonical CaSMYD protein. CaSMYD lacks transmembrane domains, coils, and secretory signals, and is predicted to be found in both the nucleus and the cytoplasm, with a nuclear export signal (NES). CaSMYD amino acids were predicted to bind macromolecules such as DNA and proteins, and small molecules, including metals (Table 3). Based on these findings, the protein is likely to have requirements and functions similar to those of other SMYD KMTs, which bind SAM, zinc, and protein lysine substrates. Intrinsically disordered regions of proteins are believed to be crucial structural and functional regulatory regions. CaSMYD was predicted to contain an N-terminal disordered region involved in protein binding. Among the three Candida SMYD methyltransferases, CaSet5 was predicted with high confidence to have a large C-terminal disordered region (400-473 aa).

To increase confidence in the functional assignment of the uncharacterized protein, the protein-protein interaction (PPI) network of CaSMYD was evaluated (Figure 2(B)). The interacting partners of CaSMYD were SET2, SET6, and CTM1 in cluster 1 (PPI enrichment p-value: 6.03e-10). The predicted biological process was methylation (GO:0032259, FDR 0.00045), while the associated molecular functions were histone-lysine N-methyltransferase activity (GO:0018024, FDR), protein-lysine N-methyltransferase activity (GO:0016279, FDR 9.17e-05), and methyltransferase activity (GO:0008168, FDR 9.17e-05). The predicted KEGG pathway was lysine degradation (cal00310, FDR 6.98e-08) (Figure 2(B)).

The secondary structure of each amino acid residue was predicted in eight states. Of the CaSMYD secondary structure, 43.02% is helix, 17.94% is extended helix, 33.97% is random coil, and 5.08% is beta turn (Figure 2(C)). For full-length proteins and domain sequences, the AlphaFold machine learning model was used to generate the tertiary structures of the SET and MYND proteins. Using CASTp 3.0, 108 amino acids were identified as being involved in forming the CaSMYD active site, based on the best-ranked full-length models. The best active site had a Richards' solvent accessible surface area of 2121.012 Å² and a solvent accessible volume of 1919.608 Å³ (Figure 2(C)). We used the AlphaFold models to identify homologs by reciprocal best structural hits and sequence similarity. To obtain the best structural alignments from the PDB100 database, the AlphaFold models of the three SMYD proteins were submitted as PDB files to the FoldSeek server. The CaSMYD structure matched the SMYD3 protein with a low TM-score (Table 4), suggesting considerable divergence.

We used the generated 3D models to determine the small-molecule binding propensity of the Candida SMYD KMTs. The GalaxySite and AlphaFill web servers predicted the same ligand binding spectrum for the two canonical Candida SMYD proteins as for CaSMYD. The predicted CaSMYD ligands included the cofactor S-adenosylmethionine (SAM), the substrate lysine, and metal ions such as zinc and nickel (Figure 3(B) & 3(C)). Furthermore, multiple inhibitors of human SMYD2 and SMYD3 were predicted to bind the CaSMYD active site and structural elements. Molecules such as sinefungin, 62X, NH5, and LLY-507 are predicted to bind CaSMYD (Table 4).

Our manuscript describes the SET and MYND zinc finger KMTs of Candida albicans and proposes a new member of the Set5 and Set6 SMYD subfamily of methyltransferases. This SMYD member lacks a MYND zinc finger. Thus, we have identified a new lysine methyltransferase in the pathogen's genome using well-established and recent tools for annotation of uncharacterized enzymes. We previously reported a Komagataella phaffii ortholog of the yeast SMYD family [37].
The MYND-less subfamily of SMYD KMTs is proposed on the basis of phylogeny, conservation of the SET and MYND motifs, binding residue prediction, and similarities in 3D structure and ligand binding data. This study lays a foundation for future research into the biochemistry and molecular function of the CaSMYD protein in Candida albicans.
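The GRAVY value and aliphatic index reported above for CaSMYD can be reproduced from a sequence with two short formulas. The sketch below assumes the standard definitions (GRAVY as the mean Kyte-Doolittle hydropathy, and the Ikai aliphatic index computed from the mole percent of Ala, Val, Ile, and Leu). The text does not name the tool used for these values, so this is an illustration rather than the authors' procedure, and the function names and the test peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Raises KeyError on non-standard residues such as X or B.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical peptide, not the CaSMYD sequence.
peptide = "MKVLAAGIVRDE"
print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 2))
```

A negative GRAVY indicates a net hydrophilic sequence, and a larger aliphatic index reflects a larger relative volume of aliphatic side chains, which is the basis for the thermostability interpretation.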

Conclusions:
This study provides an understanding of Candida albicans SMYD methyltransferases, including CaSMYD (orf19.3863), a new member of the family. Protein sequence analysis, structural modelling, and structure-based docking studies were used to determine the potential role of this uncharacterized protein as a member of the protein lysine methyltransferase family.

Figure 1: Evolutionary analysis and motif conservation in SMYD proteins. (A) The evolutionary history was inferred using the Maximum Likelihood method and the JTT matrix-based model. The accession numbers of the protein sequences are listed in Table 2. The percentage of trees in which the associated taxa cluster together is shown next to the branches. (B) MEME motif conservation across SMYD family members suggests divergence of the N-terminal SET-MYND motif. (C) Sequence logos for the motifs identified by MEME in panel B. (D) The C-terminal CX2CX24CX2C finger and other conserved motifs in Candida clade CaSMYD-like proteins.

Various Arabidopsis thaliana and Schizosaccharomyces pombe SMYD KMTs were found by UniProt BLAST (Table 2); however, no orthologs were detected in Saccharomyces cerevisiae or related yeasts of the Saccharomycetaceae family. High-identity sequences were identified among the Saccharomycetales through blastp searches. The hits were in the genera Candida, Lodderomyces, Spathaspora, Debaryomyces, Scheffersomyces, and Meyerozyma, which belong to the CUG-Ser1 clade. Komagataella, Pichiaceae, and Saccharomycetales incertae sedis are the other clades with CaSMYD-like proteins. Evolutionary analysis using human, mouse, yeast, and Toxoplasma gondii proteins identified CaSMYD as an outgroup to the Set6 proteins of Saccharomyces cerevisiae and Candida albicans. CaSet5 and ScSet5 clustered with the human SMYD5 protein (Figure 1(A)), while CaSMYD, ScSet6, and CaSet6 were outgrouped to the human SMYD1-4 cluster (Figure 1(A)).

The SMYD domain architecture match for CaSMYD (Figure 1(B)) suggests a divergent structure. CaSMYD contained pre-SET (low identity), SET, and post-SET domains but lacked the zinc finger MYND domain C1X2C2X9C3X2C4X5C5X3C6X6H8X3C8 or its variants. The N-terminal ZNF-MYND motif (CX2C) of CaSMYD was located after the post-SET motif (FXCXCX2C) (motif #2, Figure 1(B)). This is similar to the ASHR1 (Arabidopsis thaliana) and AKMT (Toxoplasma gondii) proteins (Figure 1(C)). Indeed, MEME motif analysis of yeast CaSMYD proteins identified motif #6 as the C-terminal part of a new C2C2 ZNF (CX2CX24CX2C) motif present in closely related Candida species (Figure 1(D)). This C2C2 ZNF spans amino acids 323 to 372 of CaSMYD. In addition, two unique C-terminal domains (CTD motifs, Figure 1(D)) were enriched in all non-Saccharomyces yeast proteins. According to these results, the CaSMYD protein contains a SET-MYND domain, but its structure differs from that of Set5, Set6, and other SMYD proteins. Candida albicans CaSMYD therefore contains an uncharacterized SET-MYND domain, making it a non-canonical member of the SMYD (SET-MYND) subfamily of SET lysine methyltransferases. This report presents structural and functional annotations of the diverged CaSMYD protein in comparison with other members of this family (Tables 3 and 4). This subfamily was present in several yeast clades but not in the model organisms S. cerevisiae and Schizosaccharomyces pombe (Figure 2(A)).
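A spacing pattern such as CX2CX24CX2C can be checked directly against a protein sequence with a regular expression, where each X stands for any residue. The following is a minimal sketch of such a scan, not the MEME analysis used in this work; the function name `find_c2c2_fingers` and the synthetic test sequence are hypothetical, and the real CaSMYD sequence is not reproduced here.

```python
import re

# CX2CX24CX2C as a regular expression; the lookahead lets overlapping
# matches be reported, since a plain search would skip past them.
C2C2_FINGER = re.compile(r"(?=(C.{2}C.{24}C.{2}C))")


def find_c2c2_fingers(seq: str):
    """Return 1-based inclusive (start, end) positions of every match."""
    seq = seq.upper()
    return [(m.start(1) + 1, m.end(1)) for m in C2C2_FINGER.finditer(seq)]


# Synthetic sequence built to contain exactly one finger; not real data.
synthetic = "MMMMM" + "C" + "AA" + "C" + "G" * 24 + "C" + "AA" + "C" + "KKKKK"
print(find_c2c2_fingers(synthetic))
```

The pattern only tests cysteine spacing, so a hit is a candidate for inspection in a structural model rather than evidence of zinc coordination.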

Figure 2: Evolutionary analysis of the CaSMYD protein, PPI prediction, and per-residue secondary structure. (A) The evolutionary history was inferred using the Maximum Likelihood method and the JTT model on the MSA of yeast CaSMYD orthologs. The Candida clade is shown with red branches. The accession numbers and names are given in the tree. (B) Prediction of the clustered PPI network of the CaSMYD protein, with nodes representing several methyltransferase and histone proteins. (C) Amino acids are colored according to their R group, with cysteine residues highlighted. The Candida clade-specific C-terminal C2C2 finger sequence is underlined (upper panel). The active site of CaSMYD in the AlphaFold 3D model is depicted with binding pocket residues shaded in gray, together with the per-residue secondary structure (lower panel).

Figure 3: Tertiary structure model with cofactors, ligands, and ions. (A) The AlphaFold 3D structure of CaSMYD (UniProt ID: A0A1D8PT54) transplanted with different cofactors, ligands, and ions using the AlphaFill algorithm. The compound names are listed in Table 4. The tertiary model is colored according to the pLDDT score, and unfavorable transplant scores in the table are highlighted in yellow. (B) Zinc ion coordination with cysteine residues. (C) S-adenosylmethionine (SAM) binding poses showing the amino acids and noncovalent interactions involved: H-bonds (blue dashed lines), cation-pi interactions (orange lines), and pi stacking (green lines).

Table 1 :
Candida albicans SET domain methyltransferases identified by HMMSEARCH and their annotation. Orthologs were detected using BLASTP searches, and domain positions were assigned using PROSITE profile matches.
The hits were exhaustively searched by blastp-based homology against the UniProt, PDB, and non-redundant databases. There are five SET and three SMYD domain proteins in the Candida albicans genome that constitute the KMT proteins (Table 1). In comparison, Saccharomyces cerevisiae has only two SMYD domain proteins and five SET proteins (Table 1). In Saccharomyces cerevisiae,

Table 2 :
List of Protein Data Bank (PDB) and UniProt hits from the CaSMYD blastp search

Table 3 :
The physicochemical and structural features of Candida albicans SMYD proteins

Table 4 :
Predicted Ligand Binding to CaSMYD