A method for clustering of miRNA sequences using fragmented programming

Clustering of miRNA sequences is an important problem in molecular genetics and cellular biology. Thousands of such sequences are known today through advances in molecular tools, sequencing techniques, computational resources and rule-based mathematical models. Analysing such large-scale miRNA sequence data to infer patterns and deduce cellular function is a great challenge in modern molecular biology. Therefore, it is of interest to develop mathematical models specific to miRNA sequences. The task is to group (cluster) such miRNA sequences using well-defined known features. We describe a method for clustering miRNA sequences using fragmented programming. We then illustrate the utility of the model using a dendrogram (a tree diagram) of publicly known A. thaliana miRNA nucleotide sequences for the inference of observed conserved patterns.


Background:
The human genome is known to contain thousands of miRNAs. More than 3000 new miRNAs with sequences have been recently identified [1][2][3]. As increasing numbers of new miRNAs are identified, it becomes a problem to assign them to known families or to define new families. The current division of miRNAs into families does not adequately reflect the degree of nucleotide sequence similarity, and the categorization of miRNAs into families requires quantitative criteria that define the differences between families. The genomes of different organisms have orthologous miRNAs that should be distributed into families. Hence, it is necessary to establish the degree of similarity between orthologous miRNAs and their membership in different families. Several authors have proposed different functional clustering methods for this purpose [4][5][6]. Therefore, it is of interest to describe a method for clustering miRNA sequences using fragmented programming.

Model for clustering miRNA nucleotide sequences
Clustering nucleotide sequences is a process of sequence comparison that determines the maximum number of nucleotide coincidences (matching positions) between sequences. This is useful for constructing a graphical structure, such as a tree, that defines the relationships between sequences. The formulated model for the sequence-based clustering problem is illustrated in Figure 1.
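As an illustration of counting nucleotide coincidences, the following minimal Python sketch compares two sequences at every ungapped offset and keeps the largest number of matching positions. The ungapped sliding comparison and the function names `coincidences` and `max_coincidences` are assumptions made for this example; the article does not specify how the maximum is computed.

```python
def coincidences(a: str, b: str, offset: int) -> int:
    """Count positions i where a[i] equals b[i + offset] (no gaps allowed)."""
    return sum(
        1
        for i, ch in enumerate(a)
        if 0 <= i + offset < len(b) and ch == b[i + offset]
    )


def max_coincidences(a: str, b: str) -> int:
    """Largest number of matching nucleotides over all ungapped offsets.

    Hypothetical helper: the source only states that the comparison
    finds the maximum number of coincidences, not how.
    """
    offsets = range(-len(a) + 1, len(b))
    return max((coincidences(a, b, off) for off in offsets), default=0)
```

For the toy strings "AACG" and "ACGU", shifting the first sequence by one position aligns "ACG" with "ACG", so `max_coincidences("AACG", "ACGU")` returns 3. A gapped alignment score could be substituted for this measure without changing the rest of the procedure.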

Algorithm: Clustering of miRNA sequences for phylogenetic tree
The main issue with large sets of nucleotide sequences is the lack of sufficient computing power. We describe a fragmented algorithm (Figure 2) for clustering miRNA nucleotide sequences in 5 steps, shown as a flowchart (Figure 3).

Dataset for model testing
A dataset of known miRNA nucleotide sequences from A. thaliana (Table 1) was tested using this method.

Model algorithm

Splitting a set of nucleotide sequences
Step 1: The set of nucleotide sequences {u_l}, l = 1, …, N, that needs to be compared is split into M uniform groups.
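The splitting of N sequences into M uniform groups can be sketched as follows. This is a minimal illustration, assuming that "uniform" means group sizes differ by at most one and that sequences keep their input order; the name `split_into_fragments` is hypothetical and not taken from the article.

```python
from typing import List, Sequence


def split_into_fragments(seqs: Sequence[str], m: int) -> List[List[str]]:
    """Split seqs into m contiguous groups whose sizes differ by at most one."""
    if m < 1:
        raise ValueError("m must be at least 1")
    base, extra = divmod(len(seqs), m)
    fragments: List[List[str]] = []
    start = 0
    for k in range(m):
        # The first `extra` groups receive one additional sequence.
        size = base + (1 if k < extra else 0)
        fragments.append(list(seqs[start:start + size]))
        start += size
    return fragments
```

With N = 10 and M = 3 this yields groups of sizes 4, 3 and 3. If M exceeds N, the trailing groups are empty, which a caller may want to filter out.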

Processing of sequences
Step 2: Each of the M groups of sequences is processed by a corresponding procedure, with the groups indexed k = 1, …, M, where M is the number of fragments. This yields an overall clustering of the sequences in each group.

Iteration to first step
Step 3

Dendrogram (Tree diagram)
Step 5: A dendrogram was drawn with the Neighbour-Joining and UPGMA algorithms for the resulting clustered sequences.
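To make the tree-building step concrete, the sketch below builds a UPGMA (average-linkage) tree topology in plain Python. It is an illustration only: the distance measure (fraction of mismatched positions, `p_distance`), the nested-tuple tree representation and all function names are assumptions for this example, branch lengths are not computed, and Neighbour-Joining is not shown. Sequences are assumed to be non-empty and at least one sequence must be supplied.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions; unpaired trailing bases count as mismatches."""
    length = max(len(a), len(b))
    matches = sum(x == y for x, y in zip(a, b))
    return 1.0 - matches / length


def upgma(names, seqs):
    """Return the UPGMA tree topology as nested tuples of sequence names."""
    # Each cluster id maps to (subtree, number of leaves in that subtree).
    clusters = {i: (names[i], 1) for i in range(len(names))}
    dist = {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        pair = min(
            (frozenset(p) for p in combinations(clusters, 2)),
            key=lambda p: dist[p],
        )
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                n_i * dist[frozenset((i, k))] + n_j * dist[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree
```

For four toy sequences "AAAA", "AAAU", "GGGG" and "GGGC" named a, b, c and d, the function returns `(("a", "b"), ("c", "d"))`, grouping the two similar pairs before joining them at the root.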

Results and Discussion:
A fragmented algorithm for miRNA nucleotide sequence clustering and a program application were developed to define the degree of relationship between sequences according to their clustering. After clustering known miRNA sequences, this approach helps to create phylogenetic trees based on the Neighbour-Joining (NJ) and UPGMA algorithms (Figure 4).
Many programs are available for searching related sequences in databases. This is useful for creating multiple alignments from which phylogenetic trees are generated. Tools used in such analysis include BLAST, ClustalW, ClustalX, UGENE and many others. The main issue here is the lack of sufficient computing resources for large-scale analysis.
The method described here uses fragmented programming to optimise the time required for data processing during clustering. It achieved better clustering results by dividing the set of sequences into M independent groups (fragments) processed by M blocks, each of which clusters its fragment independently of the other fragments. The overall clustering is performed for all sequences in each group.
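The idea of M blocks clustering M fragments independently can be sketched with Python's standard `concurrent.futures` module. Everything specific here is an assumption for illustration: the per-fragment procedure is a simple greedy grouping by sequence identity with a hypothetical threshold of 0.75, not the article's clustering procedure, and threads are used only to show the independence of the blocks (a process pool would be the usual choice for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two non-empty sequences."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def cluster_fragment(fragment, threshold=0.75):
    """Greedy grouping: a sequence joins the first cluster whose
    representative (first member) is at least `threshold` identical."""
    clusters = []
    for seq in fragment:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def cluster_all(fragments, threshold=0.75):
    """Cluster every fragment independently, one worker per fragment."""
    workers = max(1, len(fragments))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves fragment order; no fragment reads another's data.
        return list(pool.map(lambda f: cluster_fragment(f, threshold), fragments))
```

Because each call to `cluster_fragment` touches only its own fragment, the blocks need no synchronisation, which is the property the fragmented approach relies on.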
In fragmented programming, the clustering process runs simultaneously in all groups, because all of the data can be processed independently. Once clustering is completed in each block for every group, merging all related sequences in a fragment forms a cluster.
The advantages of fragmented programming are the feasibility of automatic (1) parallel computing, (2) dynamic properties, (3) calculation on multiple architectures, and (4) subsequent analysis of parallel computing. The fragmented algorithm requires only minimal management, which is determined by data dependencies, and it does not depend on the distribution of resources. Thus, it admits a set of possible ways of executing the process, which provides portability. The task of the executive (runtime) system is to map the objects of an algorithm (variables, operations) onto the resources of a concrete computing system. This automatically provides all the dynamic properties necessary for parallel computing. Fragmentation is a processing method that reduces the number of objects in the algorithm. This simplifies the problem of creating an effective distribution of resources and management.
The resulting subtrees were united into one phylogenetic tree (Figure 4) by the main block of the algorithm. Its computational complexity is O(nlk) operations, where k is the number of clusters, n is the size of the dataset and l is the number of cycles (iterations) in the algorithm (Figure 5).

Conclusion:
We describe a method for clustering miRNA sequences using fragmented programming. The method creates sequence clusters as input to NJ and UPGMA for generating phylogeny-related tree diagrams. We used known A. thaliana miRNA nucleotide sequences and developed clusters using this method to generate a sample dendrogram that illustrates the utility of the model.