RNAclust is a Perl script that combines the individual steps required for clustering structured RNA motifs, i.e., identifying groups of RNA sequences that share a secondary structure motif. It takes a multiple FASTA file as input. First, for each input sequence, the base pair probability matrix of its secondary structure distribution is calculated (using RNAfold from the Vienna RNA package). Second, a sequence-structure alignment is calculated for each pair of base pair probability matrices using LocARNA. Finally, a hierarchical cluster tree (in Newick format) is derived by WPGMA clustering of the pairwise alignment distances, and the optimal number of clusters is calculated from the tree.
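The final step, WPGMA clustering of a pairwise distance matrix into a Newick tree, can be illustrated with a minimal Python sketch. This is not the code RNAclust itself uses (RNAclust is a Perl script). The function name `wpgma_newick`, the sequence labels, and the distance values are assumptions made up for illustration; in practice the distances would come from the pairwise LocARNA alignments.

```python
from itertools import combinations


def wpgma_newick(names, dist):
    """Cluster items by WPGMA and return the tree as a Newick string.

    names: list of leaf labels
    dist:  symmetric matrix (list of lists) of pairwise distances
    """
    # Each active cluster maps an id to (newick_subtree, node_height).
    clusters = {i: (name, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by an ordered id pair.
    d = {(i, j): float(dist[i][j])
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)

    def get(a, b):
        # Look up a distance regardless of the order of the two ids.
        return d[(a, b)] if a < b else d[(b, a)]

    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(d, key=d.get)
        height = d[(a, b)] / 2.0
        (sub_a, h_a), (sub_b, h_b) = clusters[a], clusters[b]
        subtree = f"({sub_a}:{height - h_a:.3f},{sub_b}:{height - h_b:.3f})"

        # WPGMA update: the distance to the merged cluster is the plain
        # average of the two old distances, regardless of cluster sizes.
        new_d = {k: (get(a, k) + get(b, k)) / 2.0
                 for k in clusters if k not in (a, b)}

        # Drop the merged clusters and every distance that involves them.
        del clusters[a], clusters[b]
        d = {pair: v for pair, v in d.items()
             if a not in pair and b not in pair}
        for k, v in new_d.items():
            d[(k, next_id)] = v  # k is always smaller than next_id
        clusters[next_id] = (subtree, height)
        next_id += 1

    newick, _ = next(iter(clusters.values()))
    return newick + ";"


# Hypothetical distances for three sequences (illustrative values only).
names = ["seqA", "seqB", "seqC"]
dist = [
    [0, 2, 6],
    [2, 0, 8],
    [6, 8, 0],
]
tree = wpgma_newick(names, dist)
print(tree)
```

In this example seqA and seqB are merged first (distance 2), and the distance from the merged cluster to seqC becomes the unweighted average (6 + 8) / 2 = 7. How the optimal number of clusters is then derived from the tree is not covered by this sketch.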
For more information please read the full documentation [PDF].
The following Java viewer can be used to browse the hierarchical cluster tree: Viewer
Updated: 30 Jul 2010