Svhip is a specialized SVM training tool designed for use with the classifier module of RNAz. Svhip handles the generation of training data from raw alignment files and writes a .model file for use with RNAz. Alternative options allow the creation of independent test and training sets, or simply the accuracy optimization of pre-calculated training instances for later classifier training.
Please refer to the Readme for further instructions.
The Svhip software is packaged as a Python wheel archive, available at http://www.bioinf.uni-leipzig.de/Software/Svhip/MasterThesis/Svhip/Install/. To install it on your machine, download the .whl file and execute the following command in the directory containing it:
---pip install svhip_dev-0.0.29.whl
The rest should happen automatically. Please note that the following external tools and software must already be installed for the installation to succeed:
-Python 3.7.2 (or higher)
-SISSIz 0.1 (or higher)
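The two prerequisites above can be checked before running pip. The following is only a minimal sketch: it assumes the SISSIz executable is installed under the name "SISSIz" and is reachable via PATH, and it does not verify the SISSIz version. The helper names python_ok and tool_on_path are made up for this illustration and are not part of Svhip.

```python
import shutil
import sys

MIN_PYTHON = (3, 7, 2)  # minimum Python version stated in the list above


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) >= minimum


def tool_on_path(name):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    print("Python >= 3.7.2:", python_ok())
    # Assumption: the SISSIz binary is called "SISSIz" on this system.
    print("SISSIz on PATH:", tool_on_path("SISSIz"))
```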
Furthermore, you will need the following Python 3 modules (these may be bundled in future releases for easier use):
Also included is a modified version of RNAz 2.0 for the de novo detection of non-coding RNA (both source and pre-compiled). The package can be downloaded from http://www.bioinf.uni-leipzig.de/Software/Svhip/MasterThesis/RNAz/. The modification allows external SVM classifier modules to be loaded using the following command line arguments:
---RNAz -M [.model file] [TARGET]
Using this version is necessary to load the dinucleotide decision model files that form the output of Svhip.
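For scripted use, the call shown above can be wrapped in Python. This is a hedged sketch rather than part of the package: it assumes the modified binary is on PATH under the name "RNAz", and the function names rnaz_command and run_rnaz are invented for this example. Only the -M argument described above is used.

```python
import subprocess


def rnaz_command(model_file, target, executable="RNAz"):
    """Build the argument list for: RNAz -M [.model file] [TARGET]."""
    return [executable, "-M", str(model_file), str(target)]


def run_rnaz(model_file, target, executable="RNAz"):
    """Run the modified RNAz and return whatever it writes to standard output."""
    result = subprocess.run(
        rnaz_command(model_file, target, executable),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output as text
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

A call such as run_rnaz("my_classifier.model", "alignment.aln") would then correspond to the command line above; both file names are placeholders.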
Other files included under http://www.bioinf.uni-leipzig.de/Software/Svhip/MasterThesis/RNAz/ are listed here:
-Trained Classifiers: All experimental decision model files generated using the Svhip software, subdivided into classifiers trained on pre-screened Rfam data and those trained on data sets assembled from Rfam Clans.
-Test sets: Calculated feature vectors of all test sets used in the evaluation of the trained classifiers.
-Rfam screening: Text file containing the screening results of the Rfam database. It includes the families considered for training set assembly after structural conservation estimation. The file is organized by columns as follows:
[id] [number sequences] [generated alignment windows] [mean tree edit distance of windows] [mean tree edit distance control set] [filtered alignment windows]
-Accuracy results: ROC curves and absolute accuracies on the test sets used in performance evaluation.
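The six-column layout given for the Rfam screening file can be read with a few lines of Python. The sketch below rests on assumptions not stated above: one family per line, columns separated by whitespace, integer counts and floating-point mean tree edit distances. The names ScreeningRecord, parse_screening_line and read_screening_file are hypothetical.

```python
from typing import NamedTuple


class ScreeningRecord(NamedTuple):
    family_id: str            # [id]
    n_sequences: int          # [number sequences]
    generated_windows: int    # [generated alignment windows]
    mean_ted_windows: float   # [mean tree edit distance of windows]
    mean_ted_control: float   # [mean tree edit distance control set]
    filtered_windows: int     # [filtered alignment windows]


def parse_screening_line(line):
    """Parse one whitespace-separated line into a ScreeningRecord."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    return ScreeningRecord(
        fields[0], int(fields[1]), int(fields[2]),
        float(fields[3]), float(fields[4]), int(fields[5]),
    )


def read_screening_file(path):
    """Read all non-empty lines of a screening results file."""
    with open(path) as handle:
        return [parse_screening_line(line) for line in handle if line.strip()]
```

If the actual file uses a different separator or contains a header line, the split and the type conversions would need to be adjusted accordingly.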