SnoReport 1.0
=============

by Jana Hertel: jana@tbi.univie.ac.at

Outline:
--------

1) Introduction
2) Installation
3) Howto
4) Summarizing tool


1) Introduction
---------------

SnoReport is a new tool to detect snoRNA genes without using information
on their targets. Based on a single sequence given by the user, the
program checks whether the sequence might represent an H/ACA snoRNA, a
C/D snoRNA, or neither.

In general, the program first searches for the class-specific boxes:
C and D, or H and ACA, respectively. The sequence is then folded while
the boxes are prevented from forming base pairs. The resulting structure
is analysed, and characteristic features are extracted and summarized in
a numerical vector. This vector is evaluated by an included Support
Vector Machine (SVM) with specifically trained models. The SVM returns
the probability that the input belongs to one of the classes. Inputs
with a probability higher than 0.5 are reported as positive candidates.

If the input is positive for both C/D and H/ACA snoRNA, compare the two
probabilities and decide for yourself which class is more plausible.


2) Installation
---------------

SnoReport needs the Vienna RNA Package installed on your computer. Make
sure the package is correctly installed and, if necessary, adjust the
paths in the makefile so that the Vienna RNA library and the include
files can be found.

Download SnoReport from:

  http://www.tbi.univie.ac.at/~jana/software/SnoReport1.0.tgz

to a directory on your computer.

Extract the files from the archive:

  $ tar xzvf SnoReport_1.0.tgz

To install the program type:

  $ ./install.sh

Before you can use the program you need to set the environment variable
SNOREPORT, pointing to the 'models/' directory of your installation
directory.

  bash: $ export SNOREPORT=/your/installation/directory/models
  tcsh: $ setenv SNOREPORT "/your/installation/directory/models"

Now have fun!


3) Howto
--------

SnoReport reads your sequences from stdin.
You can either redirect your file to the program:

  $ ./snoReport < yourSequences.fa

or start the program with your chosen options:

  $ ./snoReport -r

in which case you will be asked to paste your sequences at the prompt.

INPUT: The sequences need to be in FASTA format.

IMPORTANT: Please replace space characters in the headers of your
sequences with non-space characters to avoid losing important
information!

Each sequence is analysed for being a C/D or an H/ACA snoRNA.

Usage: snoReport [-m PATH] [-r|-h|-v] < FILE

  FILE   file in FASTA format containing >= 1 sequence
  PATH   points to location of the 'models' directory of your
         SnoReport installation
  -r     Classify reverse complement of input data
  -h     shows this help message
  -v     shows version information


4) Summarizing tool
-------------------

To summarize the output of SnoReport you can use the perl program
'summarizeOutput.pl'. It takes an output file of SnoReport and returns a
list of the best non-overlapping candidates that SnoReport predicted in
your sequence(s).

Calling the script without options gives you these usage instructions:

  usage:   perl summarizeOutput.pl -in FILE [-out FILE | -posOnly BOOL]

  version: June 2007

  options:
    -in       Path to output file of your SnoReport run
    -out      Path to output file of ./summarizeOutput.pl.
              Default: STDOUT
    -posOnly  Set to 1 if only positive candidates should be analysed,
              0 for both. Default: 0

  purpose: Extracts the best non-overlapping candidates predicted by
           the SnoReport program (version 1.0).

  results: List of the best (prediction probability) candidates that do
           not overlap. Output contains 6 columns:

             1 => snoRNA class
             2 => 1 if putative candidate in this snoRNA class,
                  -1 otherwise
             3 => Prediction probability
             4 => start position in sequence
             5 => end position in sequence
             6 => name of sequence (usually header information)
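
The six-column result list described above can be post-processed with a
small script. The following Python sketch is only an illustration and is
not part of SnoReport. It assumes the columns are separated by
whitespace, which this README does not state. The class labels "CD" and
"HACA" in the sample data are made up, and the names Candidate,
parse_summary and positives are hypothetical helpers.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    sno_class: str      # column 1: snoRNA class
    label: int          # column 2: 1 = putative candidate, -1 otherwise
    probability: float  # column 3: prediction probability
    start: int          # column 4: start position in sequence
    end: int            # column 5: end position in sequence
    name: str           # column 6: name of sequence


def parse_summary(text: str) -> List[Candidate]:
    """Parse summarizeOutput.pl results (ASSUMED whitespace-separated)."""
    candidates = []
    for line in text.splitlines():
        # Split into at most 6 fields so the name keeps any trailing text.
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        try:
            candidates.append(Candidate(
                fields[0], int(fields[1]), float(fields[2]),
                int(fields[3]), int(fields[4]), fields[5].strip()))
        except ValueError:
            continue  # skip lines whose numeric columns do not parse
    return candidates


def positives(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only rows flagged as putative candidates (column 2 == 1)."""
    return [c for c in candidates if c.label == 1]


# Made-up sample rows; the real class labels may differ.
example = """\
CD 1 0.93 12 98 seq1
HACA -1 0.21 150 290 seq1
"""

if __name__ == "__main__":
    for cand in positives(parse_summary(example)):
        print(cand.sno_class, cand.probability, cand.start, cand.end, cand.name)
```

Filtering on column 2 in this way has the same intent as running
summarizeOutput.pl with -posOnly 1. If your output uses a different
separator, adjust the split call accordingly.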