SnoReport 1.0
=============
by Jana Hertel:
jana@tbi.univie.ac.at


Outline:
--------
	1) Introduction
	2) Installation
	3) Howto
	4) Summarizing tool

1) Introduction
---------------

SnoReport is a new tool to detect snoRNA genes without using information on
their targets. Given a single sequence supplied by the user, the program
checks whether the sequence might represent an H/ACA snoRNA, a C/D snoRNA,
or neither.

In general, the program first searches for the class-specific boxes: C and D
for C/D snoRNAs, or H and ACA for H/ACA snoRNAs. The sequence is then folded
while the boxes are prevented from forming base pairs. The resulting
structure is analysed, and its characteristic features are extracted and
summarized in a numerical vector.

The vector is then classified by a bundled Support Vector Machine (SVM)
using specifically trained models. The SVM returns the probability that the
input belongs to each of the classes.

Inputs with a probability higher than 0.5 are reported as positive
candidates. If an input is positive for both C/D and H/ACA snoRNA, compare
the two probabilities and decide for yourself which class is more plausible.
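
The decision rule can be sketched as a small shell helper. This is only one
possible convention and not part of SnoReport: the function name pick_class,
the argument order (C/D probability first, H/ACA second) and the choice to
prefer the larger probability when both are positive are assumptions made
for illustration.

```shell
# Hypothetical helper, not shipped with SnoReport.
# $1 = probability for C/D, $2 = probability for H/ACA.
pick_class() {
    awk -v cd="$1" -v haca="$2" 'BEGIN {
        cd += 0; haca += 0                  # force numeric comparison
        if (cd <= 0.5 && haca <= 0.5) { print "neither"; exit }
        if (cd > 0.5 && haca > 0.5) {
            # Both positive: name the larger one and report the margin,
            # so the user can still judge how clear the call is.
            d = cd - haca; if (d < 0) d = -d
            printf "%s (both positive, difference %.2f)\n", (cd >= haca ? "C/D" : "H/ACA"), d
            exit
        }
        print (cd > 0.5 ? "C/D" : "H/ACA")
    }'
}

# Usage: pick_class 0.92 0.31
```

A small difference between the two probabilities suggests the candidate
deserves a manual look rather than an automatic call.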

2) Installation
---------------

SnoReport requires the Vienna RNA Package. Make sure the package is
correctly installed and, if necessary, adjust the paths in the makefile so
that the Vienna RNA library and the include files can be found.

Download SnoReport from:

http://www.tbi.univie.ac.at/~jana/software/SnoReport1.0.tgz

to a directory on your computer.

Extract the files from the archive:

$ tar xzvf SnoReport1.0.tgz

To install the program type:

$ ./install.sh

Before you can use the program, you need to set the environment variable
SNOREPORT so that it points to the 'models/' directory inside your
installation directory.

bash:

$ export SNOREPORT=/your/installation/directory/models

tcsh:

$ setenv SNOREPORT "/your/installation/directory/models"
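
To confirm the variable is set correctly before the first run, a quick check
along these lines may help. It is a minimal sketch for bash-like shells; the
function name check_snoreport_env is hypothetical and not part of SnoReport,
and the check only verifies that the directory exists, not its contents.

```shell
# Hypothetical sanity check, not shipped with SnoReport.
check_snoreport_env() {
    if [ -z "${SNOREPORT:-}" ]; then
        echo "SNOREPORT is not set" >&2
        return 1
    fi
    if [ ! -d "$SNOREPORT" ]; then
        echo "SNOREPORT does not point to a directory: $SNOREPORT" >&2
        return 1
    fi
    echo "SNOREPORT points to $SNOREPORT"
}

# Usage: check_snoreport_env
```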

Now, have fun!

3) Howto
--------

SnoReport reads your sequences from stdin. You can either redirect a file
to the program:

./snoReport < yourSequences.fa

or start the program with your chosen options, for example:

./snoReport -r

in which case you are asked to paste your sequences at the prompt.

INPUT:     The sequences need to be in FASTA format.

IMPORTANT: Please replace space characters in the headers of your
           sequences with other non-space characters to avoid losing
           important information!
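
One way to prepare the headers is a short sed filter. This is a sketch, not
part of SnoReport: the function name sanitize_fasta_headers and the choice
of the underscore as replacement character are assumptions, and any other
non-space character would do.

```shell
# Replace every space or tab in FASTA header lines (lines starting with '>')
# with an underscore. Sequence lines pass through unchanged.
sanitize_fasta_headers() {
    sed '/^>/ s/[[:blank:]]/_/g'
}

# Usage (file names are placeholders):
#   sanitize_fasta_headers < yourSequences.fa > yourSequences.clean.fa
#   ./snoReport < yourSequences.clean.fa
```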

Each sequence is tested for being either a C/D or an H/ACA snoRNA.


Usage: snoReport [-m PATH] [-r|-h|-v] < FILE

FILE     file in FASTA format (>= 1 sequence)
PATH     points to location of the 'models' directory of
         your SnoReport installation
-r       Classify reverse complement of input data
-h       shows this help message
-v       shows version information

4) Summarizing tool
-------------------

To summarize the output of SnoReport, you can use the Perl script
'summarizeOutput.pl'.

It takes an output file of SnoReport and returns a list of the best
non-overlapping candidates that SnoReport predicted for your sequence(s).

Calling the script without options prints these usage instructions:

usage:   perl summarizeOutput.pl -in FILE [-out FILE | -posOnly BOOL]
  
  version: June 2007
  
  options: -in       Path to output file of your SnoReport run
                     
           -out      Path to output file of ./summarizeOutput.pl.
                     Default: STDOUT
           -posOnly  Set to 1 if only positive candidates should be
                     analysed, 0 for both. Default: 0
                     
  purpose: Extracts the best non-overlapping candidates predicted
           by the SnoReport program (version 1.0).
           
  results: List of the best (prediction probability) candidates
           that do not overlap.
           Output contains 6 columns:
              1 => snoRNA class
              2 => 1 if putative candidate in this snoRNA class, -1 otherwise
              3 => Prediction probability
              4 => start position in sequence
              5 => end position in sequence
              6 => name of sequence (usually header information)
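
The six-column list lends itself to further filtering with standard tools.
The sketch below keeps only rows flagged as candidates (column 2 equal to 1)
whose prediction probability (column 3) reaches a chosen threshold. It
assumes the columns are separated by whitespace, which the usage text does
not state. The function name filter_candidates is hypothetical.

```shell
# Hypothetical post-filter for summarizeOutput.pl results.
# $1 = minimum prediction probability to keep.
# Assumes whitespace-separated columns in the order documented above.
filter_candidates() {
    awk -v min="$1" '$2 == 1 && $3 + 0 >= min + 0'
}

# Usage (file names are placeholders):
#   perl summarizeOutput.pl -in snoReport.out | filter_candidates 0.9
```

If the real output uses a different delimiter, pass it to awk with -F.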