
Supplemental Material

The program is available online here.
The following files contain the test sequences and multiple sequence alignments that were used for training the SVM models in the snoReport program.

Sequences in FASTA format used for training and testing the single sequence SVM models:

Alignments used for training and testing the SVM models for multiple sequence alignment classification, in CLUSTAL W format, stored in tar archives:

To estimate the accuracy of our extracted SVM descriptors, we divided the data into 3 different test sets, with a randomly selected subset used for training and the remaining data used for testing. Sensitivity and specificity on the test set for the single sequence and multiple alignment cases are given in the table below. For single sequence predictions, the positive examples were split into a training set containing 80% of the data; the remaining 20% were used for testing. In the multiple alignment case, the test set comprised 10% of the positive examples.
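The split and the two reported measures can be sketched in a few lines of Python. This is a minimal illustration, not the code used for snoReport: the function names, the fixed random seed, and the +1/-1 label encoding are assumptions introduced here, and the page does not spell out which definitions of sensitivity and specificity were used, so the standard ones, TP/(TP+FN) and TN/(TN+FP), are assumed.

```python
import random


def split_positives(examples, test_fraction, seed=0):
    """Randomly split examples into (training, test) lists.

    test_fraction is the share held out for testing, e.g. 0.2 for the
    single sequence case or 0.1 for the multiple alignment case.
    The seed is an assumption added only for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for labels +1 (snoRNA) and -1.

    Assumes the standard definitions:
    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    positives = [f"seq{i}" for i in range(50)]  # placeholder identifiers
    train, test = split_positives(positives, test_fraction=0.2)
    print(len(train), len(test))
```

Calling `split_positives(positives, test_fraction=0.1)` instead would give the 90%/10% division described for the multiple alignment case.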


Table: The models were created with SVM type C-SVC and a radial basis function (RBF) kernel. The parameter combination of $\mathcal{C}$ and $\gamma$ for these settings was estimated using the grid.py script that comes with the libsvm package. This estimation uses cross-validation (CV), and accuracy refers to this CV. Sensitivity and specificity refer to our test data. Columns give the single sequence (sgl.) and multiple alignment (mult.) models for box C/D and box H/ACA snoRNAs.
                    C/D               H/ACA
                sgl.     mult.    sgl.     mult.
$\mathcal{C}$   32768    8.00     32.00    128.00
$\gamma$        0.5      0.50     2.00     0.03
accuracy        0.97     0.99     0.93     0.99
sensitivity     0.65     0.92     0.82     0.98
specificity     0.98     0.99     0.96     0.99
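To make the parameter search more concrete, the following Python sketch enumerates a log2-spaced grid of ($\mathcal{C}$, $\gamma$) pairs and defines the RBF kernel. It is only an illustration under stated assumptions: the exponent ranges (-5 to 15 in steps of 2 for $\mathcal{C}$, 3 to -15 in steps of -2 for $\gamma$) are believed to match the defaults of grid.py, but the page does not say which ranges were actually searched, and the helper names are hypothetical. The sketch does not train any model; in practice each pair would be scored by cross-validation with libsvm.

```python
import math


def log2_grid(start, stop, step):
    """Return the values 2**e for e = start, start + step, ... without passing stop."""
    exponents = []
    e = start
    while (step > 0 and e <= stop) or (step < 0 and e >= stop):
        exponents.append(e)
        e += step
    return [2.0 ** e for e in exponents]


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Exponent ranges are an assumption; the page does not state them.
C_VALUES = log2_grid(-5, 15, 2)
GAMMA_VALUES = log2_grid(3, -15, -2)

# Every (C, gamma) combination that a cross-validation run would score.
PARAMETER_GRID = [(c, g) for c in C_VALUES for g in GAMMA_VALUES]
```

The values reported in the table are consistent with such a grid: 32768, 8, 32 and 128 are powers of two, as are 0.5 and 2, and 0.03 is presumably 2^-5 = 0.03125 after rounding.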




Jana Hertel 2007-02-03