BBQ with Tanimoto Scores - Supplementary Materials


This website provides accompanying materials for my diploma thesis. It is all about the best BBQs. Sounds nice, huh? ;-)

Muscle Genes

Genomic Sequences

The 46 muscle-specific sequences were taken from this website by Prof. Wasserman's group.

Position Count Matrices

Five muscle-specific PCMs were taken from the original publication by Wasserman et al., together with an additional 28 Homo sapiens matrices retrieved from JASPAR. Altogether, 33 matrices were used for motif detection.

ACTB Genes

Genomic Sequences

We used two sequence sets here. The first set contains 11 upstream sequences with an average length of 342 nt. To evaluate the stability of the algorithm, we extended these sequences in length where possible, which resulted in five sequences with an average length of 5174 nt.

Artificial Data Sets

We created three different artificial data sets. The first, basic data set consists of five different modules containing one to six binding sites. The more complex example contains seven sequences and 30 different binding site motifs. The tarballs contain the complete bundle of artificial sequences as well as the PCMs derived from the artificial binding site alignments.
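
For readers unfamiliar with how a PCM relates to a binding site alignment, here is a minimal sketch in Python. It is not the code used in the thesis. The function name position_count_matrix and the three example sites are hypothetical and serve only as an illustration. The sketch assumes gap-free aligned sites of equal length over the alphabet A, C, G, T and counts how often each base occurs in each alignment column.

```python
ALPHABET = "ACGT"


def position_count_matrix(sites):
    """Count base occurrences per alignment column.

    `sites` is a list of aligned binding site strings of equal length.
    Returns a dict mapping each base to a list of per-column counts.
    (Hypothetical helper for illustration, not the thesis code.)
    """
    if not sites:
        raise ValueError("need at least one binding site")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same length")

    pcm = {base: [0] * width for base in ALPHABET}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            # Characters outside A, C, G, T (e.g. N) are ignored.
            if base in pcm:
                pcm[base][pos] += 1
    return pcm


# Made-up example alignment, not taken from the artificial data sets.
example_sites = ["ACGT", "ACGA", "TCGT"]
example_pcm = position_count_matrix(example_sites)
for base in ALPHABET:
    print(base, example_pcm[base])
```

Each column of the resulting matrix sums to the number of aligned sites, as long as the sites contain only the four unambiguous bases.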