weighted_footprint_detector
implements most crucial parts of the algorithms underlying bbq
, including computation of the interval arrangements and finding best intersections between cells.
More...
#include <weighted_footprint_detector.h>
Public Types | |
typedef std::set< w_bit_set > | arrangement |
Procedure enumerate_differences needs fast finding of elements in the interval arrangements, which is achieved by this type of an stl multiset . | |
typedef std::pair< typename arrangement::iterator, typename arrangement::iterator > | arrangement_range |
typedef std::set< w_bit_set > | cell_set |
Public Member Functions | |
weighted_footprint_detector () | |
default constructor. | |
weighted_footprint_detector (int, int, char **, char **, char **, char **, int *, int *, int *) | |
~weighted_footprint_detector () | |
destructor. | |
void | clear () |
free all memory used. | |
void | init_arrangement_cells () |
based on the occurences of the fragments s in the genomes T, compute the cells C[i] of the interval arrangement induced by the occurences in each T[i]. | |
bool | bound (const w_bit_set &M, const unsigned int &j) |
bound function for branch-and-bound procedure enumerate_subsets (Algorithm (A2)) | |
bool | w_bound (const w_bit_set &M, const unsigned int &j) |
bound function for the weighted version of the branch-and-bound procedure enumerate_subsets (Algorithm (A2)) | |
bool | h_best_w_bound (const w_bit_set &M, const unsigned int &j) |
bound function for the h-best-match and weighted version of the branch-and-bound procedure enumerate_subsets (Algorithm (A2)). | |
void | enum_w_sets (w_bit_set M, int j) |
needed for the weighted version of the branch-and-bound-version of Algorithm (A1) implemented in enumerate_subsets() | |
void | enum_h_best_w_sets (w_bit_set M, int j) |
weighted version for computing the h best solutions using Algorithm (A2). | |
void | enum_h_best_sets (w_bit_set M, int j) |
unweighted version of computing the h best solutions using Algorithm (A2). | |
void | enum_sets (w_bit_set M, int j) |
needed for the branch-and-bound-version of Algorithm (A1) implemented in enumerate_subsets() | |
void | enumerate_subsets () |
implements the branch-and-bound-version of Algorithm (A2) | |
void | enumerate_h_best_subsets () |
implements the branch-and-bound-version of Algorithm (A2) for finding the h best matches. | |
void | compute_intersections () |
implements the branch-and-bound-version of Algorithm (A1) | |
void | update_h_best_list (const float &wght, int *nu, w_bit_set &intersection) |
if a new hit was found that might belong to the list of the h best solutions, this procedure is called by compute_h_best_intersections() . | |
void | compute_h_best_intersections () |
compute the h best hits using the branch-and-bound-version of algorithm (A1) | |
w_bit_set | difference (w_bit_set M, w_bit_set &D) |
return M setminus D | |
bool | test_diff_set (w_bit_set &M) |
test whether M setminus D is a suitable barbque. | |
void | get_element_array (w_bit_set M, int *E, int &l) |
return l=M.cardinality(), with E[0],...E[l-1] containing the l elements contained in M. | |
void | enumerate_differences (int delta) |
compute a best barbeque with delta-bounded difference based on a parameterized version of the bbq problem. | |
void | get_frag_occurences (int i, int j, match_list &) |
compute the occurences of s[j] in T[i], and produce match_list as a result. | |
void | get_occurences (int i, int j) |
compute the occurences of s[j] in T[i] | |
void | get_group_occurences (int i, int j) |
compute the occurences of the fragment in group j in T[i] | |
void | run (int, float) |
start the matching process | |
void | reverse_compl () |
compute the reverse complement of the patterns, so that in effect | |
float | get_result () |
return the best solution found so far. | |
void | print_h_results () |
print the best h solution found so far into stdout. | |
float | print_single_result (int *nu, w_bit_set &intersection, const float &weight) |
print the solution specified by intersection into stdout. | |
float | print_result () |
print the result of the whole alignment procedure into stdout. | |
void | print_postscript (std::string filename) |
print the result of the whole alignment procedure into a postscript file named filename . | |
void | print_ps_labels (std::string filename) |
print labels of the current instance into a postscript file named filename . | |
void | print_ps_h_results (std::string &) |
print the best h results of the whole alignment procedure into a postscript file named filename . | |
void | print_ps_single_result (int *nu, std::ostream &o, w_bit_set &intersection) |
print the solution specified by nu into a postscript file associated with o | |
void | print_with_names (w_bit_set &) |
print the names of the sequences (or groups) contained in the argument passed. | |
void | set_number_of_hits (int) |
set the number h of hits to be computed, i.e., the best h (substantially different) solutions are computed. | |
void | set_weighted (bool) |
switch weighted intersections on and off. | |
void | set_count_results (bool) |
switch between couting and printing results for thresholded instances. | |
void | set_weight_type (int wt) |
choose a weighting scheme. | |
void | set_reverse (bool) |
if set to true, the fragments s [k/2],...,s[k-1] are regarded as fragments on the 5' strand, while s [1],...,s[k/2-1] are regarded as fragments on the 3' strand. | |
void | set_postscript (bool) |
switch weighted intersections on and off. | |
void | set_ps_filename (std::string) |
switch weighted intersections on and off. | |
void | set_progress_report (bool) |
switch printing of progress report during execution. | |
void | set_algorithm (int A) |
Choose the algorithm to be applied. | |
void | set_grouping (bool) |
switch grouping of fragments on and off. | |
void | set_group_characters (int) |
switch number of leading chars used for grouping of fragments on and off. | |
void | get_groups () |
compute group names form fragment names as well as the corresponding mapping group_indices that assigns a set of fragment index to each group index. | |
void | set_output_detail (bool) |
pass true if more detailled output is desired. | |
void | set_delta (int) |
set the bound for enumerate_differences() . | |
bool | increase (int *nu, int I) |
increase the index vector nu at position I. | |
bool | differs (int *, int *) |
When calling compute_h_best_intersections(), we want to obtain substantially different solutions. | |
double | p_value (int i, int j, int nu) |
Compute a first approximation to a p-value like measure of significance for the nu-th occurence of s[j] in T[i]. | |
void | get_freq () |
We count neucleotide frequencies and dinucleotide frequencies in the genome, so that we can determine (approximations of) p- and E-values later on. | |
float | get_weight (int i, int j, int nu) |
compute a weight for the nu-th occurence of s [j] in T [i] (with position p [i][j][nu] ) | |
void | compute_index_table () |
do some preprocessing for p_value() and get_freq() |
weighted_footprint_detector
implements most crucial parts of the algorithms underlying bbq
, including computation of the interval arrangements and finding best intersections between cells.
It takes a class w_bit_set
that implements all set operations needed by weighted_footprint_detector
. Different classes provided for w_bit_set
implement weighted/unweighted sets or multisets with different contraints on the cardinality of the sets (and hence on the number of motifs that bbq
can handle).
|
Procedure
|
|
|
|
|
|
default constructor.
|
|
|
|
destructor.
|
|
bound function for branch-and-bound procedure enumerate_subsets (Algorithm (A2))
|
|
free all memory used.
|
|
compute the h best hits using the branch-and-bound-version of algorithm (A1)
|
|
do some preprocessing for by default, we want to use asymmetric match tables, i.e., wildcard Ns are only wildcards if they occur in a pattern fragment, but not in the text. |
|
implements the branch-and-bound-version of Algorithm (A1)
|
|
return
|
|
When calling compute_h_best_intersections(), we want to obtain substantially different solutions.
The procedure |
|
unweighted version of computing the h best solutions using Algorithm (A2).
Uses a difference policy in the same way as |
|
weighted version for computing the h best solutions using Algorithm (A2).
Uses a difference policy in the same way as |
|
needed for the branch-and-bound-version of Algorithm (A1) implemented in
|
|
needed for the weighted version of the branch-and-bound-version of Algorithm (A1) implemented in
|
|
compute a best barbeque with delta-bounded difference based on a parameterized version of the bbq problem.
|
|
implements the branch-and-bound-version of Algorithm (A2) for finding the
|
|
implements the branch-and-bound-version of Algorithm (A2)
|
|
return l=M.cardinality(), with E[0],...E[l-1] containing the l elements contained in M.
|
|
compute the occurences of s[j] in T[i], and produce ATTENTION: DYNAMIC PROGRAMMING HAS BEEN DISABLED IN THE FOLLOWING TWO LINES! MISMATCHES INSTEAD OF EDIT-DISTANCE ARE USED; TO SWITCH BACK TO EDIT DISTANCE, REPLACE INT_MAX BY THE COMMENTED CODE. |
|
We count neucleotide frequencies and dinucleotide frequencies in the genome, so that we can determine (approximations of) p- and E-values later on.
|
|
compute the occurences of the fragment in group j in T[i]
|
|
compute group names form fragment names as well as the corresponding mapping group_indices that assigns a set of fragment index to each group index.
|
|
compute the occurences of s[j] in T[i]
|
|
return the best solution found so far.
|
|
compute a weight for the nu-th occurence of
|
|
bound function for the h-best-match and weighted version of the branch-and-bound procedure enumerate_subsets (Algorithm (A2)).
Uses a difference policy in the same way as |
|
increase the index vector nu at position I.
If nu[I]>Lambda[I], then an overflow is handled, i.e., nu[I+1] is increased. This procedure is needed by |
|
based on the occurences of the fragments s in the genomes T, compute the cells C[i] of the interval arrangement induced by the occurences in each T[i]. std::cout<<"*** "<<i<<"\n"; |
|
Compute a first approximation to a p-value like measure of significance for the nu-th occurence of s[j] in T[i]. The log of this p-value is used for an advanced weighting scheme. |
|
print the best h solution found so far into stdout.
|
|
print the result of the whole alignment procedure into a postscript file named
|
|
print the best h results of the whole alignment procedure into a postscript file named
|
|
print labels of the current instance into a postscript file named
|
|
print the solution specified by
|
|
print the result of the whole alignment procedure into stdout.
|
|
print the solution specified by
|
|
print the names of the sequences (or groups) contained in the argument passed.
|
|
compute the reverse complement of the patterns, so that in effect
|
|
start the matching process
|
|
Choose the algorithm to be applied. A=1 chooses (A1) (default), A=2 chooses (A2). All other values are ignored and leave the current method unchanged. |
|
switch between couting and printing results for thresholded instances.
|
|
set the bound for
If |
|
switch number of leading chars used for grouping of fragments on and off.
|
|
switch grouping of fragments on and off.
|
|
set the number h of hits to be computed, i.e., the best h (substantially different) solutions are computed.
|
|
pass true if more detailled output is desired.
|
|
switch weighted intersections on and off.
|
|
switch printing of progress report during execution.
|
|
switch weighted intersections on and off.
|
|
if set to true, the fragments
|
|
choose a weighting scheme.
See |
|
switch weighted intersections on and off.
|
|
test whether M setminus D is a suitable barbque.
|
|
if a new hit was found that might belong to the list of the h best solutions, this procedure is called by If the new candidate is sufficiently different from the h best solutions found so far, the list of the h best solutions is updated by the new solution correspondingly. |
|
bound function for the weighted version of the branch-and-bound procedure enumerate_subsets (Algorithm (A2))
|