bbq is a powerful command-line tool for discovering clustered occurences of transcription factor binding sites in genomes. This is done in a multiple-alignment-like fashion by solving a certain geometric optimization problem, the so-called best barbeque problem (explaining the name
The implementation provided here supports many practically relevant features, such as weighted sets, p-value based weighting schemes, computing the best
h solutions instead only the best solution, and several others. Most of these features are supported by two algorithms representing two different approaches to solving the given optimization problem.
The main goal of this implementation was to achieve best possible running times - together with a demand for numerous features (and combinations of features), which meant finding the right balance between an extendable and maintainable design and high performance. Templates sometimes lessened the pain of finding the right balance - anyway, anybody who makes it up to this point in the source code should keep in mind that this implementation inevitably is the result of a compromise. Whoever modifies this, should bear in mind to keep it a high-performance implementation.
A few words should be spent on the class design. All routines for scanning command line parameters and reading input files are contained in
main.cpp in the global namespace. The functions in
main.cpp create a suitable instance of the template class
weighted_footprint_detector implements all crucial parts of the underlying algorithms, including string matching, computation of interval arrangements and finding best intersections of cells (based on algorithm (A1) or (A2)). Class
weighted_footprint_detector requires handling (weighted) sets or multisets. For the different versions - weighted/unweighted and sets/multisets - different classes can be passed as a template parameter to
weighted_variable_bitset for weighted bitsets of arbitrary cardinality,
variable_bitset for arbitrary length unweighted bitsets,
uint64_bitset for unweighted bitsets of limited cardinality (i.e., 32 or 64 elements),
bit_multi_set for unweighted multisets and
weighted_bit_multi_set for weighted multisets.
Due to the performance oriented design, the interface for set-handling classes is somewhat more complex than might be neccessary in a minimum-effort implementation. In case of implementing your own set-handling class, refer to the existing ones. Note that generic programming allows for flexible algorithms essentially without losing efficiency.
All classes are contained in the namespace
A rather historical version of
bbq - demonstrating the actual idea behind the algorithms - can be found in the class