GroupIndex
is a generic programming libray for building content based indexing systems/ GroupIndex
relies in a large set of C++ template classes, and basic knowledge and experience about C++ templates is required for using GroupIndex
.GroupIndex
. As the names suggest, instances of group and set classes represent elements of a groups and sets, respectively, in the mathematical sense. An indexing system built on top of GroupIndex
consists of two main ingredients: a group class - implementing a transformation group - and a set class - implementing a class of objects that the transformation group acts on. Specific requirements for these classes and their mathematical background are discussed in the next section.GroupIndex
is to implement mathematical structures in C++ classes. In fact, this is quite a familiar concept from the standard template library (em stl): Using a generic stl sorting algorithm requires the programmer to implement a suitable comparison between objects. In mathematical terms, this comparison needs to satisfy the properties of a strict weak ordering, which is usually implemented by overloading a suitable operator<()
. Note that a correct outcome of the generic sorting algorithm first of all has a syntactic requirement - i.e., providing a comparison between objects, and, sendly and even more importantly, requires the implemented comparison to satisfy certain requirements, namely those of a strict weak ordering in the mathematical sense. The same idea is used in the GroupIndex
template library. The only difference is that the mathematical structures - namely groups, sets and group actions - require a little bit more effort of implementation.
As a first example, we start with discussing the most important structure for GroupIndex
, namely groups, and their implementation. Recall that a group is a set G equipped with a mapping * : G x G -> G, where G and * satisfy certain axioms: for all a, b und c in G, we have
Group
that we will refer to as the group class. The idea is that an instance of Group
corresponds to an element of G. Instead of saying "let g denote an element in G", we write a C++ statement
Group g;
The place of the mapping * in G is taken by a C++-operator:
Group operator*(const Group& a, const Group& b);
The inverse element belonging to a group element g - typically written as g^
(-1) - can also be expressed quite naturally in terms of an operator:
Group& operator!(Group& g);
The neutral element can be implemented on the basis of an assignment operator:
Group& Group::operator=(int i) { if (i==1) ...; }
so that the neutral element of a group G
implemented by Group
is obtained by writing
Group g; g = 1;
Note that the other parts of the group axioms (G1)-(G3) cannot be stated in terms of C++ formalisms - they need to be satisfied by a correct implementation, so that, for instance,
Group g,h; h = g * !g;
Group h; h = 1;
GroupIndex
is based on implementations of groups very much the same way as described in the previous section. However, due to issues concerning efficiency, group classes deviate in few details from the more idealistic way from the previous section:
Group::operator*=
(const Group&
g
) rather than operator*()
this
object and returns the this
- Objekt as the result. Inversion will not create a new object containing the inverse. For example, the result of the following lines of code will be that g
and h
contain the same group element:Group g,h; ... h = !g;
In order to achieve g
containing the inverse of h
, the following code will do:
Group g,h; ... h = g; !h;
Since the scenario underlying GroupIndex
is that a gruop acts on a set, we need a way to implement group actions in C++. We do so by overloading operator*=
:
Set& Group::operator*=(Set& s) const;
Group g; Set s_1, s_2; ... s_2 = s_1; g *= s_2;
Finally, GroupIndex
requires the computation of representatives, i.e., given an arbitrary set element s
, we need to find a unique element r from the G -orbit of s
such that s=g*r
for some group element g
.
This is achieved by implementing a function
void getRepresentation(const Set& m, Group& g_m, Set& r_m);
which needs to be visible in the global namespace. Given m
, getRerpresentation
computes g_m
and r_m
such that
m == (g_m *= r_m)
where r_m
is a unique representative of the G-orbit of M:
m = g_m r_m
For algebraic details, we refer to the corresponding references on G-inverted lists.
In addition to the mathematical requirements, there are some technical requirements for group and set classes that need to be implemented:
operator<<()
and operator>>()
for reading from and writing to files operator==()
and operator!=()
(see the following remarks on comparison operators) bool operator<(const Group& g, const Group& h)
int Group::size();
operator<<()
writes to an ofstream
. If all group elements require the same number of bytes of disc space, this method can be implemented as static
.
Remarks on operator==()
und operator!=()
For set classes, there is a specific reason that both operator==()
operator!=()
need to implemented - they are used in different situations:operator==()
is applied during the indexing phase and checks for exact equality. In contrast, operator!=()
is used during computing query results. An operator!=
may return false
if the objects under comparison are similar in some way, but not exactly equal. This allows a certain amount of fault tolerant queries. However, not that certain important properties used for query processing such as assoziativity of set intersections might get lost. If fault tolerance is not an issue, the straight forward way of implementinf operator!=()
is to write
bool myGroup::operator!=(const myGroup& cmp) { return !((*this)==g2); }
It remains to introduce the technical requirements, which are almost identical to those of group classes:
operator<<()
and operator>>()
for reading from and writing to files bool operator<(const Set& s, const Set& t)
getRepresentation()
. int Group::size();
operator<<()
writes to an ofstream
. If all set elements require the same number of bytes of disc space, this method can be implemented as static
.friend
of the corresponding group class.AbstractGroup
and AbstractSet
for the formal requirements discussed above. In theory, users can derive from these classes and implement the virtual
methods for building indices using GroupIndex
. In practice, however, this is not a good advice. For efficiceny reasons, group and set classes ahould be built from scratch, so that there is no loss due to virtual method calls. In addition, declaring operators a inline
might improve performance additionally.
class AbstractGroup { public: virtual bool operator==(Group&); virtual bool operator!=(Group&); virtual bool operator<(Group&); friend std::ofstream& operator<<(std::ofstream&, const Group&); friend std::ifstream& operator>>(std::ifstream&, const Group&); virtual Group(); virtual Group(const Group&); virtual ~Group(); virtual int size(); virtual Group& operator!(); virtual Group& operator*=(Group&); virtual Set& operator*=(Set&); };
class AbstractSet { friend class AbstractGroup; public: virtual bool operator==(Set&); virtual bool operator<(Set&); friend std::ofstream& operator<<(std::ofstream&, const Set&); friend std::ifstream& operator>>(std::ifstream&, const Set&); virtual Set(); virtual Set(const Set&); virtual ~Set(); virtual int size(); };
GroupIndex
-based indexing system: groupindex::IntervalIntersectionGList
).
When implementinf getRepresentation()
, special care needs to be taken for choosing a suitable container class for storing (fuzzy) representatives and stabilizers, which will be discussed in the next section.
GroupIndex
needs to store group elements and set elements in containers of different type. Suitable container classes can be chosen by passing them as (optional) template parameters to GroupIndex
. Principally, there are two types of containers, namely group containers and set containers, for storing collections of group elements and set elements, respectively. Containers for handling collecitons of elements rather than single elements are required in the following settings:
GrouIndex
supplies the class
If dealing with non-regular group actions, the function getRepresentation
needs to be adapted in the following way:
void getRepresentation(const Set& m, std::list<Group>& g_m, Set& r_m);
where Set
is a nonregular_group_container
.
getRepresentation
computes a whole set of (neighboured) set elements rather than a single one; the set elements are typically elements representing neighboured orbits. A container class with suitable properties is provided by c GroupIndex:
Fuzzy searching also requires an adaptation of getRepresentation
:
void getRepresentation(const Set& m, Group& g_m, std::list<Set>& r_m);
getRepresentation
takes the following parameters: void getRepresentation(const Set& m, std::list<Group>& g_m, std::list<Set>& r_m);
groupindex::DefaultRepresentation
, a bunch of classes is available for handling inverted lists:
groupindex::GInvertedList<Group>
for exact intersections groupindex::IntervalIntersectionGList<Group, numeric, Dim>
for fuzzy intersections groupindex::CompressedGInvertedList<Group>
for exact intersections with golomb-code based list compressiongroupindex::IntervalIntersectionGList
requires the group clas to implement an index-Operator operator
[] with return type Group::numeric
so that group elements can be considered as elements of a Euclidean vector space, whose dimension is given by Group::Dim
; the syntax for the index operator is numeric Group::operator[](int)
so that each group element g
yields one numeric value for g
[0],g[1],...g[Dim-1]. The class Group::numeric
mus support full arithmetic, i.e. addition, subtraction, multiplication and division. In most cases, double
or float
will do, while in some cases (whenever numeric stability is of concern), type real
from the Leda library or C++ classes provided in the GNUmp
library will be helpful.
integershift.cpp
contained in the GroupIndex
installation, which consits of the group class IntShift
and the set class Int
; for more elaborate example, we refer to TRES
or IBIS
. In the following declarations, we suppose we are given a group class myGroup
and a set class mySet
with a corresponding imlpementation of getRepresentation()
.
All typedefs
except for myIndex
are only helpers making the declarations more readeable. For generating indices and starting queries, only myIndex
will be neccessary.
typedef GInvertedList < myGroup > myGList; typedef DefaultRepresentation < mySet, myGroup, myGList > myRepresentation; typedef GroupIndex < mySet, myGroup, myRepresentation, myGList > myIndex;
typedef IntervalIntersectionGList < myGroup , 7 , double > myGList; typedef DefaultRepresentation < mySet, myGroup, myGList > myRepresentation; typedef GroupIndex < mySet, myGroup, myRepresentation, myGList > myIndex;
typedef nonregular_group_container < myGroup > my_group_container; typedef default_set_container < mySet > my_set_container; typedef IntervalIntersectionGList < myGroup , 7 , double > myGList; typedef DefaultRepresentation < mySet, myGroup, myGList, my_group_container, my_set_container > myRepresentation; typedef GroupIndex < mySet, myGroup, myRepresentation, myGList, my_group_container, my_set_container > myIndex;
typedef nonregular_group_container < myGroup > my_group_container; typedef fuzzy_set_container < mySet > my_set_container; typedef IntervalIntersectionGList < myGroup , 7 , double > myGList; typedef DefaultRepresentation < mySet, myGroup, myGList, my_group_container, my_set_container > myRepresentation; typedef GroupIndex < mySet, myGroup, myRepresentation, myGList, my_group_container, my_set_container > myIndex;
IndexCommandLine
produces a simple text based interface that allows users of the indexing system to build indices and start queries using simple commands such as ADD, REMOVE or SEARCH. A more detailled documentation (which currently is available in German only) is given in the documentaiton of groupindex::IndexCommandLine
.