The GroupIndex Template Library
Main Page | Class Hierarchy | Class List | File List | Class Members

The GroupIndex - Template-Library

I. Introduction

The Indexing System GroupIndex is a generic programming libray for building content based indexing systems/ GroupIndex relies in a large set of C++ template classes, and basic knowledge and experience about C++ templates is required for using GroupIndex.

II. Group Classes and Set Classes

So called group classes and set classes take center stage in GroupIndex. As the names suggest, instances of group and set classes represent elements of a groups and sets, respectively, in the mathematical sense. An indexing system built on top of GroupIndex consists of two main ingredients: a group class - implementing a transformation group - and a set class - implementing a class of objects that the transformation group acts on. Specific requirements for these classes and their mathematical background are discussed in the next section.

II.1 From Algebraic Structures to C++ Classes

An important basic idea behind GroupIndex is to implement mathematical structures in C++ classes. In fact, this is quite a familiar concept from the standard template library (em stl): Using a generic stl sorting algorithm requires the programmer to implement a suitable comparison between objects. In mathematical terms, this comparison needs to satisfy the properties of a strict weak ordering, which is usually implemented by overloading a suitable operator<(). Note that a correct outcome of the generic sorting algorithm first of all has a syntactic requirement - i.e., providing a comparison between objects, and, sendly and even more importantly, requires the implemented comparison to satisfy certain requirements, namely those of a strict weak ordering in the mathematical sense. The same idea is used in the GroupIndex template library. The only difference is that the mathematical structures - namely groups, sets and group actions - require a little bit more effort of implementation.

As a first example, we start with discussing the most important structure for GroupIndex, namely groups, and their implementation. Recall that a group is a set G equipped with a mapping * : G x G -> G, where G and * satisfy certain axioms: for all a, b und c in G, we have

We now carry these axioms into C++ classes. In place of a group, we now consider a class Group that we will refer to as the group class. The idea is that an instance of Group corresponds to an element of G. Instead of saying "let g denote an element in G", we write a C++ statement

 Group g;

The place of the mapping * in G is taken by a C++-operator:

 Group operator*(const Group& a, const Group& b);

The inverse element belonging to a group element g - typically written as g^(-1) - can also be expressed quite naturally in terms of an operator:

 Group& operator!(Group& g);

The neutral element can be implemented on the basis of an assignment operator:

 Group& Group::operator=(int i)
 {
   if (i==1)
       ...;
 }

so that the neutral element of a group G implemented by Group is obtained by writing

 Group g; g = 1; 

Note that the other parts of the group axioms (G1)-(G3) cannot be stated in terms of C++ formalisms - they need to be satisfied by a correct implementation, so that, for instance,

Group g,h;
h = g * !g;
has the same effect as writing
Group h;
h = 1;

II.2 Group Classes

GroupIndex is based on implementations of groups very much the same way as described in the previous section. However, due to issues concerning efficiency, group classes deviate in few details from the more idealistic way from the previous section:

Group g,h;
...
h = !g;

In order to achieve g containing the inverse of h, the following code will do:

  Group g,h;
  ...
  h = g;
  !h;

Since the scenario underlying GroupIndex is that a gruop acts on a set, we need a way to implement group actions in C++. We do so by overloading operator*=:

 Set& Group::operator*=(Set& s) const;
Now, the lines
 Group g;
 Set s_1, s_2;
 ...
 s_2 = s_1;
 g *= s_2;
mathematically read that s_2 = g*s_1, i.e., group element g acts on set element s_1.

Finally, GroupIndex requires the computation of representatives, i.e., given an arbitrary set element s, we need to find a unique element r from the G -orbit of s such that s=g*r for some group element g.

This is achieved by implementing a function

 void getRepresentation(const Set& m, Group& g_m, Set& r_m);

which needs to be visible in the global namespace. Given m, getRerpresentation computes g_m and r_m such that

 m == (g_m *= r_m)

where r_m is a unique representative of the G-orbit of M:

m = g_m r_m

For algebraic details, we refer to the corresponding references on G-inverted lists.

In addition to the mathematical requirements, there are some technical requirements for group and set classes that need to be implemented:

A complete declaration of an abstract group class will be given later. First of all, we discuss some details of above requirements.

Remarks on operator==() und operator!=() For set classes, there is a specific reason that both operator==() operator!=() need to implemented - they are used in different situations:operator==() is applied during the indexing phase and checks for exact equality. In contrast, operator!=() is used during computing query results. An operator!= may return false if the objects under comparison are similar in some way, but not exactly equal. This allows a certain amount of fault tolerant queries. However, not that certain important properties used for query processing such as assoziativity of set intersections might get lost. If fault tolerance is not an issue, the straight forward way of implementinf operator!=() is to write

bool myGroup::operator!=(const myGroup& cmp)
{
  return !((*this)==g2);
}

II.3 Set classes

Instances of a set do not need to satisfy any contraints in the mathematical sense - group actions and computing representatives is already achieved outside the set class.

It remains to introduce the technical requirements, which are almost identical to those of group classes:

In most cases, it makes sense to implemented the set class as a friend of the corresponding group class.

II.4 Abstract classes for group and set classes

We now provide abstract interfaces AbstractGroup and AbstractSet for the formal requirements discussed above. In theory, users can derive from these classes and implement the virtual methods for building indices using GroupIndex. In practice, however, this is not a good advice. For efficiceny reasons, group and set classes ahould be built from scratch, so that there is no loss due to virtual method calls. In addition, declaring operators a inline might improve performance additionally.

 class AbstractGroup
 {
 public:

    virtual bool operator==(Group&);
    virtual bool operator!=(Group&);
    virtual bool operator<(Group&);

    friend std::ofstream& operator<<(std::ofstream&, const Group&);
    friend std::ifstream& operator>>(std::ifstream&, const Group&);

    virtual Group();
    virtual Group(const Group&);
    virtual ~Group();

    virtual int size();
        
    virtual Group& operator!();
    virtual Group& operator*=(Group&);
    virtual Set& operator*=(Set&);
      
 };

 class AbstractSet
 {

 friend class AbstractGroup;

 public:

    virtual bool operator==(Set&);
    virtual bool operator<(Set&);

    friend std::ofstream& operator<<(std::ofstream&, const Set&);
    friend std::ifstream& operator>>(std::ifstream&, const Set&);

    virtual Set();
    virtual Set(const Set&);
    virtual ~Set();

    virtual int size();
        
 };

III. Implementing Indexing Systems

Essentially, four steps are neccessary for setting up a GroupIndex -based indexing system:
  1. Identify the group action underlying your index application
  2. Implement corresponding group and set classes
  3. Select suitable container classes
  4. Select suitable classes for represenetation systems and inverted lists

III.1 Identifying group action

A complete discussion is beyond the scope of this document; we refer to the accomanying publications.

III.2 Implementing Group and Set Classes

Having identified the group action with the corresponding group and set, precisely two classes need to be implemented -- namely a set class and a group class, as discussed in the rpevious section. If the set class is not of discrete nature, one might want to identify a suitable discretization (by rounding coordinates, for instance). If the group does not have a discrete structure, well-behaving discretizations are usually hard to find; this should be taken into account in the final steps when choosing suitable structures for inverted lists, eventually creating the demand of an index operator (see the documentation of groupindex::IntervalIntersectionGList).

When implementinf getRepresentation(), special care needs to be taken for choosing a suitable container class for storing (fuzzy) representatives and stabilizers, which will be discussed in the next section.

III.3 Choosing Container Classes

Depending on the properties required for an indexing system, GroupIndex needs to store group elements and set elements in containers of different type. Suitable container classes can be chosen by passing them as (optional) template parameters to GroupIndex. Principally, there are two types of containers, namely group containers and set containers, for storing collections of group elements and set elements, respectively. Containers for handling collecitons of elements rather than single elements are required in the following settings:

 groupindex::nonregular_group_container<Group, GList>

If dealing with non-regular group actions, the function getRepresentation needs to be adapted in the following way:

 void getRepresentation(const Set& m, std::list<Group>& g_m, Set& r_m);

where Set is a nonregular_group_container.

 groupindex::fuzzy_set_container<Group, GList>

Fuzzy searching also requires an adaptation of getRepresentation :

 void getRepresentation(const Set& m, Group& g_m, std::list<Set>& r_m);

III.4 Selecting Suitable Classes for Represenetation Systems and Inverted Lists

Depending on properties of the group action and requirements for the indexing system to be set up (such as fault tolerance), different classes for handling inverted lists and storing representatives of set elements can be chosen. While for handling set elements, there is only one class available at the moment, namely groupindex::DefaultRepresentation, a bunch of classes is available for handling inverted lists:

All list types support fuzzy-search and allow dynamic programming based intersections with mismatches. groupindex::IntervalIntersectionGList requires the group clas to implement an index-Operator operator[] with return type Group::numeric so that group elements can be considered as elements of a Euclidean vector space, whose dimension is given by Group::Dim; the syntax for the index operator is
 numeric Group::operator[](int)

so that each group element g yields one numeric value for g[0],g[1],...g[Dim-1]. The class Group::numeric mus support full arithmetic, i.e. addition, subtraction, multiplication and division. In most cases, double or float will do, while in some cases (whenever numeric stability is of concern), type real from the Leda library or C++ classes provided in the GNUmp library will be helpful.

IV. Examples

As the final part of this manual, we provide some examples of choosing suitable container classes and list classes for implementing indexing systems. We do not provide details of implementing group and set classes. An example implementation of an indexing system can be found in integershift.cpp contained in the GroupIndex installation, which consits of the group class IntShift and the set class Int; for more elaborate example, we refer to TRES or IBIS. In the following declarations, we suppose we are given a group class myGroup and a set class mySet with a corresponding imlpementation of getRepresentation().

All typedefs except for myIndex are only helpers making the declarations more readeable. For generating indices and starting queries, only myIndex will be neccessary.

IV.1 Example 1: Regular group action without fuzzy searching and with exact intersections:

 typedef GInvertedList < myGroup  > myGList;
 typedef DefaultRepresentation < 
            mySet, 
            myGroup, 
            myGList 
         > myRepresentation;
 typedef GroupIndex <
            mySet, 
            myGroup, 
            myRepresentation, 
            myGList 
         > myIndex;

IV.2 Example 2: Regular group action without fuzzy searching and with fuzzy intersections:

 typedef IntervalIntersectionGList < myGroup , 7 , double > myGList;
 typedef DefaultRepresentation <
            mySet, 
            myGroup, 
            myGList 
         > myRepresentation;
 typedef GroupIndex <
            mySet, 
            myGroup, 
            myRepresentation, 
            myGList 
         > myIndex;

IV.3 Example 3: Non-regular group action without fuzzy searching and with fuzzy intersections:

 typedef nonregular_group_container < myGroup > my_group_container;
 typedef default_set_container < mySet > my_set_container;
 typedef IntervalIntersectionGList < myGroup , 7 , double > myGList;
 typedef DefaultRepresentation <
            mySet, 
            myGroup, 
            myGList, 
            my_group_container, 
            my_set_container 
         > myRepresentation;
 typedef GroupIndex < 
            mySet, 
            myGroup, 
            myRepresentation, 
            myGList,
            my_group_container, 
            my_set_container 
         > myIndex;

IV.4 Beispiel 4: Non-regular group action with fuzzy searching and fuzzy intersections:

 typedef nonregular_group_container < myGroup > my_group_container;
 typedef fuzzy_set_container < mySet > my_set_container;
 typedef IntervalIntersectionGList < myGroup , 7 , double > myGList;
 typedef DefaultRepresentation <
            mySet, 
            myGroup, 
            myGList, 
            my_group_container, 
            my_set_container 
         > myRepresentation;
 typedef GroupIndex < 
            mySet, 
            myGroup, 
            myRepresentation, 
            myGList,
            my_group_container, 
            my_set_container 
         > myIndex;

V. Creating simple command line interfaces using the template class \c IndexCommandLine

The indexing systems obtained from the template classes introduced above merely provide a C++ interface. With only little extra effort, IndexCommandLine produces a simple text based interface that allows users of the indexing system to build indices and start queries using simple commands such as ADD, REMOVE or SEARCH. A more detailled documentation (which currently is available in German only) is given in the documentaiton of groupindex::IndexCommandLine.
The GroupIndex-Template-Library
Universität Bonn, Institut für Informatik III, 2001