DinUSaur and the re-annotated GenBank files


 

DinUSaur is currently in preparation. It will be a web-based tool that provides a pipeline for analyzing unassigned sequences in mitochondrial genomes and their occurrence in other species.
The underlying database is constructed by editing NCBI GenBank files. Annotations of interest, such as protein-coding genes, tRNA genes, and rRNA genes, are marked as 'assigned' and stored in a database, whereas all other features are filtered out. The regions between the remaining features are therefore called unassigned sequences (UAS). Each UAS can be interpreted as a possible location of genes currently missing from the annotation of the corresponding GenBank file.
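
To illustrate how UAS follow from the assigned features, here is a minimal Python sketch. It is not DinUSaur's implementation: the function name `unassigned_regions`, the 0-based half-open coordinates, and the `min_length` constraint are assumptions made for the example, and the genome is treated as linear for simplicity even though mitochondrial genomes are usually circular.

```python
from typing import List, Tuple

# Hypothetical coordinate convention: 0-based, half-open [start, end).
Interval = Tuple[int, int]


def unassigned_regions(
    genome_length: int,
    assigned: List[Interval],
    min_length: int = 1,
) -> List[Interval]:
    """Return the gaps between assigned features that are at least min_length long.

    Overlapping or nested features are handled by tracking the furthest
    end position seen so far. min_length is expected to be >= 1.
    """
    gaps: List[Interval] = []
    cursor = 0  # end of the region covered by features so far
    for start, end in sorted(assigned):
        if start - cursor >= min_length:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    # Region after the last assigned feature.
    if genome_length - cursor >= min_length:
        gaps.append((cursor, genome_length))
    return gaps


if __name__ == "__main__":
    features = [(10, 30), (25, 50), (70, 90)]  # made-up example coordinates
    print(unassigned_regions(100, features))
    print(unassigned_regions(100, features, min_length=15))
```

A real pipeline would additionally have to join the gap that wraps around the origin of a circular genome; that step is left out here.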

The results produced by DinUSaur are calculated in the following steps:

  • The annotations of the selected species are loaded from the local database.
  • All unassigned sequences that satisfy specific constraints, e.g. a length limit, are calculated, or loaded from the database if cached.
  • These UAS are aligned against a taxonomic group of the species using NCBI BLAST. However, the echinoderm species are blasted correctly against the sequences of their own group.
  • Each BLAST hit defines a location on the sequence of another species and is therefore linked to the list of features annotated at that location. Counting the number of species in which the same feature appears as a matched annotation yields a support value for homology between the UAS and that feature.
  • For each UAS, the resulting web page lists the matched features with their corresponding support values and highlights those that are missing from the GenBank annotation file of the inspected species.
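
The support value described in the last two steps can be sketched as follows. This is only an illustration under stated assumptions: the `Hit` structure, the function names `support_values` and `missing_features`, and the gene names in the example are hypothetical and not taken from DinUSaur. Each hit is assumed to have already been resolved to the list of features annotated at the hit location.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical hit record: (species name, features annotated at the hit location).
Hit = Tuple[str, List[str]]


def support_values(hits: Iterable[Hit]) -> Dict[str, int]:
    """Count, for each feature, the number of distinct species in which
    a hit of the UAS falls on that feature."""
    species_by_feature: Dict[str, Set[str]] = defaultdict(set)
    for species, features in hits:
        for feature in features:
            species_by_feature[feature].add(species)
    return {feature: len(species) for feature, species in species_by_feature.items()}


def missing_features(support: Dict[str, int], annotated: Set[str]) -> List[str]:
    """Return matched features absent from the inspected species' annotation,
    i.e. the ones the result page would highlight."""
    return sorted(feature for feature in support if feature not in annotated)


if __name__ == "__main__":
    # Made-up hits for a single UAS; species A is hit twice on the same feature.
    hits = [
        ("species_A", ["nad4L"]),
        ("species_B", ["nad4L", "trnS"]),
        ("species_A", ["nad4L"]),
    ]
    support = support_values(hits)
    print(support)
    print(missing_features(support, annotated={"trnS"}))
```

Using a set of species per feature means that several hits in the same species count only once, which matches the idea of counting species rather than hits.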

Re-annotated GenBank files