NOISY

Section: Noisy (l)
Updated: 1.5.12
 

NAME

noisy - identify homoplastic characters in multiple sequence alignments

SYNOPSIS

noisy [--cutoff FLOAT] [--distance STRING] [--help] [--matrix FILE] [--missing STRING] [--nogap] [--noconstant] [--ordering STRING] [--reorder] [--shuffles INT] [--silent] [--smooth INT] [--seqtype CHAR] [--verbose]

 

DESCRIPTION

In a first phase, noisy reorders the rows of the input multiple sequence alignment (MSA), given in multi-FASTA format, to conform to a circular ordering. For this purpose noisy includes the corresponding subset of routines from David Bryant and Vincent Moulton's NeighborNet and Stefan Gruenewald's QNet packages. In a second phase, a reliability score is calculated for each column of the reordered MSA. Essentially, the number of character state changes in an alignment column is counted and compared to the counts observed in random shufflings of that column. The Mersenne Twister uniform pseudo-random number generator is used to generate the random shufflings of alignment columns.
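The column scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not noisy's actual formula: the function names, the wrap-around comparison of the last row with the first, and the definition of the score as the fraction of shufflings with at least as many changes as the observed column are all hypothetical choices made for this example.

```python
import random

def count_changes(column, missing=""):
    """Count character state changes between cyclically adjacent rows.

    Assumption: the last row is compared with the first to close the cycle.
    """
    states = [c for c in column if c not in missing]
    if len(states) < 2:
        return 0
    n = len(states)
    return sum(1 for i in range(n) if states[i] != states[(i + 1) % n])

def reliability(column, shuffles=1000, missing="", seed=0):
    """Fraction of shufflings with at least as many changes as observed.

    Assumed definition for illustration; noisy's own score may differ.
    """
    rng = random.Random(seed)  # CPython's random module is a Mersenne Twister
    observed = count_changes(column, missing)
    states = [c for c in column if c not in missing]
    if shuffles <= 0:
        return 0.0
    at_least = 0
    for _ in range(shuffles):
        rng.shuffle(states)
        if count_changes(states) >= observed:
            at_least += 1
    return at_least / shuffles
```

Under these assumptions, a column whose states form contiguous blocks along the circular order (for example "AAACCC") scores high, while a column whose states alternate (for example "ACACAC") scores low.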

noisy writes three outputs: a PostScript file visualizing the quality of the columns of the reordered input MSA; the reliability scores of all columns of the reordered input MSA as xy-data; and a modified alignment from which all columns with a reliability smaller than a cutoff value (set via option --cutoff) are removed. The program noisy is written in ISO C++. The source code is available from
http://www.bioinf.uni-leipzig.de/Software/noisy/.  
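The removal of unreliable columns can be pictured as follows. This is a minimal sketch, assuming the alignment is held as a list of equal-length strings and the per-column scores as a list of floats; the function name filter_columns is hypothetical and not part of noisy.

```python
def filter_columns(rows, scores, cutoff):
    """Return the alignment rows with every column scoring below cutoff removed."""
    # Indices of the columns that survive the cutoff.
    keep = [i for i, score in enumerate(scores) if score >= cutoff]
    return ["".join(row[i] for i in keep) for row in rows]

# Example: the second column (score 0.2) is dropped at a cutoff of 0.5.
filtered = filter_columns(["ACGT", "ACGA"], [0.9, 0.2, 0.8, 0.5], 0.5)
```

A column whose score equals the cutoff is kept here, matching the description that only columns with a score smaller than the cutoff are removed.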

OPTIONS

--cutoff FLOAT
Set the lower bound of the reliability score for an alignment column to FLOAT. Columns with a score below FLOAT are removed from the output alignment. The name of the output MSA is constructed from the base name of the input MSA by adding the suffix _out.fas.
--distance HAMMING|GTR
Set the distance calculation used by NeighborNet to HAMMING or GTR.
-h, --help
Display usage information.
--matrix FILE
Read the distance matrix used by NeighborNet to generate the cyclic order from FILE, instead of letting NeighborNet calculate the distance matrix by one of the methods given to option --distance.
--missing STRING
Each character of STRING is treated as missing data and is removed from a column before changes between character states are calculated.
--nogap
Add the gap symbol to the set of missing characters.
--noconstant
Suppress constant columns in the output MSA.
--ordering nnet|qnet|rand[,INT]|all|INT(,INT)*
Set the method used to calculate the cyclic order to one of the two major methods: NeighborNet (nnet, the default) or QNet (qnet).

With rand, the reliability score is calculated for a random sample of all possible orderings of the taxa. The size of the random sample (default is 1000) can be set by appending a comma and an integer to rand, e.g. rand,42. (All orderings with a smaller reliability than cutoff are singled out to a text file with the suffix "_best.gr".)

If all is used, then the reliability score is calculated for all possible permutations of the taxa. (Note that for more than 8 taxa this can become rather time-consuming!)

Keep in mind that the qnet algorithm is O(n^4) in both time and memory requirements, where n is the number of taxa in the input alignment. This limits the number of taxa to around 120 for all practical purposes. (Note: the currently implemented maximum number of taxa is 338, which requires about 30GB of memory!)

Finally, a particular cyclic ordering can be specified as a comma-separated list of taxa indices in the range [0, NumberOfTAXA[ (no spaces are allowed), e.g. 3,0,4,1,2 as the ordering for an input MSA with 5 taxa.

-r, --reorder
Reorder the MSA only. The reliability score is not calculated. The reordered MSA is printed to stdout.
--shuffles INT
Perform INT random shufflings per column of the MSA.
-s, --silent
Suppress the printing of progress information to stderr.
--smooth INT
Calculate a running average over the reliability scores of INT columns and use these smoothed values to remove unreliable columns from the MSA.
--seqtype D|P|R
Set the sequence type of the input MSA to DNA (D, the default), protein (P), or RNA (R). This information is used by NeighborNet during distance matrix calculation.
-v, --verbose
Increase the verbosity level.
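Two of the option values above lend themselves to a short illustration: an explicit cyclic ordering given to --ordering and the running average selected by --smooth. The following is a sketch under stated assumptions, not noisy's implementation. The function names are hypothetical, the explicit ordering is assumed to list every taxon index exactly once, and the averaging window is assumed to be centred on each column and truncated at the alignment ends.

```python
def parse_ordering(value, num_taxa):
    """Parse an explicit ordering such as "3,0,4,1,2" into a list of indices."""
    if " " in value:
        raise ValueError("no spaces are allowed in the ordering")
    order = [int(token) for token in value.split(",")]
    # Assumption: each index in [0, num_taxa) must appear exactly once.
    if sorted(order) != list(range(num_taxa)):
        raise ValueError("ordering must be a permutation of the taxon indices")
    return order

def running_average(scores, window):
    """Smooth scores with a centred window, truncated at both ends.

    Assumption: for an even window the effective width is window + 1.
    """
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed
```

For example, parse_ordering("3,0,4,1,2", 5) yields the index list used to reorder the five rows, and running_average(scores, 3) replaces each score by the mean of itself and its immediate neighbours.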

 

REFERENCES


Matsumoto, Makoto (1998) Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Trans. on Modeling and Computer Simulation 8(1):3-30

Bryant, David and Moulton, Vincent (2004) Neighbor-Net: An Agglomerative Method for the Construction of Phylogenetic Networks. Mol. Biol. Evol. 21:255-265

Gruenewald, Stefan and Forslund, Kristoffer and Dress, Andreas WM and Moulton, Vincent (2007) QNet: An Agglomerative Method for the Construction of Phylogenetic Networks from Weighted Quartets. Mol. Biol. Evol. 24:532-538

If you use this program in your work you might want to cite:

Dress, Andreas WM and Flamm, Christoph and Fritzsch, Guido and Gruenewald, Stefan and Kruspe, Matthias and Prohaska, Sonja J and Stadler, Peter F (2008) Identification of Homoplastic Characters in Multiple Sequence Alignments. Alg. Mol. Biol. 3:7

VERSION

This man page documents version 1.5.12 of noisy.  

AUTHORS

Christoph Flamm, Sonja J Prohaska, Guido Fritzsch, Peter F Stadler.  

BUGS

If in doubt, our program is right and nature is at fault. Comments and bug reports should be sent to <sonja@bioinf.uni-leipzig.de>.


 


This document was created by man2html, using the manual pages.
Time: 17:51:42 GMT, April 19, 2011