NOISY
Section: Noisy (l)
Updated: 1.5.12
NAME
noisy - identify homoplastic characters in multiple sequence alignments
SYNOPSIS
noisy [--cutoff FLOAT] [--distance STRING] [--help]
[--matrix FILE] [--missing STRING] [--nogap]
[--noconstant] [--ordering STRING] [--reorder]
[--shuffles INT] [--silent] [--smooth INT]
[--seqtype CHAR] [--verbose]
DESCRIPTION
noisy works in two phases. In the first phase, the rows of the input
multiple sequence alignment (MSA), given in multi-FASTA format, are
reordered to conform to a circular ordering. For this purpose noisy
includes the corresponding subset of routines from David Bryant and
Vincent Moulton's NeighborNet and Stefan Gruenewald's QNet packages.
In the second phase, a reliability score is calculated for each column of
the reordered MSA. Essentially, the number of character state changes in
an alignment column is counted and compared to the counts observed in
random shufflings of that column. The Mersenne Twister uniform
pseudo-random number generator is used to generate the random shufflings
of alignment columns.
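The column scoring described above can be sketched in Python as follows.
This is a minimal illustration, not noisy's actual code: the function names
count_changes and shuffle_score, the treatment of the ordering as a closed
cycle, and the final formula (the fraction of shuffles with at least as many
changes as observed) are assumptions made for the example. Python's random
module is also based on the Mersenne Twister.

```python
import random


def count_changes(column):
    """Count adjacent pairs with differing characters, closing the cycle."""
    n = len(column)
    if n < 2:
        return 0
    return sum(column[i] != column[(i + 1) % n] for i in range(n))


def shuffle_score(column, shuffles=1000, seed=0):
    """Assumed score: fraction of shuffles with >= the observed change count."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = count_changes(column)
    chars = list(column)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(chars)
        if count_changes(chars) >= observed:
            hits += 1
    return hits / shuffles
```

A column such as "AACC", whose states form contiguous blocks along the
cycle, needs only two changes, whereas "ACAC" needs four and is therefore
harder to distinguish from a random shuffling.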
noisy writes three outputs: a PostScript file visualizing the quality of
the columns of the reordered input MSA, the reliability scores of all
columns of the reordered input MSA as xy-data, and a modified alignment
from which columns with a reliability smaller than a cutoff value (set via
option --cutoff) are removed. The program noisy is written in ISO C++. The
source code is available from
http://www.bioinf.uni-leipzig.de/Software/noisy/.
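The column removal step and the naming of the modified alignment can be
illustrated with a short Python sketch. It assumes the reliability scores
are already available as a list with one value per column. The helper names
filter_columns and output_name are hypothetical, and the way the base name
is derived from the input path (directory and extension stripped before the
_out.fas suffix documented under --cutoff is appended) is an assumption.

```python
import os


def filter_columns(rows, scores, cutoff):
    """Keep only the columns whose score is not below the cutoff."""
    keep = [i for i, s in enumerate(scores) if s >= cutoff]
    return ["".join(row[i] for i in keep) for row in rows]


def output_name(input_path):
    """Assumed naming: strip directory and extension, append _out.fas."""
    base = os.path.splitext(os.path.basename(input_path))[0]
    return base + "_out.fas"
```

For example, with scores 0.9, 0.2, 0.8, 0.5 and a cutoff of 0.5, only the
second column is dropped from every row.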
OPTIONS
- --cutoff FLOAT
-
Set the lower bound of the reliability score for an alignment column to
FLOAT. Columns with a score below FLOAT are removed from the
output alignment. The name of the output MSA is constructed from the
base name of the input MSA by appending the suffix _out.fas.
- --distance HAMMING|GTR
-
Set the distance calculation of NeighborNet to HAMMING or GTR.
- -h, --help
-
Display usage information.
- --matrix FILE
-
Read the distance matrix used by NeighborNet to generate the cyclic order
from FILE instead of letting NeighborNet calculate the
distance matrix by one of the methods given to option --distance.
- --missing STRING
-
Each character of STRING is treated as missing data and is removed from a
column before changes between character states are counted.
- --nogap
-
Add the gap symbol to the set of missing characters.
- --noconstant
-
Suppress constant columns in the output MSA.
- --ordering nnet|qnet|rand[,INT]|all|INT(,INT)*
-
Set the method used to calculate the cyclic order. The two major methods
are nnet (NeighborNet, the default) and qnet (QNet).
With rand, the reliability score is calculated for a random sample of all
possible orderings of the taxa. The size of the random sample (default is
1000) can be set by adding an integer after a comma, e.g. rand,42. (All
orderings with a reliability smaller than the cutoff are written to a text
file with the suffix "_best.gr".)
If all is used, then the reliability score is calculated for all possible
permutations of the taxa. (Note that for more than 8 taxa this can
become rather time consuming!)
Keep in mind that the qnet algorithm is O(n^4) in both time and
memory requirements, where n is the number of taxa in the input
alignment. This limits the number of taxa to around 120 for all practical
purposes. (Note: the currently implemented maximum number of taxa is 338,
which requires about 30GB of memory!)
Finally, a particular cyclic ordering can be specified as a
comma-separated list of taxon indices in the range [0, NumberOfTAXA[
(upper bound excluded, no spaces allowed), e.g. 3,0,4,1,2 as the ordering
for the 5 taxa in the input MSA.
- -r, --reorder
-
Reorder the MSA only. The reliability score is not calculated.
The reordered MSA is printed to stdout.
- --shuffles INT
-
Perform INT random shufflings per column of the MSA.
- -s, --silent
-
Suppress the printing of progress information to stderr.
- --smooth INT
-
Calculate a running average over the reliability scores of INT columns
and use these smoothed values to remove unreliable columns from the MSA.
- --seqtype D|P|R
-
Set the sequence type of the input MSA to DNA (D, the default), protein (P)
or RNA (R). This information is used by NeighborNet during distance matrix
calculation.
- -v, --verbose
-
Increase the verbosity level.
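The running average behind --smooth can be sketched in Python as below.
The window placement is an assumption: the sketch centers the window on
each column and shortens it at the alignment ends, which may differ from
what noisy does. The name smooth_scores is hypothetical.

```python
def smooth_scores(scores, window):
    """Running average over `window` columns, centered, truncated at the ends."""
    if window < 1:
        raise ValueError("window must be at least 1")
    left = (window - 1) // 2   # columns taken before the current one
    right = window // 2        # columns taken after the current one
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - left)
        hi = min(len(scores), i + right + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed
```

The smoothed values would then take the place of the raw scores when
columns are compared against the --cutoff value.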
REFERENCES
Matsumoto, Makoto and Nishimura, Takuji (1998) Mersenne Twister: A
623-dimensionally equidistributed uniform pseudo-random number generator.
ACM Trans. on Modeling and Computer Simulation 8(1):3-30
Bryant, David and Moulton, Vincent (2004) Neighbor-Net: An Agglomerative
Method for the Construction of Phylogenetic Networks. Mol Biol Evol,
21:255-265
Gruenewald, Stefan and Forslund, Kristoffer and Dress, Andreas WM and
Moulton, Vincent (2007) QNet: an agglomerative method for the construction
of phylogenetic networks from weighted quartets. Mol Biol Evol, 24:532-538
If you use this program in your work you might want to cite:
Dress, Andreas WM and Flamm, Christoph and Fritzsch, Guido and
Gruenewald, Stefan and Kruspe, Matthias and Prohaska, Sonja J and Stadler,
Peter F (2008) Identification of Homoplastic Characters in Multiple
Sequence Alignments. Alg Mol Biol, 3:7
VERSION
This man page documents version 1.5.12 of noisy.
AUTHORS
Christoph Flamm, Sonja J Prohaska, Guido Fritzsch, Peter F Stadler.
BUGS
If in doubt our program is right, nature is at fault.
Comments and bug reports should be sent to <sonja@bioinf.uni-leipzig.de>.
This document was created by man2html, using the manual pages.
Time: 17:51:42 GMT, April 19, 2011