NAME
SYNOPSIS
OPTIONS AND ARGUMENTS
DESCRIPTION
EXAMPLES
SEE ALSO
DISCLAIMER
BUGS
UPDATES
AUTHOR
VERSION

NAME

randr - Create user-specified random sequences of RNA, DNA, or protein.

SYNOPSIS

randr [options]

Options:

    -type [d|r|p] : type of sequence

    -n [int] :      size of the set

    -l [int] :      length of the sequences

    -min [int] :    minimum sequence length

    -max [int] :    maximum sequence length

    -a [float] :    percentage of adenine nucleotides

    -c [float] :    percentage of cytosine nucleotides

    -g [float] :    percentage of guanine nucleotides

    -t [float] :    percentage of thymine nucleotides

    -u [float] :    percentage of uracil nucleotides

    -v :            information about the sequence set

    -help|h|? :     print (this) brief help message

    -man :          full documentation

OPTIONS AND ARGUMENTS

-type [d|r|p]: The type of sequence to create (DNA, RNA, or protein). The protein sequences will be composed using the single letter code for amino acids. Default: RNA
-n [integer]: The size of the set to create. Default: 10
-l [integer]: The length of the sequences to create. Default: 100
-min [integer]: The minimum length of the sequences to create. If set, sequences are created with random lengths between MIN and the default sequence length (100) or MAX, whichever is lower. If MIN > MAX or MIN > default length, then all sequences in the set have the length MIN.
-max [integer]: The maximum length of the sequences to create. If set, sequences are created with random lengths between the default sequence length (100) or MIN, whichever is higher, and MAX. If MAX < MIN, then the values are swapped. If MAX < default length, then all sequences in the set have the length MAX.
-a [float]: The percentage of adenine nucleotides in the sequence to create, given as a fraction between 0 and 1. Default: 0.25
-c [float]: The percentage of cytosine nucleotides in the sequence to create, given as a fraction between 0 and 1. Default: 0.25
-g [float]: The percentage of guanine nucleotides in the sequence to create, given as a fraction between 0 and 1. Default: 0.25
-t|u [float]: The percentage of thymine (DNA) or uracil (RNA) nucleotides in the sequence to create, given as a fraction between 0 and 1. Default: 0.25
-v: Prints detailed information about the created sequence set.
-help|h|?: Prints a brief help message and exits.
-man: Prints the manual page and exits.
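
The -min and -max rules above can be read in more than one way when both options are given. The following Python sketch shows one possible reading, chosen to agree with the cases in EXAMPLES (for instance, -max 200 alone yields lengths between 100 and 200). It is not taken from the randr source; the names length_bounds and pick_length are hypothetical, and swapping reversed bounds is an assumption.

```python
import random

DEFAULT_LENGTH = 100  # documented default for -l


def length_bounds(min_len=None, max_len=None, default=DEFAULT_LENGTH):
    """Return (low, high) length bounds under one reading of -min/-max."""
    if min_len is None and max_len is None:
        # Neither option set: every sequence has the default length.
        return default, default
    if max_len is None:
        # Only -min: MIN..default, collapsing to MIN if MIN > default.
        return min_len, max(min_len, default)
    if min_len is None:
        # Only -max: default..MAX, collapsing to MAX if MAX < default.
        return min(default, max_len), max_len
    # Both set: swap reversed bounds (assumption; the -min text could
    # also be read as fixing every length at MIN in this case).
    return min(min_len, max_len), max(min_len, max_len)


def pick_length(min_len=None, max_len=None, rng=random):
    """Draw one sequence length uniformly from the inclusive bounds."""
    low, high = length_bounds(min_len, max_len)
    return rng.randint(low, high)
```

With this reading, length_bounds(150, 500) gives (150, 500) and length_bounds(None, 200) gives (100, 200).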

DESCRIPTION

randr creates a set of random sequences of RNA, DNA, or protein. The user can specify the number of sequences in the set, the length of the sequences (or a minimum and maximum length), and, in the case of DNA or RNA, the percentage of each of the four nucleotides within the sequence. Note that the percentages are given as fractions and must sum to exactly 1.
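
To illustrate how a nucleotide composition that sums to 1 can drive sequence generation, here is a minimal Python sketch. It assumes each position is drawn independently with the requested weights; this is an illustration only and not necessarily how randr works internally. The function name random_nucleotides is hypothetical.

```python
import math
import random


def random_nucleotides(length, fractions, rng=random):
    """Draw `length` letters independently, weighted by `fractions`.

    `fractions` maps each letter to its requested share, e.g.
    {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}.
    """
    total = sum(fractions.values())
    # Reject compositions that do not sum to 1 (within float tolerance).
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"fractions sum to {total}, expected 1")
    letters = list(fractions)
    weights = [fractions[letter] for letter in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))


# Example: one AU-rich RNA sequence of length 60.
print(random_nucleotides(60, {"A": 0.4, "C": 0.1, "G": 0.1, "U": 0.4}))
```

Because every position is sampled independently, the observed composition of a short sequence will generally differ somewhat from the requested one.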

EXAMPLES

randr -type d -l 200: Creates a set of 10 DNA sequences, each with a length of 200.
randr -n 100 -min 150 -max 500: Creates a set of 100 RNA sequences. Each sequence has a random length between 150 and 500.
randr -type p -max 200: Creates a set of 10 protein sequences. Each sequence has a random length between 100 (the default value) and 200.

DISCLAIMER

The author is not responsible for any loss of data, incorrect research results, or anything else that may be caused by using this tool.

BUGS

This script does not, as far as is known, obey Sturgeon's law, which says:

90% of everything is crud.

A known issue, however, is the accuracy of the requested nucleotide percentages. This is caused by the function that randomizes the nucleotide sequences. In general, the accuracy improves when large sequence sets (>500 sequences) with long sequences (>200 nt) are created. Also, the more accurately you specify the percentages, the more accurate the result will be. This problem remains to be solved.
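
One way to get a feel for the size of this effect is to treat it as ordinary sampling noise. The Python sketch below assumes every nucleotide is drawn independently with the requested fraction; that is an assumption about the cause, not a statement about the randr code. Under that assumption, the observed fraction over N sampled positions has a standard deviation of sqrt(p * (1 - p) / N). The function name expected_spread is hypothetical.

```python
import math


def expected_spread(fraction, length, n_sequences=1):
    """Standard deviation of the observed fraction over all positions.

    Assumes independent draws, so the count of one nucleotide is
    binomially distributed over length * n_sequences positions.
    """
    total = length * n_sequences
    return math.sqrt(fraction * (1.0 - fraction) / total)


# A single default sequence (100 nt) versus a large set (500 x 200 nt).
print(expected_spread(0.25, 100))       # roughly 0.043
print(expected_spread(0.25, 200, 500))  # roughly 0.0014
```

Under this model, a single 100 nt sequence requested at 0.25 would typically deviate by a few percentage points, while the pooled composition of a large set would be much closer to the requested value.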

Another issue is the large amount of time needed to create very large sequence sets (>10,000 sequences) with long sequences (>5,000 nt). On my IBM Thinkpad with a 1.5GHz Intel Celeron (R) M and 1GB DDR2 RAM, it takes about 2 minutes and 50 seconds of user time to create such a set. A future aim is to improve this by rewriting parts of the code.

UPDATES

Please visit http://www.bioinf.uni-leipzig.de/~alex for possible updates of this Perl script.

AUTHOR

Alex Donath.

If you find any bugs, please report them to <alex [the symbol] bioinf [dot] uni-leipzig [dot] de>.

VERSION

v0.5b (Bled/Slovenia, February 2010)