The pattern must match at least N% of the sequences, or at least N sequences
Option in command line format: -C% (minimum percentage) or -CM (minimum number)
Set the minimum percentage of sequences OR the minimum number of sequences that
must be matched by a pattern for it to be considered by PRATT.
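The two ways of stating this threshold can be compared with a small sketch. The function name meets_threshold and its parameters are hypothetical and are not part of PRATT; the percentage test is written without rounding, because how PRATT itself rounds a percentage is not stated here.

```python
def meets_threshold(matched, total, min_percent=None, min_count=None):
    """Return True if a pattern matching `matched` of `total` sequences passes.

    Exactly one of min_percent (like -C%) or min_count (like -CM) is expected.
    Illustrative only; this is not PRATT's own code.
    """
    if (min_percent is None) == (min_count is None):
        raise ValueError("give exactly one of min_percent or min_count")
    if min_count is not None:
        return matched >= min_count
    # Same as matched / total >= min_percent / 100, but without division.
    return matched * 100 >= min_percent * total


print(meets_threshold(8, 10, min_percent=80))  # True
print(meets_threshold(7, 10, min_percent=80))  # False
print(meets_threshold(3, 10, min_count=3))     # True
```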
Max pattern length
Option in command line format: -PL
Set the maximum length of a pattern for PRATT to consider it.
For instance, G-G-[PS]-L-x(1,3)-R has a length of 8 (1+1+1+1+3+1).
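The length calculation above can be reproduced with a short sketch. It assumes patterns are written with '-' between elements, as in the example, and that a flexible wildcard counts with its upper bound; the function names are hypothetical and not part of PRATT.

```python
import re

# Matches the wildcard forms x, x(n) and x(n,m).
WILDCARD = re.compile(r"^x(?:\((\d+)(?:,(\d+))?\))?$")


def element_max_length(element):
    """Maximum number of positions one pattern element can cover."""
    m = WILDCARD.match(element)
    if m is None:
        return 1  # a residue such as G, or an ambiguous symbol such as [PS]
    low, high = m.group(1), m.group(2)
    if low is None:
        return 1  # bare x
    return int(high if high is not None else low)


def pattern_length(pattern):
    """Sum the maximum lengths of all elements in the pattern."""
    return sum(element_max_length(e) for e in pattern.split("-"))


print(pattern_length("G-G-[PS]-L-x(1,3)-R"))  # 8
```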
Max number of different pattern symbols
Option in command line format: -PN
Set the maximum number of different symbols in a pattern for PRATT to consider it.
For instance, G-G-[PS]-L-x(1,3)-R has 4 different symbols: G, [PS], L and R.
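As an illustration, the different symbols of a pattern can be listed as follows. This sketch assumes that wildcards are not counted as symbols and that symbols are compared exactly as written; the function name is hypothetical and not part of PRATT.

```python
def distinct_symbols(pattern):
    """List the different non-wildcard symbols in order of first appearance."""
    symbols = []
    for element in pattern.split("-"):
        if element.startswith("x"):
            continue  # x, x(n) and x(n,m) are wildcards, not pattern symbols
        if element not in symbols:
            symbols.append(element)
    return symbols


print(distinct_symbols("G-G-[PS]-L-x(1,3)-R"))  # ['G', '[PS]', 'L', 'R']
```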
Max length of wildcards (x)
Option in command line format: -PX
Set the maximum length of every wildcard (x) in a pattern for PRATT to consider it.
For instance, x has a length of 1, x(7) has a length of 7,
and x(1,3) has a length of 3.
Max number of flexible wildcards (x)
Option in command line format: -FN
Set the maximum number of flexible wildcards (x) in a pattern for PRATT to consider it.
For instance, A-x(2)-P-x-G-x(0,2)-D-x(3,5)-S contains 2 flexible wildcards:
x(0,2) and x(3,5).
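A minimal sketch for picking out the flexible wildcards of a pattern is given below. It assumes that a wildcard is flexible when it is written x(n,m) with m greater than n; the function name is hypothetical and not part of PRATT.

```python
import re

# Only the two-number form x(n,m) can be flexible.
FLEXIBLE = re.compile(r"^x\((\d+),(\d+)\)$")


def flexible_wildcards(pattern):
    """Return the wildcards whose upper bound exceeds their lower bound."""
    found = []
    for element in pattern.split("-"):
        m = FLEXIBLE.match(element)
        if m and int(m.group(2)) > int(m.group(1)):
            found.append(element)
    return found


print(flexible_wildcards("A-x(2)-P-x-G-x(0,2)-D-x(3,5)-S"))  # ['x(0,2)', 'x(3,5)']
```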
Max flexibility of wildcards (x)
Option in command line format: -FL
Set the maximum flexibility of every wildcard in a pattern for PRATT to consider it.
For instance, x(3) has a flexibility of 0,
x(1,2) has a flexibility of 1 and x(0,2) has a flexibility of 2.
Max product of wildcard (x) flexibility
Option in command line format: -FP
Set the maximum product of the wildcard flexibilities of a pattern for PRATT to consider it.
The equation for calculating the product is:
(flexibility of wildcard_1 + 1) * ... * (flexibility of wildcard_n + 1)
For instance, for C-x(2,4)-[DE]-x(10)-F the product is
(2+1) * (0+1) = 3
and for C-x(2,4)-[DE]-x(10,14)-F the product is
(2+1) * (4+1) = 15.
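The two products above can be checked with a short sketch of the equation. The function names are hypothetical and not part of PRATT; elements that are not wildcards are treated as having a flexibility of 0, so they multiply the product by 1.

```python
import re

# Matches the wildcard forms x, x(n) and x(n,m).
WILDCARD = re.compile(r"^x(?:\((\d+)(?:,(\d+))?\))?$")


def flexibility(element):
    """Upper bound minus lower bound for x(n,m); 0 for anything else."""
    m = WILDCARD.match(element)
    if m is None or m.group(2) is None:
        return 0
    return int(m.group(2)) - int(m.group(1))


def flexibility_product(pattern):
    """(flexibility of wildcard_1 + 1) * ... * (flexibility of wildcard_n + 1)"""
    product = 1
    for element in pattern.split("-"):
        product *= flexibility(element) + 1
    return product


print(flexibility_product("C-x(2,4)-[DE]-x(10)-F"))     # 3
print(flexibility_product("C-x(2,4)-[DE]-x(10,14)-F"))  # 15
```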
Maximum number of pattern symbols used in the initial search
Option in command line format: -BN
PRATT uses a set of pattern symbols to perform the search. This set contains the 20
one-letter amino acid symbols, such as G,
followed by ambiguous symbols for amino acids that share some physico-chemical properties, such as [DE].
With this option, you can choose the maximum number of pattern symbols that will be used during the
initial search.
Pattern scoring method
Option in command line format: -S
Choose between two scoring schemes:
info: patterns are scored by their information content as defined in
Jonassen, I., Collins, J. F. and Higgins, D. G. (1995).
Note that with this scheme a pattern's score is independent of how many sequences it matches.
mdl: this scoring method is derived from a Minimum Description Length (MDL) principle.
This method is related to the 'info' scheme, but the number of sequences matched is taken into account,
i.e. patterns matching few sequences are penalized in comparison with patterns matching many.