PRATT version 2.1

PRATT is a tool to discover patterns conserved in a set of protein sequences.

This tool can also be run from the EBI server, with very similar options.

STEP 1 - Enter a set of PROTEIN sequences or an alignment
Input type: a set of sequences or an alignment (command-line option: -G)
If 'alignment' is chosen, the input is treated as an alignment that guides the pattern search, and PRATT considers only patterns consistent with it.
Loosely, a pattern is considered consistent with the alignment if each symbol in the pattern corresponds to an ungapped column in the alignment and all the characters of the column in question match the pattern symbol. Also, the wildcards in the pattern must be compatible with the number of residues between the corresponding columns in the alignment.
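For instance, the pattern A-x(2,3)-B is consistent with the following alignment:
    ALVGB
    AG-LB
    ALD-B
The first column contains only A, the last column contains only B, and each row has 2 or 3 residues between them (LVG, GL and LD), which fits x(2,3).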
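The consistency rule above can be sketched in Python. This is a minimal illustration, not PRATT's implementation: the function names are hypothetical, the pattern is assumed to start and end with a symbol, exclusions such as {P} are not handled, and a symbol is either a single residue or a bracketed set such as [ILV].

```python
import re

WILDCARD = re.compile(r"x(?:\((\d+)(?:,(\d+))?\))?")


def parse(pattern):
    """Split a PROSITE-style pattern into symbols (sets of residues) and the
    (min, max) wildcard ranges between consecutive symbols."""
    symbols, gaps, pending = [], [], (0, 0)
    for token in pattern.split("-"):
        m = WILDCARD.fullmatch(token)
        if m:
            lo = int(m.group(1) or 1)
            hi = int(m.group(2) or lo)
            pending = (pending[0] + lo, pending[1] + hi)
        else:
            if symbols:
                gaps.append(pending)
            symbols.append(set(token.strip("[]")))
            pending = (0, 0)
    return symbols, gaps


def is_consistent(pattern, alignment):
    """Hypothetical check: map each symbol to an ungapped column whose residues
    all match it, with residue counts between columns fitting the wildcards."""
    symbols, gaps = parse(pattern)
    ncols = len(alignment[0])

    def column_matches(col, allowed):
        return all(row[col] != "-" and row[col] in allowed for row in alignment)

    def gap_fits(prev, col, lo, hi):
        # Count residues (not gap characters) strictly between the two columns.
        return all(lo <= sum(c != "-" for c in row[prev + 1:col]) <= hi
                   for row in alignment)

    def search(i, prev):
        if i == len(symbols):
            return True
        for col in range(prev + 1, ncols):
            if not column_matches(col, symbols[i]):
                continue
            if i > 0 and not gap_fits(prev, col, *gaps[i - 1]):
                continue
            if search(i + 1, col):
                return True
        return False

    return search(0, -1)


alignment = ["ALVGB", "AG-LB", "ALD-B"]
print(is_consistent("A-x(2,3)-B", alignment))  # True
print(is_consistent("A-x(3)-B", alignment))    # False: two rows have only 2 residues
```

The backtracking search tries every ordered choice of columns, which is fine for short patterns and small alignments.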

STEP 2 - Modify default parameters (optional)

Pattern parameters
Minimum number of matched sequences (command-line options: -C% / -CM)
Set either the minimum percentage of sequences (-C%) or the minimum number of sequences (-CM) that a pattern must match for PRATT to consider it.
Max pattern length (command-line option: -PL)
Set the maximum length of a pattern for PRATT to consider it.
For instance, G-G-[PS]-L-x(1,3)-R has a length of 8 (1+1+1+1+3+1): each symbol counts as 1 and each wildcard counts as its maximum length.
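The length rule can be checked with a short Python sketch. The function name is hypothetical, and only residues, bracketed sets and x wildcards are handled.

```python
import re

WILDCARD = re.compile(r"x(?:\((\d+)(?:,(\d+))?\))?")


def pattern_length(pattern):
    """Count 1 per symbol and the maximum length of each wildcard."""
    total = 0
    for component in pattern.split("-"):
        m = WILDCARD.fullmatch(component)
        if m is None:
            total += 1  # a residue such as G or an ambiguous symbol such as [PS]
        else:
            lo, hi = m.group(1), m.group(2)
            total += int(hi or lo or 1)  # x(i,j) -> j, x(i) -> i, x -> 1
    return total


print(pattern_length("G-G-[PS]-L-x(1,3)-R"))  # 8
```

Under this reading, a pattern passes -PL 25 when pattern_length(pattern) <= 25.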
Max number of different pattern symbols (command-line option: -PN)
Set the maximum number of different symbols in a pattern for PRATT to consider it.
For instance, G-G-[PS]-L-x(1,3)-R has 4 different symbols: G, [PS], L and R.
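Counting different symbols amounts to collecting the non-wildcard components into a set. A minimal Python sketch, assuming wildcards are always written with a lowercase x and that a bracketed set such as [PS] counts as one symbol (the function name is hypothetical):

```python
def distinct_symbols(pattern):
    """Return the set of non-wildcard components of a PROSITE-style pattern."""
    return {c for c in pattern.split("-") if not c.startswith("x")}


symbols = distinct_symbols("G-G-[PS]-L-x(1,3)-R")
print(len(symbols), sorted(symbols))  # 4 ['G', 'L', 'R', '[PS]']
```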
Max length of wildcards (x) (command-line option: -PX)
Set the maximum length of every wildcard (x) in a pattern for PRATT to consider it.
For instance, x has a length of 1, x(7) has a length of 7 and x(1,3) has a length of 3.
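The wildcard lengths above, and the -PX test that uses them, can be sketched as follows. Both function names are hypothetical; the sketch assumes a flexible wildcard is measured by its upper bound, as the x(1,3) example indicates.

```python
import re

WILDCARD = re.compile(r"x(?:\((\d+)(?:,(\d+))?\))?")


def wildcard_length(wildcard):
    """Length of a single wildcard: its upper bound (a bare x counts as 1)."""
    m = WILDCARD.fullmatch(wildcard)
    if m is None:
        raise ValueError(f"not a wildcard: {wildcard!r}")
    lo, hi = m.group(1), m.group(2)
    return int(hi or lo or 1)


def within_max_wildcard_length(pattern, px):
    """True if every wildcard in the pattern is at most px long."""
    return all(wildcard_length(c) <= px
               for c in pattern.split("-") if c.startswith("x"))


print(wildcard_length("x"), wildcard_length("x(7)"), wildcard_length("x(1,3)"))  # 1 7 3
```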
Max number of flexible wildcards (x) (command-line option: -FN)
Set the maximum number of flexible wildcards (x) in a pattern for PRATT to consider it. A wildcard is flexible when its minimum and maximum lengths differ.
For instance, A-x(2)-P-x-G-x(0,2)-D-x(3,5)-S contains 2 flexible wildcards: x(0,2) and x(3,5).
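The count can be reproduced by keeping only the wildcards of the form x(i,j) with i < j. A Python sketch (the function name is hypothetical, and x(i,i) is treated as fixed, which this page does not state explicitly):

```python
import re


def flexible_wildcards(pattern):
    """Return the wildcards x(i,j) with i < j, i.e. those whose length can vary."""
    return [f"x({lo},{hi})"
            for lo, hi in re.findall(r"x\((\d+),(\d+)\)", pattern)
            if int(lo) < int(hi)]


print(flexible_wildcards("A-x(2)-P-x-G-x(0,2)-D-x(3,5)-S"))  # ['x(0,2)', 'x(3,5)']
```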
Max flexibility of wildcards (x) (command-line option: -FL)
Set the maximum flexibility of every wildcard in a pattern for PRATT to consider it. The flexibility of a wildcard x(i,j) is j - i.
For instance, x(3) has a flexibility of 0, x(1,2) has a flexibility of 1 and x(0,2) has a flexibility of 2.
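A minimal Python sketch of the flexibility of a single wildcard, following the examples above (the function name is hypothetical):

```python
import re

WILDCARD = re.compile(r"x(?:\((\d+)(?:,(\d+))?\))?")


def flexibility(wildcard):
    """Flexibility of x(i,j) is j - i; fixed wildcards x and x(i) have 0."""
    m = WILDCARD.fullmatch(wildcard)
    if m is None:
        raise ValueError(f"not a wildcard: {wildcard!r}")
    lo = int(m.group(1) or 1)
    hi = int(m.group(2) or lo)
    return hi - lo


print(flexibility("x(3)"), flexibility("x(1,2)"), flexibility("x(0,2)"))  # 0 1 2
```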
Max product of wildcard (x) flexibility (command-line option: -FP)
Set the maximum product of the flexibilities of all wildcards in a pattern for PRATT to consider it.
The product is calculated as:
(flexibility of wildcard_1 + 1) * ... * (flexibility of wildcard_n + 1)
For instance, for C-x(2,4)-[DE]-x(10)-F the product is (2+1) * (0+1) = 3, and for C-x(2,4)-[DE]-x(10,14)-F it is (2+1) * (4+1) = 15.
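The product formula translates directly into Python. In this sketch (hypothetical function name, Python 3.8+ for math.prod) only x(i,j) wildcards are collected, since fixed wildcards contribute a factor of 0 + 1 = 1 anyway.

```python
import math
import re


def flexibility_product(pattern):
    """Multiply (flexibility + 1) over all wildcards; fixed ones contribute 1."""
    ranges = re.findall(r"x\((\d+),(\d+)\)", pattern)
    return math.prod(int(hi) - int(lo) + 1 for lo, hi in ranges)


print(flexibility_product("C-x(2,4)-[DE]-x(10)-F"))     # 3
print(flexibility_product("C-x(2,4)-[DE]-x(10,14)-F"))  # 15
```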
Maximum number of pattern symbols used in the initial search (command-line option: -BN)
PRATT performs the search with a set of pattern symbols. This set contains the 20 one-letter amino acid symbols, such as G, followed by ambiguous symbols that group amino acids sharing physico-chemical properties, such as [DE].
With this option, you can choose the maximum number of pattern symbols from this set that are used during the initial search.
Pattern scoring method (command-line option: -S)
Choose between two scoring schemes:
  • info: patterns are scored by their information content as defined in Jonassen, I., Collins, J. F. and Higgins, D. G. (1995).
    Note that with this scheme a pattern's score is independent of how many sequences it matches.
  • mdl: this scoring method is derived from the Minimum Description Length (MDL) principle. It is related to the 'info' scheme, but it takes the number of matched sequences into account: patterns matching few sequences are penalized relative to patterns matching many.

Search parameters
Greediness of the search (command-line option: -E)
This parameter adjusts the greediness of the search.
With a greediness of 0 the search is exhaustive. Increasing the greediness reduces the search time, at the cost of the search no longer being exhaustive.
Pattern refinement (command-line option: -R)
When this option is on, patterns found during the initial search phase go through a refinement step in which more ambiguous pattern symbols can be added.
For instance, refinement might turn a pattern such as C-x(4)-D into C-x-[ILV]-x-D-x(3)-[DEF].
       Generalise ambiguous symbols (command-line option: -RG)
Only relevant if 'Pattern refinement' is on.
If this option is on, only symbols present in the symbol set can be used during refinement. This set contains the 20 one-letter amino acid symbols, such as I, followed by ambiguous symbols that group amino acids sharing physico-chemical properties, such as [ILV].
For example, suppose the input sequences contain I or L at the same position. If this option is off, [IL] is reported; if it is on, [ILV] is reported instead, because [ILV] is in the symbol set while [IL] is not.
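The effect of this option on a single position can be sketched in Python. The symbol set below is hypothetical and for illustration only (PRATT's actual set may differ), and choosing the smallest covering symbol is an assumption made to reproduce the [IL] / [ILV] example, not a description of PRATT's algorithm.

```python
# Hypothetical symbol set for illustration only; PRATT's actual set may differ.
SYMBOL_SET = list("ACDEFGHIKLMNPQRSTVWY") + ["[DE]", "[KR]", "[ST]", "[FWY]", "[ILV]"]


def column_symbol(residues, generalise):
    """Pick the pattern symbol reported for the residues seen at one position."""
    observed = set(residues)
    if not generalise:
        # Report exactly the observed residues.
        if len(observed) == 1:
            return next(iter(observed))
        return "[" + "".join(sorted(observed)) + "]"
    # Otherwise use the smallest symbol from the set that covers all residues;
    # None means no symbol in this hypothetical set covers them.
    covering = [s for s in SYMBOL_SET if observed <= set(s.strip("[]"))]
    return min(covering, key=len) if covering else None


print(column_symbol("ILIL", generalise=False))  # [IL]
print(column_symbol("ILIL", generalise=True))   # [ILV]
```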

Output parameters
Pattern format (command-line option: -OP)
Choose between two formats:
  • PROSITE, e.g.
    C-x(2,4)-D-E
  • simple consensus, where x matches exactly one arbitrary sequence symbol and - matches zero or one arbitrary sequence symbol, e.g.
    Cxx--DE
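The relation between the two formats can be sketched as a conversion: a wildcard x(i,j) becomes i x characters followed by j - i - characters. In this Python sketch the function name is hypothetical, and copying bracketed sets such as [DE] unchanged is an assumption, since this page does not show how the consensus format writes them.

```python
import re

WILDCARD = re.compile(r"x(?:\((\d+)(?:,(\d+))?\))?")


def to_simple_consensus(pattern):
    """Rewrite a PROSITE-style pattern as a simple consensus string."""
    out = []
    for component in pattern.split("-"):
        m = WILDCARD.fullmatch(component)
        if m:
            lo = int(m.group(1) or 1)
            hi = int(m.group(2) or lo)
            out.append("x" * lo + "-" * (hi - lo))  # mandatory then optional
        else:
            out.append(component)  # residues and [..] sets copied unchanged
    return "".join(out)


print(to_simple_consensus("C-x(2,4)-D-E"))  # Cxx--DE
```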
Max number of patterns (command-line option: -ON)
Set the maximum number of patterns to be reported.
Max number of alignments (command-line option: -OA)
Set the maximum number of patterns for which an alignment of the matching sequence regions is reported.
Print patterns in sequences (command-line option: -M)
If this option is on, PRATT prints the locations of the sequence segments matched by each of the best patterns.
The patterns are labelled A to Z and then a to z, in order of decreasing pattern score. Each sequence is printed on one line, with one character per K-tuple (block of K consecutive residues) of the sequence. If the pattern labelled C matches the 3rd K-tuple of a sequence, C is printed at that position. If several patterns match in the same K-tuple, only the best one is printed.
At most 52 patterns can be printed in the sequences.
       Ratio for printing (command-line option: -MR)
Only relevant if 'Print patterns in sequences' is on.
Set the value of K (the ratio) used when printing the summary of where in each sequence the pattern matches are found.
       Print vertically (command-line option: -MV)
Only relevant if 'Print patterns in sequences' is on.
If on, the output is printed vertically instead of horizontally. Vertical output may be easier to read for large sequence sets.
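The K-tuple summary can be sketched in Python to show how the labels, the ratio K and vertical printing fit together. This only mimics the description above and is not PRATT's output format: the function names, the '.' filler for empty K-tuples and the choice to place a match by its 0-based start position are all assumptions.

```python
import string
from itertools import zip_longest

# Labels in order of decreasing pattern score: A..Z, then a..z (52 in total).
LABELS = string.ascii_uppercase + string.ascii_lowercase


def summary_line(seq_len, k, matches, filler="."):
    """One character per K-tuple of a sequence.

    matches: (label, start) pairs, start being a 0-based residue position.
    When several labels fall in the same K-tuple, the best-scoring one wins.
    """
    slots = [filler] * -(-seq_len // k)  # ceil(seq_len / k) K-tuples
    for label, start in matches:
        i = start // k
        if slots[i] == filler or LABELS.index(label) < LABELS.index(slots[i]):
            slots[i] = label
    return "".join(slots)


def vertical(lines, filler=" "):
    """Transpose horizontal summary lines so each sequence becomes a column."""
    return ["".join(col) for col in zip_longest(*lines, fillvalue=filler)]


line1 = summary_line(20, 5, [("C", 11), ("A", 13), ("B", 2)])
line2 = summary_line(10, 5, [("A", 7)])
print(line1)                     # B.A.
print(vertical([line1, line2]))  # ['B.', '.A', 'A ', '. ']
```

Here the 3rd K-tuple of the first sequence is matched by both C and A, and A is kept because it has the higher score.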




If you are familiar with PRATT, you can enter parameters directly in command-line format:

e.g. -PL 25

STEP 3 - Submit your job
Option: directly submit the best pattern to ScanProsite.