FindMod is a program for de novo discovery of protein post-translational
modifications (PTMs). It examines peptide mass fingerprinting results of known
proteins for the presence of 22 types of PTMs of discrete mass: acetylation,
amidation, biotin, C-mannosylation, deamidation, N-acyl diglyceride cysteine
(tripalmitate), FAD, farnesylation, formylation, geranyl-geranyl,
gamma-carboxyglutamic acid, O-GlcNAc, hydroxylation, lipoyl, methylation,
myristoylation, palmitoylation, phosphorylation, pyridoxal phosphate,
pyrrolidone carboxylic acid, sulfation.
This is done by looking at mass differences between experimentally determined
peptide masses and theoretical peptide masses calculated from a specified
protein sequence.
If a mass difference corresponds to a known PTM not already annotated in
UniProtKB/Swiss-Prot, "intelligent" rules are applied that examine the
sequence of the peptide of interest and predict which amino acid in the
peptide is likely to carry the modification.
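The mass-difference idea described above can be illustrated with a minimal sketch. It is not FindMod's implementation: the function name candidate_ptms, the tolerance value and the three-entry table are assumptions made for illustration, and only the monoisotopic mass shifts themselves are standard values.

```python
# Monoisotopic mass shifts (Da) for a few of the PTMs listed above.
# Illustrative subset only; FindMod's own table is not reproduced here.
PTM_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01056,
    "methylation": 14.01565,
}


def candidate_ptms(experimental, theoretical, tol_da=0.05):
    """Return the PTMs whose mass shift explains the observed difference.

    tol_da is a hypothetical absolute tolerance in Dalton.
    """
    delta = experimental - theoretical
    return [name for name, shift in PTM_SHIFTS.items()
            if abs(delta - shift) <= tol_da]


# Hypothetical example: a peptide observed about 80 Da heavier than predicted.
print(candidate_ptms(1080.50, 1000.53))
```

A match found this way is only a candidate; the sequence-based rules mentioned above decide whether a residue in the peptide can plausibly carry the modification.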
Input parameters
- Protein Sequence to be characterized
You should specify the sequence of
the protein you would like to characterize and for which you have determined
a set of peptide masses. If this protein is known in UniProtKB/Swiss-Prot or UniProtKB/TrEMBL, enter the
UniProtKB/Swiss-Prot ID code (e.g. TKN1_HUMAN) or the protein accession number (e.g. P20366).
If the protein is not known in the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases, you can enter the sequence of your
protein of interest, in single letter amino acid code, in either upper or lower case.
In the case of a manually entered sequence, the user is required to specify the biological
source of the query protein. This
information is used to determine whether certain PTMs are likely to occur
in the sequence.
Protein sequences from other sources (e.g. word processor programs or
other Web pages) can be copied and pasted directly into this field.
If there are spaces in your sequence,
these will be ignored.
Note that the characters O and U are not considered and will give an error message. However, the residue J will be
treated as either Ile or Leu, which have the same average and monoisotopic masses.
The characters B, X, or Z (see Comment 5 of the
Compute pI/Mw documentation) are accepted, but no masses are computed for
peptides containing one or more of
these characters.
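The handling of a manually entered sequence can be sketched as follows. The function names clean_sequence and mass_is_computable are hypothetical, and the sketch only mirrors the rules stated above (whitespace ignored, either case accepted, O and U rejected, no mass for peptides containing B, X or Z); it does not reproduce FindMod's actual error messages.

```python
def clean_sequence(raw):
    """Normalise a pasted sequence according to the rules described above."""
    # Spaces, tabs and line breaks are dropped; lower case is accepted.
    seq = "".join(raw.split()).upper()
    rejected = sorted(set(seq) & set("OU"))
    if rejected:
        raise ValueError("unsupported characters: " + ", ".join(rejected))
    return seq


def mass_is_computable(peptide):
    """Peptides containing B, X or Z are accepted but get no theoretical mass."""
    return not (set(peptide) & set("BXZ"))


print(clean_sequence("mkt ayia kqr"))
print(mass_is_computable("AXK"))
```

J needs no special handling here because Ile and Leu have the same mass, so either interpretation gives the same theoretical peptide mass.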
- Peptide Masses
Enter the experimentally measured peptide masses generated from the unknown protein
in the «Enter a list of peptide masses...» text field, and separate them by spaces, tabs or new lines.
Note!
You can copy a list of peptides from Excel or other applications and paste them
directly into the text field.
Note!
Avoid using peptide masses known to be from autodigestion of an enzyme (e.g.
trypsin!), or other artefactual peaks (e.g. matrix peaks).
Upload a .pkm, .dta or text file
If the peptide mass fingerprinting data is stored in a file of one of the formats
listed below, you can also upload the file directly from your computer:
(1) Click on the «Browse...» button
(2) Select the file containing the relevant peptide mass data and
(3) Click on the «Open» button
The peptide masses will then be extracted automatically from this file.
Supported formats:
(1) .pkm format, produced by the Voyager software of Perseptive Biosystems or the GRAMS software:
OP=0
Center X Peak Y Left X Right X Time X Mass Difference Name
STD.Misc Height Left Y Right Y %Height,Width,%Area,%Quan,H/A
833.319 2189 833.260 833.378 0.016 0 0
C0.? 0 762 762
854.843 5078 854.769 854.917 0.001 0 0
C0.? 0 3453 3453
863.419 5108 863.064 863.775 0.001 0 0
C0.? 0 3567 3567
872.402 12519 872.347 872.456 0.002 0 0
C0.? 0 11417 11417
874.395 6730 874.331 874.460 0.002 0 0
C0.? 0 3559 3559
887.786 5903 887.540 888.031 0.003 0 0
C0.? 0 4131 4131
898.475 3329 898.416 898.534 0.006 0 0
C0.? 0 1377 1377
904.366 7432 904.199 904.533 0.001 0 0
C0.? 0 5596 5596
955.300 2598 955.229 955.371 0.011 0 0
C0.? 0 1089 1089
973.845 16689 973.749 973.941 0.001 0 0
All lines before the line containing ‘H/A’ are ignored.
After that, only lines which do not contain any capital letters in the first 20 characters
are retained. From the retained lines, the first column is interpreted as the mass.
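The extraction rules for .pkm files can be expressed as a short sketch. The function name parse_pkm is hypothetical, and the sketch assumes that the first whitespace-separated field of every retained line is a valid number; it is an illustration of the stated rules, not the parser FindMod uses.

```python
def parse_pkm(text):
    """Extract peak masses from .pkm text following the rules described above."""
    masses = []
    header_done = False
    for line in text.splitlines():
        if not header_done:
            # Skip everything up to and including the line containing 'H/A'.
            if "H/A" in line:
                header_done = True
            continue
        if not line.strip():
            continue
        # Lines with a capital letter in the first 20 characters are discarded.
        if any(ch.isupper() for ch in line[:20]):
            continue
        # The first column of a retained line is the mass.
        masses.append(float(line.split()[0]))
    return masses


sample = """OP=0
Center X Peak Y Left X Right X Time X Mass Difference Name
STD.Misc Height Left Y Right Y %Height,Width,%Area,%Quan,H/A
833.319 2189 833.260 833.378 0.016 0 0
C0.? 0 762 762
854.843 5078 854.769 854.917 0.001 0 0
C0.? 0 3453 3453
"""
print(parse_pkm(sample))
```

The capital-letter rule is what removes the "C0.?" continuation lines in the example above.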
(2) Sequest format:
1.001
833.319 2189
844.333 0.0
854.843 5078
863.419 5108
872.402 12519
874.395 6730
887.786 5903
898.475 3329
899.555 0.0
904.366 7432
955.300 2598
973.845 16689
The first line is considered as a comment and is ignored.
All subsequent lines are interpreted to contain a mass and an intensity (if any), and mass values are taken into
account if the corresponding intensity is > 0.
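A minimal sketch of these rules is given below. The function name parse_sequest is hypothetical, and treating a line without an intensity column as valid is an assumption drawn from the wording "an intensity (if any)".

```python
def parse_sequest(text):
    """Extract masses from Sequest-style text following the rules above."""
    masses = []
    for line in text.splitlines()[1:]:  # the first line is a comment
        fields = line.split()
        if not fields:
            continue
        mass = float(fields[0])
        # Assumption: a missing intensity is accepted; an explicit one must be > 0.
        if len(fields) == 1 or float(fields[1]) > 0:
            masses.append(mass)
    return masses


sample_dta = """1.001
833.319 2189
844.333 0.0
854.843 5078
"""
print(parse_sequest(sample_dta))
```

In the example, 844.333 is dropped because its intensity is 0.0.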
(3) Any user-created files can be uploaded if they correspond to the following rules:
The first line does not contain any mass value (if it does, this mass value is
ignored).
Lines containing masses must start with the mass, and the first 20 characters must
not contain any uppercase letters.
>my file
833.319
854.843
863.419
872.402
874.395
887.786
898.476
904.366
955.300
973.845
Note!
The upload option only works if you see a 'browse' button next to the text
entry field. This should be the case for most recent web browser versions, e.g.
Netscape 3.0 or higher, MS Internet Explorer 4.0 or higher.
- Telling FindMod what to do
FindMod can predict potential protein post-translational modifications and find potential
single amino acid substitutions in peptides. The user can specify whether the program
should detect only potential PTMs, only single amino acid substitutions or both.
The experimental peptide masses will first be compared to theoretical unmodified
peptides and to peptides modified as documented in UniProtKB/Swiss-Prot or by chemical modifications. The user can
choose whether all peptide masses, or only those that have not been matched to a theoretical
peptide in this process, should be examined for potential PTMs or single amino acid substitutions.
- User-defined post-translational modifications
If you wish to take into account post-translational modifications other than those already predictable by FindMod, you can enter
them here. For each of these PTMs, specify the name, its atomic composition and the amino
acids on which this modification can be observed.
- Ion mode (Masses as [M] or [M+H]+)
You can enter the masses of your peptides as [M] or as [M+H]+,
but you must select the corresponding button. If you select the [M+H]+ button,
all peptide masses calculated from the database will have one proton
(mass of 1 unit) added before matching with user-specified peptides.
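The effect of the ion mode on the theoretical masses can be sketched as follows. The function name theoretical_value is hypothetical, and the proton mass of 1.007276 Da is the standard physical value, used here as an assumption; the text above only states "1 unit", and the exact value FindMod adds is not given.

```python
PROTON_MASS = 1.007276  # Da; assumption, the text above rounds this to "1 unit"


def theoretical_value(neutral_mass, ion_mode="[M+H]+"):
    """Return the theoretical mass to compare with the user-entered mass."""
    if ion_mode == "[M+H]+":
        return neutral_mass + PROTON_MASS
    if ion_mode == "[M]":
        return neutral_mass
    raise ValueError("ion_mode must be '[M]' or '[M+H]+'")


print(theoretical_value(1000.0, "[M]"))
print(theoretical_value(1000.0, "[M+H]+"))
```

Selecting the wrong button shifts every comparison by about 1 Da, which is far larger than typical mass tolerances.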
- Isotopic resolution (average or monoisotopic masses)
- Chemical treatment of cysteine
You can choose how you would like cysteines in a protein to be modified,
before the theoretical masses of peptides are
calculated. Experimentally, proteins are usually subjected
to reduction and then alkylation with different reagents before they are used
to generate peptides. If you would like the masses of unmodified
cysteines in your peptides, select "nothing (in reduced
form)". If you would like cysteines to be theoretically reduced
and alkylated, specify
the reagent to be used for alkylation.
You have a choice of iodoacetamide (--> carboxyamidomethyl cysteine, Cys_CAM), iodoacetic acid (--> carboxymethyl cysteine, Cys_CM) and 4-vinylpyridine (--> pyridyl-ethyl cysteine, Cys_PE).
If a reagent is specified, FindMod will consider both peptides with unmodified cysteines and peptides with modified cysteines.
Acrylamide adducts:
In proteins prepared by
polyacrylamide gel electrophoresis, it can be common for cysteines
to have reacted with free acrylamide monomers to form propionamide cysteine (Cys_PAM).
The program will then modify the theoretical masses of Cys-containing
peptides accordingly.
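How the cysteine treatment changes the theoretical masses can be sketched as below. The mass shifts are the standard monoisotopic values for each adduct. The function name cysteine_variants is hypothetical, and enumerating every count from zero to all cysteines modified is an assumption; the text only states that modified and unmodified forms are both considered.

```python
# Monoisotopic mass added per modified cysteine (Da).
CYS_SHIFTS = {
    "Cys_CAM": 57.02146,   # iodoacetamide
    "Cys_CM": 58.00548,    # iodoacetic acid
    "Cys_PE": 105.05785,   # 4-vinylpyridine
    "Cys_PAM": 71.03711,   # acrylamide adduct
}


def cysteine_variants(peptide, unmodified_mass, reagent):
    """List candidate masses with 0..n of the peptide's cysteines modified."""
    n_cys = peptide.count("C")
    shift = CYS_SHIFTS[reagent]
    return [unmodified_mass + k * shift for k in range(n_cys + 1)]


# Hypothetical peptide with two cysteines and an illustrative base mass.
print(cysteine_variants("ACDCK", 500.0, "Cys_CAM"))
```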
- Oxidation state of methionine
You can request that all methionines in theoretical peptides be oxidised.
If this option is selected, the program will modify the
theoretical masses of Met-containing peptides accordingly and consider
both peptides with unmodified methionines and peptides with modified
methionines. Note that proteins prepared by gel electrophoresis often show this
modification.
- Mass tolerance
The mass tolerance can be specified in ppm (parts per million) or in Dalton.
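The two tolerance units can be compared with a minimal sketch. The function name within_tolerance is hypothetical, and computing the ppm window relative to the theoretical mass is an assumption; the documentation does not state which mass is used as the reference.

```python
def within_tolerance(experimental, theoretical, tolerance, unit="ppm"):
    """Check whether two masses agree within a tolerance given in ppm or Da."""
    if unit == "ppm":
        # Assumption: the ppm window is taken relative to the theoretical mass.
        allowed = theoretical * tolerance / 1e6
    elif unit == "Da":
        allowed = tolerance
    else:
        raise ValueError("unit must be 'ppm' or 'Da'")
    return abs(experimental - theoretical) <= allowed


# 50 ppm at 1000 Da corresponds to a window of 0.05 Da.
print(within_tolerance(1000.02, 1000.0, 50, "ppm"))
print(within_tolerance(1000.10, 1000.0, 50, "ppm"))
```

A ppm tolerance widens with peptide mass, whereas a tolerance in Dalton is the same for all peptides.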
- Digestion agent (enzyme)
Specify the enzyme or chemical reagent
that you used to generate your peptides
(see the corresponding section in the PeptideMass instructions
for the available enzymes
and their cleavage rules).
- Missed cleavages
In order to take into account partial cleavages, you can specify
a maximum number (0, 1, 2 or 3) of missed cleavage sites to be allowed.
If the maximum number of missed cleavages entered is 1, all concatenations
of two adjoining peptides are also added to the list of theoretical peptides under
consideration.
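The way missed cleavages extend the list of theoretical peptides can be sketched as follows. The cleavage rule used here (cut after K or R unless followed by P) is a simplified trypsin-like assumption, not the rule set from the PeptideMass instructions, and both function names are hypothetical.

```python
def cleave(sequence, cut_after="KR", not_before="P"):
    """Split a sequence after each cut_after residue not followed by not_before."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in cut_after and nxt and nxt not in not_before:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def with_missed_cleavages(peptides, max_missed):
    """Add concatenations of up to max_missed + 1 adjoining peptides."""
    result = []
    for i in range(len(peptides)):
        for missed in range(max_missed + 1):
            if i + missed < len(peptides):
                result.append("".join(peptides[i:i + missed + 1]))
    return result


fragments = cleave("AKRPGKDE")
print(fragments)
print(with_missed_cleavages(fragments, 1))
```

With a maximum of 1, each pair of adjoining peptides is added once; with 2, each run of three adjoining peptides is added as well, so the list grows quickly for long proteins.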
- Sorting of peptides in the result tables
Here you can choose whether the peptides are sorted by their mass (from smallest
to largest) or by their position in the protein sequence.
- Send the result by e-mail
FindMod results are displayed on-line in your browser window or can be sent by e-mail.
If the results should be sent back to you by e-mail, tick the «Send the result by e-mail» box. In the «Your e-mail:» text field, enter the
e-mail address (e.g. name@unknown.ch) to which the results should be sent. The e-mail option is recommended, in particular for queries with a
high number of peptide masses. This avoids timeouts ("document contains no data"), which can
occur with the on-line option: the browser interrupts the connection with the program if the search has not finished after a certain time (usually about 3
minutes).
Note that e-mail results are sent in the form of an HTML file, in exactly the same format as on-line, and that there is no loss
of functionality compared to on-line display.
- Reset and Perform Buttons
Once you have filled in the form according to your needs, press the
button "Start FindMod". If you have made
a mistake and would like all fields to be reset to their default values, press the Reset button.
FindMod Output
The results from FindMod are divided into a
header and up to three
tables.
The header contains information about the submitted protein: a link to the UniProtKB/Swiss-Prot or UniProtKB/TrEMBL
entry and the description line (if the protein is in UniProtKB), pI and molecular weight.
Then the input parameters are listed, followed by an active link to
PeptideMass. This allows the user to perform a theoretical cleavage of the protein of interest.
The tables report the peptides
whose experimental masses match unmodified or modified theoretical digest products
of the protein of interest:
- The first table reports matches to theoretical digest products as unmodified, modified with the annotations in UniProtKB/Swiss-Prot and chemically modified as specified in the input form.
- The second table reports those user masses which differ from a theoretical database mass
by a mass value corresponding to one of the considered PTMs.
These peptides are further examined, and FindMod checks whether the peptide sequences
contain amino acids which are likely to carry the modification in question.
This is done by applying a set of prediction rules which have been defined using
information in the PROSITE database,
examining all the annotations in UniProtKB/Swiss-Prot and examining information in the
literature.
The program first lists the matches conforming to these rules, highlighting potentially
modified residues in colour.
Potential PTMs detected by mass difference, but not confirmed by the rules are included in a second list.
- The third table shows potential single AA substitutions detected by mass difference.
Notes:
- A BLOSUM62 score is given for each suggested single AA substitution.
This provides information about the probability of substitution: Lowest score: -4 (low probability of substitution), highest score: 11 (high probability of substitution).
- Potential single amino acid
substitutions that occur at a cleavage site and replace the AA with one after which the enzyme used for the digest
does not cleave are not displayed.
- If the suggested AA substitution corresponds to a sequence variant or conflict as annotated in
the UniProtKB/Swiss-Prot feature table, this substitution is highlighted in color (green background for that table
line), and a hypertext link is provided to the corresponding annotated variant or conflict.
At the end of the output page the user will find a list of those entered
masses which did not match in any of the above tables (if any).
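The detection of single amino acid substitutions by mass difference, as reported in the third table, can be sketched as follows. The residue masses are standard monoisotopic values. The function name candidate_substitutions and the tolerance are hypothetical, and the sketch omits the BLOSUM62 score and the cleavage-site filter mentioned in the notes.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidate_substitutions(peptide, delta, tol_da=0.01):
    """Return (position, original, new) triples explaining a mass difference.

    delta is the experimental mass minus the theoretical mass.
    """
    hits = []
    for pos, original in enumerate(peptide, start=1):
        for new, mass in RESIDUE_MASS.items():
            if new == original:
                continue
            if abs(mass - RESIDUE_MASS[original] - delta) <= tol_da:
                hits.append((pos, original, new))
    return hits


# A difference of about +15.995 Da in the hypothetical peptide GAK.
print(candidate_substitutions("GAK", 15.99492))
```

The same mass difference can also correspond to a PTM (here, for instance, an added oxygen), which is why FindMod lets the user search for PTMs, substitutions or both.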
Comments
- Signal Sequences, Propeptides & Transit Peptides
Signal sequences, propeptides and transit peptides are all
removed from proteins before cleavage rules
are applied and peptide masses are computed for the mature protein.
- Chains and Polypeptides that Produce Multiple Mature Proteins
If there are known chains that are created from any
database entry, these are considered as different
polypeptides (e.g. FETUA_HUMAN).
The same applies to any proteins which are known to form multiple mature peptides
or proteins from a single initial polypeptide (e.g. COLI_HUMAN). In both cases
FindMod comes up with an intermediary page asking the user to select
the chain or peptide to which the entered experimental masses correspond.