FindMod tool - Documentation
FindMod is a program for de novo discovery of protein post-translational modifications (PTM). It examines peptide mass
fingerprinting results of known proteins for the presence of 22 types of PTMs of discrete mass:
acetylation, amidation, biotin, C-mannosylation, deamidation, N-acyl diglyceride cysteine (tripalmitate), FAD, farnesylation, formylation,
geranyl-geranyl, gamma-carboxyglutamic acid, O-GlcNAc, hydroxylation, lipoyl, methylation, myristoylation, palmitoylation, phosphorylation,
pyridoxal phosphate, pyrrolidone carboxylic acid, sulfation.
This is done by looking at mass differences between experimentally determined peptide masses and theoretical peptide masses calculated from a specified protein sequence.
If a mass difference corresponds to a known PTM not already annotated in UniProtKB/Swiss-Prot, "intelligent" rules are applied that examine the sequence of the peptide of interest and make predictions as to what amino acid in the peptide is likely to carry the modification.
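The mass-difference step can be illustrated with a minimal sketch. The table holds monoisotopic mass shifts for four of the PTMs listed above; the names `PTM_DELTAS` and `candidate_ptms` and the 0.01 Da default tolerance are assumptions made for this example, not FindMod's internal implementation. The sequence-based prediction rules are not reproduced here.

```python
# Monoisotopic mass shifts in Da for a few PTMs (illustrative subset only,
# not FindMod's internal table).
PTM_DELTAS = {
    "acetylation": 42.010565,
    "hydroxylation": 15.994915,
    "methylation": 14.015650,
    "phosphorylation": 79.966331,
}


def candidate_ptms(experimental, theoretical, tolerance_da=0.01):
    """Return the PTM names whose mass shift explains the observed difference."""
    delta = experimental - theoretical
    return sorted(
        name
        for name, shift in PTM_DELTAS.items()
        if abs(delta - shift) <= tolerance_da
    )


# Usage: an experimental mass about 79.966 Da above the theoretical mass
print(candidate_ptms(1080.466, 1000.500))
```

A real search would compare every experimental mass against every theoretical peptide and then check the peptide sequence for residues able to carry the candidate modification.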
Input parameters
-
Protein Sequence to be characterized
You should specify the sequence of the protein you would like to characterize and for which you have determined a set of peptide masses. If this protein is known in UniProtKB/Swiss-Prot or UniProtKB/TrEMBL, enter the UniProtKB/Swiss-Prot ID code (e.g. TKN1_HUMAN) or the protein accession number (e.g. P20366). If the protein is not known in the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases, you can enter the sequence of your protein of interest, in single letter amino acid code, in either upper or lower case. In the case of a manually entered sequence, the user is required to specify the biological source of the query protein. This information is used to determine whether certain PTMs are likely to occur in the sequence.
Protein sequences from other sources (e.g. word processor programs or other Web pages) can be copied and pasted directly into this field. If there are spaces in your sequence, these will be ignored.
Note that the characters O and U are not considered and will give an error message. However, the residue J will be treated as either Ile or Leu, which have the same average and monoisotopic masses. The characters B, X, or Z (see Comment 5 of the Compute pI/Mw documentation) are accepted, but no masses are computed for peptides containing one or more of these characters.
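The character-handling rules above can be sketched as follows. This is a hedged illustration only: the function names `clean_sequence` and `mass_is_computable` are hypothetical, and the way FindMod reports errors is not modelled beyond raising an exception.

```python
def clean_sequence(raw):
    """Normalise a pasted sequence: drop whitespace, accept either case."""
    sequence = "".join(raw.split()).upper()
    rejected = sorted(set(sequence) & set("OU"))
    if rejected:
        # O and U are not considered and lead to an error
        raise ValueError("unsupported character(s): " + ", ".join(rejected))
    return sequence


def mass_is_computable(peptide):
    """No mass is computed for peptides containing B, X or Z."""
    return not any(residue in "BXZ" for residue in peptide)


# Usage
print(clean_sequence("mk tayi\nAK"))
print(mass_is_computable("AXK"))
```

The residue J needs no special treatment when computing masses, because Ile and Leu have identical average and monoisotopic masses.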
-
Peptide Masses
Enter the experimentally measured peptide masses generated from the unknown protein in the « Enter a list of peptide masses... » text field, separated by spaces, tabs or new lines.
You can copy a list of peptides from Excel or other applications and paste them directly into the text field.
Avoid using peptide masses known to arise from autodigestion of an enzyme (e.g. trypsin) or from other artefactual peaks (e.g. matrix peaks).
Upload a .pkm, .dta or text file
If the peptide mass fingerprinting data is stored in a file of one of the formats listed below, you can also upload the file directly from your computer:
(1) Click on the « Browse... » button
(2) Select the file containing the relevant peptide mass data and
(3) Click on the « Open » button
The peptide masses will then be extracted automatically from this file.
Supported formats:
(1) Any user-created files can be uploaded if they correspond to the following rules: The first line does not contain any mass value (if it does, this mass value is ignored).
Lines containing masses must start with the mass, and the first 20 characters must not contain any uppercase letters.
Example:
    >my file
    833.319
    854.843
    863.419
    872.402
    874.395
    887.786
    898.476
    904.366
    955.300
    973.845
(2) Sequest format:
    1.001
    833.319 2189
    844.333 0.0
    854.843 5078
    863.419 5108
    872.402 12519
    874.395 6730
    887.786 5903
    898.475 3329
    899.555 0.0
    904.366 7432
    955.300 2598
    973.845 16689
The first line is considered as a comment and is ignored.
All subsequent lines are interpreted to contain a mass and an intensity (if any), and mass values are taken into account if the corresponding intensity is > 0.
(3) .pkm format, produced by the Voyager software of Perseptive Biosystems or the GRAMS software:
    OP=0
    Center X Peak Y Left X Right X Time X Mass Difference Name STD.Misc Height Left Y Right Y %Height,Width,%Area,%Quan,H/A
    833.319 2189 833.260 833.378 0.016 0 0 C0.? 0 762 762
    854.843 5078 854.769 854.917 0.001 0 0 C0.? 0 3453 3453
    863.419 5108 863.064 863.775 0.001 0 0 C0.? 0 3567 3567
    872.402 12519 872.347 872.456 0.002 0 0 C0.? 0 11417 11417
    874.395 6730 874.331 874.460 0.002 0 0 C0.? 0 3559 3559
    887.786 5903 887.540 888.031 0.003 0 0 C0.? 0 4131 4131
    898.475 3329 898.416 898.534 0.006 0 0 C0.? 0 1377 1377
    904.366 7432 904.199 904.533 0.001 0 0 C0.? 0 5596 5596
    955.300 2598 955.229 955.371 0.011 0 0 C0.? 0 1089 1089
    973.845 16689 973.749 973.941 0.001 0 0
All lines before the line containing ‘H/A’ are ignored.
After that, only lines which do not contain any capital letters in the first 20 characters are retained. From the retained lines, the first column is interpreted as the mass.
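The extraction rules for the three formats can be sketched as below. This is a minimal illustration under stated assumptions, not FindMod's actual parser: the function names are hypothetical, lines that do not start with a number are silently skipped, and a Sequest line without an intensity is assumed to be kept.

```python
import re


def _leading_masses(lines):
    """Keep the first column of lines with no uppercase letter in the first 20 characters."""
    masses = []
    for line in lines:
        fields = line.split()
        if not fields or re.search(r"[A-Z]", line[:20]):
            continue
        try:
            masses.append(float(fields[0]))
        except ValueError:
            continue  # the line does not start with a number
    return masses


def parse_user_file(text):
    """User-created format: the first line is ignored."""
    return _leading_masses(text.splitlines()[1:])


def parse_sequest(text):
    """Sequest format: first line is a comment; keep masses with intensity > 0."""
    masses = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if not fields:
            continue
        # Assumption: a line without an intensity column is kept.
        if len(fields) == 1 or float(fields[1]) > 0:
            masses.append(float(fields[0]))
    return masses


def parse_pkm(text):
    """.pkm format: ignore everything up to and including the line containing 'H/A'."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if "H/A" in line:
            return _leading_masses(lines[index + 1:])
    return []


# Usage
print(parse_sequest("1.001\n833.319 2189\n844.333 0.0\n854.843 5078\n"))
```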
-
Telling FindMod what to do
FindMod can predict potential protein post-translational modifications and find potential single amino acid substitutions in peptides. The user can specify whether the program should detect only potential PTMs, only single amino acid substitutions or both.
The experimental peptide masses are first compared to theoretical unmodified peptides and to peptides carrying modifications documented in UniProtKB/Swiss-Prot or the selected chemical modifications. The user can choose whether all peptide masses, or only those that were not matched to a theoretical peptide in this first step, should be examined for potential PTMs or single amino acid substitutions.
-
User-defined post-translational modifications
If you wish to take into account post-translational modifications other than those already predictable by FindMod, you can enter them here. For each of these PTMs, specify its name, its atomic composition and the amino acids on which the modification can be observed.
-
Ion mode (Masses as [M] or [M+H]+)
You can enter the masses of your peptides as [M] or as [M+H]+, but you must select the corresponding button. If you select the [M+H]+ button, one proton (mass of about 1 unit) is added to all peptide masses calculated from the database before they are matched with the user-specified peptide masses.
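As a small worked example of the two ion modes, the conversion amounts to adding or subtracting one proton mass. The constant 1.007276 Da and the function names are assumptions for this sketch; the exact value FindMod adds is not stated in this documentation.

```python
PROTON_MASS = 1.007276  # Da; assumed value, described above as about 1 unit


def to_mh(neutral_mass):
    """Convert a neutral peptide mass [M] to the singly protonated [M+H]+ mass."""
    return neutral_mass + PROTON_MASS


def to_neutral(mh_mass):
    """Convert an [M+H]+ mass back to the neutral mass [M]."""
    return mh_mass - PROTON_MASS


# Usage
print(to_mh(1000.0))
```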
-
Chemical treatment of cysteine
You can choose how you would like cysteines in a protein to be modified, before the theoretical masses of peptides are calculated. Experimentally, proteins are usually subjected to reduction and then alkylation with different reagents before they are used to generate peptides. If you would like the masses of unmodified cysteines in your peptides, select "nothing (in reduced form)". If you would like cysteines to be theoretically reduced and alkylated, specify the reagent to be used for alkylation.
You have a choice of iodoacetamide (--> carboxyamidomethyl cysteine, Cys_CAM), iodoacetic acid (--> carboxymethyl cysteine, Cys_CM) and 4-vinylpyridine (--> pyridyl-ethyl cysteine, Cys_PE).
In that case, FindMod will consider both peptides with unmodified cysteines and peptides with modified cysteines.
Acrylamide adducts:
In proteins prepared by polyacrylamide gel electrophoresis, it can be common for cysteines to have reacted with free acrylamide monomers to form propionamide cysteine (Cys_PAM).
The program will then modify the theoretical masses of Cys-containing peptides accordingly.
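The effect of cysteine treatment on theoretical masses can be sketched as follows. The monoisotopic mass shifts are standard values for these adducts; the function name `cysteine_variants` is hypothetical, and enumerating every count of modified cysteines from zero to all is an assumption about how "both unmodified and modified" peptides are considered.

```python
# Monoisotopic mass added per modified cysteine, in Da.
CYS_SHIFTS = {
    "Cys_CAM": 57.021464,   # iodoacetamide, +C2H3NO
    "Cys_CM": 58.005479,    # iodoacetic acid, +C2H2O2
    "Cys_PE": 105.057849,   # 4-vinylpyridine, +C7H7N
    "Cys_PAM": 71.037114,   # acrylamide adduct, +C3H5NO
}


def cysteine_variants(peptide, unmodified_mass, modification):
    """Return theoretical masses with 0, 1, ..., n cysteines modified."""
    shift = CYS_SHIFTS[modification]
    cysteines = peptide.count("C")
    return [unmodified_mass + k * shift for k in range(cysteines + 1)]


# Usage: a peptide with two cysteines yields three candidate masses
print(cysteine_variants("ACDCK", 500.0, "Cys_CAM"))
```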
-
Oxidation state of methionine
You can request that all methionines in theoretical peptides be oxidised. If this option is selected, the program modifies the theoretical masses of Met-containing peptides accordingly and considers both peptides with unmodified methionines and peptides with oxidised methionines. Note that proteins prepared by gel electrophoresis often show this modification.
-
Mass tolerance
The mass tolerance can be specified in ppm (parts per million) or in Dalton.
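The difference between the two tolerance units can be shown with a short sketch. The function name `within_tolerance` is hypothetical, and computing the ppm window relative to the theoretical mass is an assumption of this example.

```python
def within_tolerance(experimental, theoretical, tolerance, unit="ppm"):
    """Check whether two masses agree within a tolerance given in ppm or Da."""
    if unit == "ppm":
        # A ppm tolerance scales with the mass: 50 ppm at 1000 Da is 0.05 Da.
        allowed = theoretical * tolerance / 1e6
    elif unit == "Da":
        allowed = tolerance
    else:
        raise ValueError("unit must be 'ppm' or 'Da'")
    return abs(experimental - theoretical) <= allowed


# Usage
print(within_tolerance(1000.04, 1000.0, 50, "ppm"))
print(within_tolerance(1000.4, 1000.0, 0.5, "Da"))
```

A ppm tolerance grows with peptide mass, whereas a tolerance in Dalton is the same absolute window for every peptide.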
-
Digestion agent (enzyme)
Specify the enzyme or chemical reagent that you used to generate your peptides (see the corresponding section in the PeptideMass instructions for the available enzymes and their cleavage rules).
-
Missed cleavages
In order to take into account partial cleavages, you can specify a maximum number (0, 1, 2 or 3) of missed cleavage sites to be allowed. If the maximum number of missed cleavages entered is 1, all concatenations of two adjoining peptides are also added to the list of theoretical peptides under consideration.
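The concatenation of adjoining peptides can be sketched as below. The cleavage rule used here (cut after K or R unless the next residue is P) is a simplified trypsin rule assumed for illustration; the actual rules are those given in the PeptideMass instructions. The function names are hypothetical.

```python
def cleave(sequence):
    """Split a sequence after K or R, except when the next residue is P (assumed rule)."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def digest(sequence, max_missed=0):
    """List peptides allowing up to max_missed missed cleavage sites."""
    fragments = cleave(sequence)
    peptides = []
    for i in range(len(fragments)):
        for missed in range(max_missed + 1):
            if i + missed < len(fragments):
                # Join this fragment with the next `missed` adjoining fragments.
                peptides.append("".join(fragments[i:i + missed + 1]))
    return peptides


# Usage
print(digest("AKRPGKC", max_missed=1))
```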
-
Sorting of peptides in the result tables
Here you can choose whether the peptides are sorted by mass (from smallest to largest) or by their position in the protein sequence.
-
Send the result by e-mail
FindMod results are displayed on-line in your browser window or can be sent by e-mail. If you want the results sent to you by e-mail, tick the "Send the result by e-mail" box and enter the e-mail address (e.g. name@unknown.ch) to which the results should be sent in the "Your e-mail:" text field. The e-mail option is recommended, in particular for queries with a high number of peptide masses. It avoids timeouts ("document contains no data"), which can occur with the on-line option: the browser interrupts the connection with the program if the search has not finished after a certain time (usually about 3 minutes).
Note that e-mail results are sent in the form of an HTML file, in exactly the same format as on-line, and that there is no loss of functionality compared to the on-line display.
-
Reset and Perform Buttons
Once you have filled in the form according to your needs, press the button "Start FindMod". If you have made a mistake and would like all fields to be reset to their default values, press the Reset button.
FindMod Output
The results from FindMod are divided into a header and up to three tables.
The header contains information about the submitted protein: a link to the UniProtKB/Swiss-Prot or UniProtKB/TrEMBL entry and the description line (if the protein is in UniProtKB), pI and molecular weight.
Then the input parameters are listed, followed by an active link to PeptideMass. This allows the user to perform a theoretical cleavage of the protein of interest.
The tables report the peptides whose experimental masses match unmodified or modified theoretical digest products of the protein of interest:
- The first table reports matches to theoretical digest products as unmodified, modified with the annotations in UniProtKB/Swiss-Prot and chemically modified as specified in the input form.
- The second table reports those user masses which differ from a theoretical database mass by a mass value corresponding to one of the considered PTMs.
These peptides are further examined, and FindMod checks whether the peptide sequences contain amino acids which are likely to carry the modification in question.
This is done by applying a set of prediction rules which have been defined using information in the PROSITE database, examining all the annotations in UniProtKB/Swiss-Prot and examining information in the literature.
The program first lists the matches conforming to these rules, highlighting potentially modified residues in colour.
Potential PTMs detected by mass difference, but not confirmed by the rules are included in a second list.
- The third table shows potential single AA substitutions detected by mass difference.
- A BLOSUM62 score is given for each suggested single AA substitution. This provides information about the probability of substitution: Lowest score: -4 (low probability of substitution), highest score: 11 (high probability of substitution).
- Potential single amino acid substitutions that occur at the cleavage site and replace the residue with one after which the enzyme used for the digest does not cleave are not displayed.
- If the suggested AA substitution corresponds to a sequence variant or conflict as annotated in the UniProtKB/Swiss-Prot feature table, this substitution is highlighted in color (green background for that table line), and a hypertext link is provided to the corresponding annotated variant or conflict.
Comments
-
Signal Sequences, Propeptides & Transit Peptides
Signal sequences, propeptides and transit peptides are all removed from proteins before cleavage rules are applied and peptide masses are computed for the mature protein.
-
Chains and Polypeptides that Produce Multiple Mature Proteins
If there are known chains that are created from any database entry, these are considered as different polypeptides (e.g. FETUA_HUMAN).
The same applies to any proteins which are known to form multiple mature peptides or proteins from a single initial polypeptide (e.g. COLI_HUMAN). In both cases FindMod displays an intermediate page asking the user to select the chain or peptide to which the entered experimental masses correspond.