FindMod tool

FindMod is a program for de novo discovery of protein post-translational modifications (PTMs). It examines peptide mass fingerprinting results of known proteins for the presence of 22 types of PTMs of discrete mass: acetylation, amidation, biotin, C-mannosylation, deamidation, N-acyl diglyceride cysteine (tripalmitate), FAD, farnesylation, formylation, geranyl-geranyl, gamma-carboxyglutamic acid, O-GlcNAc, hydroxylation, lipoyl, methylation, myristoylation, palmitoylation, phosphorylation, pyridoxal phosphate, pyrrolidone carboxylic acid, sulfation.

This is done by looking at mass differences between experimentally determined peptide masses and theoretical peptide masses calculated from a specified protein sequence.

If a mass difference corresponds to a known PTM not already annotated in UniProtKB/Swiss-Prot, "intelligent" rules are applied that examine the sequence of the peptide of interest and make predictions as to what amino acid in the peptide is likely to carry the modification.
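The mass-difference comparison described above can be illustrated with a minimal Python sketch. It assumes a small table of standard monoisotopic PTM mass shifts and a deliberately simplified, hypothetical set of candidate residues per PTM; the function name explain_difference is illustrative, and this is not FindMod's implementation or its actual prediction rules.

```python
# Standard monoisotopic mass shifts (Da) for a few PTMs, paired with a
# simplified, hypothetical set of candidate residues (FindMod's rules are richer).
PTM_SHIFTS = {
    "acetylation": (42.010565, "K"),
    "methylation": (14.015650, "KR"),
    "hydroxylation": (15.994915, "PK"),
    "phosphorylation": (79.966331, "STY"),
    "sulfation": (79.956815, "Y"),
    "deamidation": (0.984016, "NQ"),
}


def explain_difference(peptide, experimental, theoretical, tol_da=0.02):
    """Return (PTM, candidate 1-based positions) for every PTM whose mass shift
    matches experimental - theoretical within tol_da and that has at least one
    possible site in the peptide."""
    delta = experimental - theoretical
    hits = []
    for name, (shift, residues) in PTM_SHIFTS.items():
        if abs(delta - shift) <= tol_da:
            sites = [i for i, aa in enumerate(peptide, start=1) if aa in residues]
            if sites:
                hits.append((name, sites))
    return hits


if __name__ == "__main__":
    # Illustrative numbers: the [M] of LSTYR is 638.3388 Da; +79.9663 Da observed.
    print(explain_difference("LSTYR", 718.3051, 638.3388))
    # [('phosphorylation', [2, 3, 4]), ('sulfation', [4])]
```

With a 0.02 Da window the example difference fits both phosphorylation and sulfation (79.9663 vs 79.9568 Da), which is why the mass tolerance and the residue rules both matter.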


Input parameters

  1. Protein sequence to be characterized
    Specify the sequence of the protein you would like to characterize and for which you have determined a set of peptide masses. If this protein is in UniProtKB/Swiss-Prot or UniProtKB/TrEMBL, enter its UniProtKB/Swiss-Prot ID (e.g. TKN1_HUMAN) or its accession number (e.g. P20366). If the protein is not in UniProtKB/Swiss-Prot or UniProtKB/TrEMBL, you can enter its sequence in single-letter amino acid code, in either upper or lower case. For a manually entered sequence, you must also specify the biological source of the protein; this information is used to determine whether certain PTMs are likely to occur in the sequence.
    Protein sequences from other sources (e.g. word processor programs or other Web pages) can be copied and pasted directly into this field. If there are spaces in your sequence, these will be ignored.
    The characters O and U are not supported and produce an error message. The character J is accepted and treated as either Ile or Leu, which have the same average and monoisotopic masses. The characters B, X and Z (see Comment 5 of the Compute pI/Mw documentation) are accepted, but no masses are computed for peptides containing one or more of them.
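    The input rules above can be sketched in Python for illustration only; the function names clean_sequence and mass_computable are hypothetical, and FindMod's own input handling may differ (for example, this sketch rejects digits and punctuation, which the documentation does not mention).

```python
AMBIGUOUS = set("BXZ")                 # accepted, but peptides with them get no mass
REJECTED = set("OU")                   # produce an error message
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Drop whitespace, upper-case the sequence, and reject O/U or unknown symbols."""
    seq = "".join(raw.split()).upper()
    bad = sorted(set(seq) & REJECTED)
    if bad:
        raise ValueError(f"unsupported residue code(s): {', '.join(bad)}")
    # J (Ile or Leu) is accepted alongside the standard and ambiguous codes.
    unknown = sorted(set(seq) - STANDARD - AMBIGUOUS - {"J"})
    if unknown:
        raise ValueError(f"invalid character(s): {', '.join(unknown)}")
    return seq


def mass_computable(peptide: str) -> bool:
    """Peptides containing B, X or Z are kept, but no mass is computed for them."""
    return not (set(peptide) & AMBIGUOUS)


if __name__ == "__main__":
    print(clean_sequence("mkt ayi\nakqr"))  # MKTAYIAKQR
```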

  2. Peptide masses
    Enter the experimentally measured peptide masses generated from the protein of interest in the «Enter a list of peptide masses...» text field, separated by spaces, tabs or new lines.
    Note: You can copy a list of peptide masses from Excel or other applications and paste it directly into the text field.
    Note: Avoid peptide masses known to come from autodigestion of the enzyme (e.g. trypsin) or from other artefactual peaks (e.g. matrix peaks).
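    The two notes above can be sketched as follows. The function names are hypothetical, and the exclusion list holds commonly cited [M+H]+ values of three porcine trypsin autolysis peptides (VATVSLPR, LSSPATLNSR, LGEHNIDVLEGNEQFINAAK); treat them as an example and check them against your own enzyme before relying on them.

```python
# Example artefact list: [M+H]+ (Da) of porcine trypsin autolysis peptides.
TRYPSIN_AUTOLYSIS_MH = [842.5094, 1045.5636, 2211.1040]


def parse_mass_list(text: str) -> list[float]:
    """Masses may be separated by spaces, tabs or new lines (e.g. pasted from Excel)."""
    return [float(tok) for tok in text.split()]


def drop_artefacts(masses: list[float], artefacts: list[float],
                   tol_da: float = 0.1) -> list[float]:
    """Remove every mass lying within tol_da of a known artefact peak."""
    return [m for m in masses if all(abs(m - a) > tol_da for a in artefacts)]


if __name__ == "__main__":
    pasted = "833.319\t842.51\n854.843 863.419\r\n1045.56"
    print(drop_artefacts(parse_mass_list(pasted), TRYPSIN_AUTOLYSIS_MH))
    # [833.319, 854.843, 863.419]
```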
    Upload a .pkm, .dta or text file
    If the peptide mass fingerprinting data is stored in a file of one of the formats listed below, you can also upload the file directly from your computer:
    (1) Click the «Browse...» button.
    (2) Select the file containing the peptide mass data.
    (3) Click the «Open» button.
    The peptide masses will then be extracted automatically from this file.

    Supported formats:
    (1) .pkm format, produced by the Voyager software of PerSeptive Biosystems or by the GRAMS software:
    OP=0
    Center X   Peak Y   Left X   Right X   Time X  Mass Difference  Name
    STD.Misc  Height   Left Y   Right Y   %Height,Width,%Area,%Quan,H/A
    833.319 2189  833.260  833.378  0.016  0  0
    C0.?  0  762  762
    854.843 5078  854.769  854.917  0.001  0  0
    C0.?  0  3453  3453
    863.419 5108  863.064  863.775  0.001  0  0
    C0.?  0  3567  3567
    872.402 12519  872.347  872.456  0.002  0  0
    C0.?  0  11417  11417
    874.395 6730  874.331  874.460  0.002  0  0
    C0.?  0  3559  3559
    887.786 5903  887.540  888.031  0.003  0  0
    C0.?  0  4131  4131
    898.475 3329  898.416  898.534  0.006  0  0
    C0.?  0  1377  1377
    904.366 7432  904.199  904.533  0.001  0  0
    C0.?  0  5596  5596
    955.300 2598  955.229  955.371  0.011  0  0
    C0.?  0  1089  1089
    973.845 16689  973.749  973.941  0.001  0  0
    
    
    All lines before the line containing ‘H/A’ are ignored.
    After that, only lines which do not contain any capital letters in the first 20 characters are retained. From the retained lines, the first column is interpreted as the mass.
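    As an illustration only, the .pkm rules above can be written as the following Python sketch; the function name parse_pkm is hypothetical and this is not FindMod's parser.

```python
def parse_pkm(text: str) -> list[float]:
    """Skip everything up to and including the 'H/A' line, then keep lines with
    no capital letter in their first 20 characters and read column 1 as the mass."""
    masses = []
    header_seen = False
    for line in text.splitlines():
        if not header_seen:
            header_seen = "H/A" in line  # this line and all before it are skipped
            continue
        if not line.strip() or any(c.isupper() for c in line[:20]):
            continue  # blank lines and lines such as "C0.?  0  762  762"
        masses.append(float(line.split()[0]))
    return masses


if __name__ == "__main__":
    sample = """OP=0
Center X   Peak Y   Left X   Right X   Time X  Mass Difference  Name
STD.Misc  Height   Left Y   Right Y   %Height,Width,%Area,%Quan,H/A
833.319 2189  833.260  833.378  0.016  0  0
C0.?  0  762  762
854.843 5078  854.769  854.917  0.001  0  0
"""
    print(parse_pkm(sample))  # [833.319, 854.843]
```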

    (2) Sequest format:
    1.001
    833.319 2189
    844.333 0.0
    854.843 5078
    863.419 5108
    872.402 12519
    874.395 6730
    887.786 5903
    898.475 3329
    899.555 0.0
    904.366 7432
    955.300 2598
    973.845 16689
    
    
    The first line is considered as a comment and is ignored.
    All subsequent lines are interpreted to contain a mass and an intensity (if any), and mass values are taken into account if the corresponding intensity is > 0.
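    A minimal sketch of these rules, for illustration; the function name parse_dta is hypothetical, and keeping lines that have no intensity column is an assumption, since the text does not say how such lines are handled.

```python
def parse_dta(text: str) -> list[float]:
    """Ignore the first (comment) line; each later line holds a mass and,
    optionally, an intensity. Masses whose intensity is not > 0 are dropped."""
    masses = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if not fields:
            continue  # blank line
        # Assumption: a line without an intensity column is kept.
        if len(fields) > 1 and float(fields[1]) <= 0:
            continue
        masses.append(float(fields[0]))
    return masses


if __name__ == "__main__":
    print(parse_dta("1.001\n833.319 2189\n844.333 0.0\n854.843 5078\n"))
    # [833.319, 854.843]
```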

    (3) Any user-created file can be uploaded if it follows these rules: the first line must not contain a mass value (if it does, that value is ignored), and lines containing masses must start with the mass and must not contain any uppercase letters in their first 20 characters.
    >my file
    833.319
    854.843
    863.419
    872.402
    874.395
    887.786
    898.476
    904.366
    955.300
    973.845
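    The same rules can be sketched in Python; the function name parse_user_file is hypothetical, and skipping other lower-case lines that do not start with a number is an assumption, not documented FindMod behaviour.

```python
def parse_user_file(text: str) -> list[float]:
    """Ignore the first line; keep lines whose first 20 characters contain no
    upper-case letter and read the leading number on each line as the mass."""
    masses = []
    for line in text.splitlines()[1:]:
        if not line.strip() or any(c.isupper() for c in line[:20]):
            continue
        try:
            masses.append(float(line.split()[0]))
        except ValueError:
            continue  # assumption: lines not starting with a number are skipped
    return masses


if __name__ == "__main__":
    print(parse_user_file(">my file\n833.319\n854.843\n"))  # [833.319, 854.843]
```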
    
    
    Note: The upload option only works if a «Browse...» button is shown next to the text entry field. This should be the case for most recent web browser versions, e.g. Netscape 3.0 or higher and MS Internet Explorer 4.0 or higher.

  3. Telling FindMod what to do
    FindMod can predict potential post-translational modifications and find potential single amino acid substitutions in peptides. You can specify whether the program should look for potential PTMs only, single amino acid substitutions only, or both.
    The experimental peptide masses are first compared to theoretical unmodified peptides and to theoretical peptides carrying the modifications documented in UniProtKB/Swiss-Prot or chemical modifications. You can choose whether all peptide masses, or only those not attributed to a theoretical peptide in this first step, are then examined for potential PTMs or single amino acid substitutions.
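    The substitution search can be sketched for a single unexplained mass difference. This assumes monoisotopic residue masses and a fixed tolerance in Da, and the function name substitutions is hypothetical; FindMod's own algorithm, and the display filters described under FindMod Output, are not reproduced here.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitutions(peptide: str, delta: float, tol_da: float = 0.02) -> list[str]:
    """Return single substitutions such as 'A2S' (old residue, 1-based position,
    new residue) whose residue-mass change matches delta within tol_da."""
    found = []
    for pos, old in enumerate(peptide, start=1):
        for new, mass in RESIDUE_MONO.items():
            if new != old and abs((mass - RESIDUE_MONO[old]) - delta) <= tol_da:
                found.append(f"{old}{pos}{new}")
    return found


if __name__ == "__main__":
    # An unexplained +15.9949 Da on GAK fits Ala -> Ser at position 2.
    print(substitutions("GAK", 15.99492))  # ['A2S']
```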

  4. User-defined post-translational modifications
    If you want FindMod to consider post-translational modifications other than those it already predicts, enter them here. For each PTM, specify its name, its atomic composition and the amino acids on which it can occur.
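    To show how an atomic composition translates into a mass shift, here is a small Python sketch. The formula syntax (element symbols followed by optional, possibly negative, counts such as 'H1N1O-1') and the function name composition_mass are assumptions for illustration, not FindMod's input format; the atomic masses are standard values rounded to the digits shown.

```python
import re

# Monoisotopic and average atomic masses (Da) of elements common in PTMs.
MONO = {"H": 1.00782503, "C": 12.0, "N": 14.00307401, "O": 15.99491462,
        "P": 30.97376163, "S": 31.97207100}
AVG = {"H": 1.00794, "C": 12.0107, "N": 14.0067, "O": 15.9994,
       "P": 30.973762, "S": 32.065}


def composition_mass(formula: str, average: bool = False) -> float:
    """Mass of a composition such as 'HPO3' or 'H1N1O-1' (counts may be negative)."""
    table = AVG if average else MONO
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(-?\d+)?", formula):
        total += (int(count) if count else 1) * table[element]
    return total


if __name__ == "__main__":
    print(round(composition_mass("HPO3"), 5))     # 79.96633 (phosphorylation)
    print(round(composition_mass("H1N1O-1"), 5))  # -0.98402 (C-terminal amidation)
```

    The same composition gives different shifts depending on whether monoisotopic or average masses are selected, e.g. about 79.966 Da versus 79.980 Da for HPO3.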

  5. Ion mode (masses as [M] or [M+H]+)
    You can enter your peptide masses as [M] or as [M+H]+, but you must select the corresponding button. If you select [M+H]+, one proton (approximately 1 Da) is added to every theoretical peptide mass calculated from the database before it is compared with your peptide masses.
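    A minimal sketch of the conversion, assuming the standard proton mass of 1.007276 Da; the exact constant FindMod applies is not stated here, and the function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da (standard value; assumed here)


def to_mh(neutral_mass: float) -> float:
    """Convert a neutral [M] mass to the singly protonated [M+H]+ value."""
    return neutral_mass + PROTON


def to_neutral(mh_mass: float) -> float:
    """Convert an [M+H]+ mass back to the neutral [M] value."""
    return mh_mass - PROTON


if __name__ == "__main__":
    print(round(to_mh(638.338775), 6))  # 639.346051 ([M+H]+ of LSTYR)
```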

  6. Isotopic resolution (average or monoisotopic masses)
    Specify whether the entered peptide masses are average or monoisotopic masses.

  7. Chemical treatment of cysteine
    You can choose how cysteines in the protein are modified before the theoretical peptide masses are calculated. Experimentally, proteins are usually reduced and then alkylated with one of several reagents before they are digested into peptides. If you want the masses of unmodified cysteines in your peptides, select "nothing (in reduced form)". If you want cysteines to be theoretically reduced and alkylated, specify the alkylating reagent.
    You can choose iodoacetamide (--> carboxyamidomethyl cysteine, Cys_CAM), iodoacetic acid (--> carboxymethyl cysteine, Cys_CM) or 4-vinylpyridine (--> pyridylethyl cysteine, Cys_PE).
    If an alkylating reagent is selected, FindMod considers both peptides with unmodified cysteines and peptides with modified cysteines.

    Acrylamide adducts:
    In proteins separated by polyacrylamide gel electrophoresis, cysteines often react with free acrylamide monomers to form propionamide cysteine (Cys_PAM).
    If this option is selected, the program modifies the theoretical masses of Cys-containing peptides accordingly.
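    The effect of these options on a theoretical peptide mass can be sketched in Python. The mass shifts are standard monoisotopic values for the listed additions; the function names are hypothetical, and treating cysteines as either all unmodified or all modified is an assumption about how FindMod forms the two peptide variants.

```python
from typing import Optional

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free termini

# Monoisotopic mass added to each modified cysteine.
CYS_SHIFT = {
    "Cys_CAM": 57.021464,   # iodoacetamide, +C2H3NO
    "Cys_CM": 58.005479,    # iodoacetic acid, +C2H2O2
    "Cys_PE": 105.057849,   # 4-vinylpyridine, +C7H7N
    "Cys_PAM": 71.037114,   # acrylamide adduct, +C3H5NO
}


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic [M] mass of an unmodified peptide."""
    return sum(RESIDUE_MONO[aa] for aa in peptide) + WATER


def cysteine_variants(peptide: str, treatment: Optional[str]) -> dict:
    """Unmodified mass plus, when a treatment is chosen and the peptide contains
    cysteines, the mass with every cysteine carrying that modification."""
    variants = {"unmodified": peptide_mass(peptide)}
    n_cys = peptide.count("C")
    if treatment is not None and n_cys:
        variants[treatment] = variants["unmodified"] + n_cys * CYS_SHIFT[treatment]
    return variants


if __name__ == "__main__":
    print(cysteine_variants("ACK", "Cys_CAM"))
```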

  8. Oxidation state of methionine
    You can request that all methionines in theoretical peptides be oxidised. If this option is selected, the program modifies the theoretical masses of Met-containing peptides accordingly and considers both peptides with unmodified methionines and peptides with oxidised methionines. Note that proteins prepared by gel electrophoresis often show this modification.
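    A minimal sketch of this option, assuming one oxygen atom (+15.994915 Da, monoisotopic) per oxidised methionine and the all-or-none reading of "all methionines"; the function name methionine_variants is hypothetical.

```python
OXIDATION = 15.994915  # one oxygen atom added per oxidised methionine (Da)


def methionine_variants(peptide: str, mass: float, oxidise: bool) -> list[float]:
    """Return the unmodified mass and, if requested and the peptide contains
    Met, the mass with every methionine oxidised."""
    masses = [mass]
    n_met = peptide.count("M")
    if oxidise and n_met:
        masses.append(mass + n_met * OXIDATION)
    return masses


if __name__ == "__main__":
    print(methionine_variants("MAMK", 500.0, True))
```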

  9. Mass tolerance
    The tolerance for matching experimental to theoretical peptide masses can be specified in ppm (parts per million) or in daltons.
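    The two units behave differently across the mass range: a 50 ppm window corresponds to ±0.05 Da at 1000 Da but ±0.1 Da at 2000 Da. A minimal sketch of the comparison follows; dividing by the theoretical mass for ppm is an assumed convention, and the function name is hypothetical.

```python
def within_tolerance(experimental: float, theoretical: float,
                     tol: float, unit: str = "Da") -> bool:
    """Compare two masses with an absolute (Da) or relative (ppm) tolerance."""
    error = abs(experimental - theoretical)
    if unit == "ppm":
        return error / theoretical * 1e6 <= tol  # relative to the theoretical mass
    if unit == "Da":
        return error <= tol
    raise ValueError("unit must be 'Da' or 'ppm'")


if __name__ == "__main__":
    print(within_tolerance(1000.04, 1000.0, 50, "ppm"))  # True (40 ppm)
    print(within_tolerance(1000.06, 1000.0, 50, "ppm"))  # False (60 ppm)
```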

  10. Digestion agent (enzyme)
    Specify the enzyme or chemical reagent used to generate your peptides (see the corresponding section of the PeptideMass instructions for the available enzymes and their cleavage rules).
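    As an example of a cleavage rule, here is a sketch of an in-silico tryptic digest using the common simplified rule (cleave after K or R, but not before P). The exact rules FindMod applies are those documented for PeptideMass, and the function name is hypothetical.

```python
import re


def trypsin_digest(sequence: str) -> list[str]:
    """Cleave C-terminal to K or R unless the next residue is P
    (simplified trypsin rule, assumed here for illustration)."""
    # Zero-width split points: preceded by K/R and not followed by P.
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", sequence) if piece]


if __name__ == "__main__":
    print(trypsin_digest("MKWVTFISLLFLFSSAYSRGVFRR"))
    # ['MK', 'WVTFISLLFLFSSAYSR', 'GVFR', 'R']
```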

  11. Missed cleavages
    To take partial cleavage into account, you can specify the maximum number (0, 1, 2 or 3) of missed cleavage sites allowed. If the maximum is 1, all concatenations of two adjoining peptides are also added to the list of theoretical peptides under consideration.
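    The concatenation rule can be sketched as follows, starting from a list of fully cleaved fragments in sequence order; the function name is hypothetical.

```python
def with_missed_cleavages(fragments: list[str], max_missed: int) -> list[str]:
    """Fully cleaved fragments plus concatenations of up to max_missed + 1
    adjoining fragments (max_missed = 1 adds every pair of neighbours)."""
    peptides = []
    for start in range(len(fragments)):
        last = min(start + max_missed + 1, len(fragments))
        for end in range(start + 1, last + 1):
            peptides.append("".join(fragments[start:end]))
    return peptides


if __name__ == "__main__":
    print(with_missed_cleavages(["MK", "WVR", "GK"], 1))
    # ['MK', 'MKWVR', 'WVR', 'WVRGK', 'GK']
```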

  12. Sorting of peptides in the result tables
    Choose whether the peptides should be sorted by mass (from smallest to largest) or by their order of occurrence in the protein sequence.

  13. Send the result by e-mail
    FindMod results are displayed online in your browser window or can be sent by e-mail. To receive them by e-mail, tick the «Send the result by e-mail» box and enter your e-mail address (e.g. name@unknown.ch) in the «Your e-mail:» text field. The e-mail option is recommended, in particular for queries with many peptide masses, because it avoids the timeouts ("document contains no data") that can occur with the online option: the browser interrupts the connection to the program if the search has not finished after a certain time (usually about 3 minutes).
    Note that e-mail results are sent as an HTML file in exactly the same format as the online results, with no loss of functionality.

  14. Reset and Perform buttons
    Once you have filled in the form, press the "Start FindMod" button. If you have made a mistake and would like all fields reset to their default values, press the "Reset" button.

FindMod Output

The results from FindMod are divided into a header and up to three tables.
The header contains information about the submitted protein: a link to the UniProtKB/Swiss-Prot or UniProtKB/TrEMBL entry and the description line (if the protein is in UniProtKB), pI and molecular weight.
Then the input parameters are listed, followed by an active link to PeptideMass. This allows the user to perform a theoretical cleavage of the protein of interest.
The tables report the peptides whose experimental masses match unmodified or modified theoretical digest products of the protein of interest.
Notes:
  1. A BLOSUM62 score is given for each suggested single amino acid substitution. It indicates how likely the substitution is: scores range from -4 (low probability of substitution) to 11 (high probability of substitution).
  2. Potential single amino acid substitutions at a cleavage site are not displayed if they replace the cleavage-site residue with one after which the digestion enzyme does not cleave.
  3. If the suggested AA substitution corresponds to a sequence variant or conflict as annotated in the UniProtKB/Swiss-Prot feature table, this substitution is highlighted in color (green background for that table line), and a hypertext link is provided to the corresponding annotated variant or conflict.
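For trypsin, the cleavage-site filter in note 2 can be sketched as below. The simplified rule (only the C-terminal K or R of a peptide counts as the cleavage site, and the proline rule is ignored) and the function name display_substitution are assumptions for illustration.

```python
TRYPSIN_SITES = set("KR")  # residues after which trypsin cleaves (simplified rule)


def display_substitution(peptide: str, position: int, new_residue: str,
                         is_protein_c_term: bool = False) -> bool:
    """Return False for a substitution at the peptide's C-terminal cleavage site
    (1-based position) that replaces K/R with a residue trypsin does not cleave after."""
    at_cleavage_site = position == len(peptide) and not is_protein_c_term
    return not (at_cleavage_site and new_residue not in TRYPSIN_SITES)


if __name__ == "__main__":
    print(display_substitution("GVFR", 4, "Q"))  # False: the peptide could not exist
```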
At the end of the output page, the user will find a list of the entered masses that did not match in any of the above tables (if any).

Comments

  1. Signal sequences, propeptides and transit peptides
    Signal sequences, propeptides and transit peptides are all removed from the protein before the cleavage rules are applied and the peptide masses are computed for the mature protein.

  2. Chains and polypeptides that produce multiple mature proteins
    If a database entry is annotated with several chains, these are treated as separate polypeptides (e.g. FETUA_HUMAN).
    The same applies to proteins known to give rise to multiple mature peptides or proteins from a single precursor polypeptide (e.g. COLI_HUMAN). In both cases, FindMod displays an intermediate page asking the user to select the chain or peptide to which the entered experimental masses correspond.