
FindPept

FindPept allows the identification of peptides resulting from unspecific chemical or enzymatic cleavage of proteins. Although proteases usually display site specificity, under certain conditions the strict cleavage rules usually considered by tools such as PeptideMass can be violated. FindPept analyzes a series of masses obtained from mass spectrometry measurements and detects which masses do not correspond to peptides with predictable sequences dictated by cleavage rules. Those masses are then compared to those of the peptides that can theoretically be obtained by cleavage at any two positions of the protein sequence.
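The comparison against "cleavage at any two positions" can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not FindPept's actual implementation: the function names `all_subpeptides` and `match_masses` are hypothetical, masses are neutral monoisotopic [M] values built from standard residue masses, and no modifications are considered.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide (H at N-term, OH at C-term)


def all_subpeptides(sequence):
    """Yield (start, end, mass) for every contiguous sub-peptide.

    Positions are 1-based and inclusive. A sequence of n residues
    gives n * (n + 1) / 2 sub-peptides.
    """
    n = len(sequence)
    for start in range(n):
        mass = WATER
        for end in range(start, n):
            mass += RESIDUE_MASS[sequence[end]]
            yield start + 1, end + 1, mass


def match_masses(sequence, observed, tol_da=0.05):
    """Return (observed mass, start, end, peptide) for every match within tol_da."""
    hits = []
    for start, end, mass in all_subpeptides(sequence):
        for obs in observed:
            if abs(obs - mass) <= tol_da:
                hits.append((obs, start, end, sequence[start - 1:end]))
    return hits
```

The number of candidate peptides grows quadratically with the sequence length, and each optional modification multiplies it further, which is why the search space becomes large so quickly.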

Furthermore, the program can take into account post-translational modifications of discrete mass (the same as in the FindMod tool), plus several chemical modifications resulting from the experimental treatment of the protein. FindPept therefore complements its companion program, FindMod, a tool used to predict post-translational modifications. As most of these modifications cannot be expected to be applied quantitatively, and thus can be present in any possible combination, the complexity of the problem addressed by FindPept can be staggering in computational terms.

The protein can be supplied either as a sequence entered by the user or as the identifier of an entry in the UniProt Knowledgebase (Swiss-Prot and TrEMBL). When proteins of interest are specified from UniProtKB/Swiss-Prot, the program uses the annotations for that protein in the UniProtKB/Swiss-Prot database to generate a table of post-translational modifications. Additionally, user-defined modifications can be applied to any position or residue of choice. Many post-translational modifications that affect the masses of peptides can thus be taken into consideration.

 


Input parameters

  1. Protein Sequence to be characterized
  2. You should specify the sequence of the protein you would like to characterize and for which you have determined a set of peptide masses. If this protein is known in the UniProt Knowledgebase, enter the ID code (e.g. TKN1_HUMAN) or the protein accession number (e.g. P20366). If the protein is not known in the UniProt Knowledgebase, you can enter the sequence of your protein of interest, in single letter amino acid code, in either upper or lower case. In this case you may indicate expected post-translational modifications by entering their standard abbreviation surrounded by round brackets.

    Protein sequences from other sources (e.g. word processor programs or other Web pages) can be copied and pasted directly into this field. If there are spaces in your sequence, these will be ignored.
    Note that the characters B, X and Z are not considered and will give an error message. However, the residue J will be treated as either Ile or Leu, which have the same average and monoisotopic masses.
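The input rules above (whitespace ignored, upper or lower case accepted, B, X and Z rejected) can be sketched as follows. This is only an illustration: the function name `normalize_sequence` is hypothetical, and the sketch assumes a plain sequence without PTM abbreviations in round brackets.

```python
import re

FORBIDDEN = set("BXZ")  # ambiguous codes that the form rejects


def normalize_sequence(raw):
    """Strip whitespace, convert to upper case and reject B, X and Z.

    Assumes a plain one-letter sequence; bracketed PTM abbreviations
    are not handled by this sketch.
    """
    seq = re.sub(r"\s+", "", raw).upper()
    bad = sorted(FORBIDDEN & set(seq))
    if bad:
        raise ValueError("unsupported residue code(s): " + ", ".join(bad))
    return seq
```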

  3. Peptide Masses
  4. Enter the experimentally measured peptide masses generated from the unknown protein in the « Enter a list of peptide masses... » text field, and separate them by spaces, tabs or new lines.
    Note: You can copy a list of peptides from a spreadsheet or another application and paste them directly into the text field.
    Note: If possible, avoid using peptide masses known to be from autodigestion of an enzyme (e.g. trypsin), or other artefactual peaks (e.g. matrix peaks).
    Upload a .pkm, .dta or text file
    If the peptide mass fingerprinting data is stored in a file of one of the formats listed below, you can also upload the file directly from your computer:
    (1) click on the « Browse... » button
    (2) select the file containing the relevant peptide mass data and
    (3) click on the « Open» button
    The peptide masses will then be extracted automatically from this file.
    Supported formats:
    .pkm format, produced by the Voyager software of Perseptive Biosystems or the GRAMS software
    Example:

    
      OP=0
      Center X Peak Y Left X Right X Time X Mass Difference Name
      STD.Misc Height Left Y Right Y %Height,Width,%Area,%Quan,H/A
      833.319 2189 833.260 833.378 0.016 0 0
      C 0.? 0 762 762
      854.843 5078 854.769 854.917 0.001 0 0
      C 0.? 0 3453 3453
      863.419 5108 863.064 863.775 0.001 0 0
      C 0.? 0 3567 3567
      872.402 12519 872.347 872.456 0.002 0 0
      C 0.? 0 11417 11417
      874.395 6730 874.331 874.460 0.002 0 0
    
      
    All lines before the line containing 'H/A' are ignored. After that, only lines which do not contain any capital letters in the first 20 characters are retained. From the retained lines, the first column is interpreted as the mass.
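The extraction rule just described can be sketched in Python. This is a hedged illustration rather than the program's own parser: the name `parse_pkm` is hypothetical, and the sketch assumes that leading whitespace is stripped before the first 20 characters are inspected.

```python
import re


def parse_pkm(text):
    """Extract peak masses from the text of a .pkm file.

    Lines up to and including the one containing 'H/A' are skipped.
    Of the remaining lines, those with a capital letter among their
    first 20 characters are dropped, and the first column of every
    other line is read as the mass.
    """
    lines = [line.strip() for line in text.splitlines()]
    header = next((i for i, line in enumerate(lines) if "H/A" in line), None)
    if header is None:
        return []  # no header line found, so nothing can be retained
    masses = []
    for line in lines[header + 1:]:
        if not line or re.search(r"[A-Z]", line[:20]):
            continue
        try:
            masses.append(float(line.split()[0]))
        except ValueError:
            continue  # first column is not a number
    return masses
```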
    Sequest format
    Example:
    
      1.00 1
      833.319 2189
      844.333 0.0
      854.843 5078
      863.419 5108
      872.402 12519
    
      
    The first line is considered a comment and is ignored. All subsequent lines are interpreted to contain a mass and an intensity (if any), and mass values are taken into account if the corresponding intensity is > 0.
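A minimal Python sketch of this rule follows. The function name `parse_sequest` is hypothetical, and keeping a mass that has no intensity column is an assumption drawn from the "(if any)" wording above.

```python
def parse_sequest(text):
    """Extract masses from Sequest-style text.

    The first line is treated as a comment. Each later line holds a
    mass and, optionally, an intensity; the mass is kept when the
    intensity is greater than 0 or absent (assumption).
    """
    masses = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if not fields:
            continue
        try:
            mass = float(fields[0])
            intensity = float(fields[1]) if len(fields) > 1 else None
        except ValueError:
            continue  # not a numeric data line
        if intensity is None or intensity > 0:
            masses.append(mass)
    return masses
```

Applied to the example above, the line `844.333 0.0` is discarded because its intensity is zero.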
    Any user-created files can be uploaded if they correspond to the following rules:
    • The first line does not contain any mass value (if it does, this mass value is ignored).
    • Lines containing masses must start with the mass, and the first 20 characters must not contain any uppercase letters.
    • Example:
      > my file
      833.319
      854.843
      863.419
      872.402
      874.395

    Note: The upload option only works if you see a 'browse' button next to the text entry field. This should be the case for most recent web browser versions, e.g. Netscape 3.0 or higher, MS Internet Explorer 4.0 or higher.

  5. User-defined post-translational modifications
  6. If you wish to take into account post-translational modifications other than those already considered by the program, you can enter them here. For each of these PTMs, specify its name, its atomic composition and the positions at which the modification can be observed. The table below shows some ways of specifying a position (combinations are allowed):
    Position    Meaning
    18          residue #18 only
    E D         all glutamate and aspartate residues
    <           N-terminus of each peptide
    >           C-terminus of each peptide
    K <         lysine and N-terminus of each peptide
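The position notation in the table can be read with a short Python sketch. It is an illustration only: the function name `parse_positions` and the returned dictionary layout are hypothetical, and the sketch assumes that combined entries are separated by spaces, as in the examples "E D" and "K <".

```python
def parse_positions(spec):
    """Parse a position specification such as '18', 'E D' or 'K <'.

    Returns a dict with the residue letters, the numeric positions
    and two flags for the peptide N- and C-terminus.
    """
    targets = {"residues": set(), "positions": set(),
               "n_term": False, "c_term": False}
    for token in spec.split():
        if token == "<":
            targets["n_term"] = True
        elif token == ">":
            targets["c_term"] = True
        elif token.isdigit():
            targets["positions"].add(int(token))
        elif len(token) == 1 and token.isalpha():
            targets["residues"].add(token.upper())
        else:
            raise ValueError("unrecognised position token: " + token)
    return targets
```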

  7. Chemical treatment of cysteine
  8. You can choose how you would like cysteines in a protein to be modified, before the theoretical masses of peptides are calculated. Experimentally, proteins are usually subjected to reduction and then alkylation with different reagents before they are used to generate peptides. If you expect all cysteines to be unmodified in your peptides, select "nothing (in reduced form)". If you would like cysteines to be theoretically reduced and alkylated, specify the reagent you used for alkylation. You have a choice of iodoacetamide (--> carboxyamidomethyl cysteine, Cys_CAM), iodoacetic acid (--> carboxymethyl cysteine, Cys_CM) and 4-vinylpyridine (--> pyridyl-ethyl cysteine, Cys_PE).

    Note that unlike in other tools, chemical reagents (except acrylamide) affect all cysteine residues.

    Acrylamide adducts

    In proteins prepared by polyacrylamide gel electrophoresis, it can be common for cysteines to have reacted with free acrylamide monomers to form propionamide cysteine (Cys_PAM). The program will then modify the theoretical masses of Cys-containing peptides accordingly.

  9. Oxidation state of methionine
  10. You can request for all methionines in theoretical peptides to be oxidized. If this option is selected, the program will modify the theoretical masses of Met-containing peptides accordingly and consider both peptides with unmodified methionines and peptides with modified methionines. Note that proteins prepared by gel electrophoresis often show this modification.

  11. Esterification of acidic and C-terminal residues
  12. If your peptides have been treated with methanol to form methyl esters of the carboxylate groups of glutamate and aspartate side chains and of the carboxy-terminal end, check this option. The mass of a CH2 group will be added for each free carboxylate group in the peptide. However, to avoid clutter, this modification is not indicated in the table of matching peptides.
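As a worked illustration of the CH2 rule, the following Python sketch computes the mass shift for a fully esterified peptide. The function name `esterification_shift` is hypothetical, the monoisotopic CH2 mass (14.01565 Da) is the standard value, and the sketch assumes that the peptide C-terminus is free and that no other modification blocks a side chain.

```python
CH2_MONO = 14.01565  # monoisotopic mass of CH2 in daltons


def esterification_shift(peptide):
    """Mass added when every free carboxylate is methyl-esterified.

    Counts one carboxylate per Asp (D) and Glu (E) side chain plus
    one for the peptide C-terminus (assumed free).
    """
    carboxylates = peptide.count("D") + peptide.count("E") + 1
    return carboxylates * CH2_MONO
```

For example, a peptide with one Asp and one Glu has three carboxylate groups and is therefore shifted by three CH2 masses.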

  13. N-Acetylation and N-Formylation
  14. If you expect the N-terminal residue of your protein to be possibly acetylated or formylated in your biological sample, check this option. The program will consider both the free amino-terminal group and a modified terminus as possibilities in the calculated peptides.

  15. Ion mode (Masses as [M], [M+H]+ or [M-H]-)
  16. You can enter the masses of your peptides as [M], [M+H]+ or [M-H]-, but you must select the appropriate button. If you select the [M+H]+ button, all peptide masses calculated from the database will have one proton (mass of 1 unit) added before matching with user-specified peptides. Conversely, selecting the [M-H]- button will have one proton mass removed from the database mass.
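The adjustment can be sketched in Python as below. The function name `to_ion_mass` is hypothetical, and the proton mass of 1.007276 Da is the standard physical value; whether FindPept uses exactly this constant is an assumption.

```python
PROTON = 1.007276  # proton mass in daltons (assumed constant)


def to_ion_mass(neutral_mass, mode):
    """Convert a calculated neutral mass [M] to the selected ion mode."""
    if mode == "[M]":
        return neutral_mass
    if mode == "[M+H]+":
        return neutral_mass + PROTON
    if mode == "[M-H]-":
        return neutral_mass - PROTON
    raise ValueError("unknown ion mode: " + mode)
```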

  17. Isotopic resolution (average or monoisotopic masses)
  18. Please select the mode used in the MS experiment.

  19. Mass tolerance
  20. Tolerance (the maximum mass difference between the experimental and calculated masses used to assess a match) can be specified in ppm (parts per million) or in daltons.
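A minimal Python sketch of the tolerance test follows. The function name `within_tolerance` is hypothetical; the ppm value is taken relative to the calculated (DB) mass, in line with the definition given in the key to the tables.

```python
def within_tolerance(user_mass, db_mass, tolerance, unit="Da"):
    """Return (is_match, delta) for an experimental and a calculated mass.

    delta is expressed in the same unit as the tolerance; in ppm it is
    the difference divided by the DB mass, times one million.
    """
    delta = user_mass - db_mass
    if unit == "ppm":
        delta = delta / db_mass * 1e6
    elif unit != "Da":
        raise ValueError("unit must be 'Da' or 'ppm'")
    return abs(delta) <= tolerance, delta
```

A difference of 0.01 Da on a 1000 Da peptide corresponds to about 10 ppm, so it passes a 20 ppm tolerance but fails a 5 ppm one.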

  21. Sorting of peptides in the result tables
  22. Here you can choose if you would like the peptides to be sorted by their mass (from smallest to largest) or by their positional order in the protein.

  23. Digestion agent (enzyme)
  24. Specify the enzyme or chemical reagent that you used to generate your peptides (see the corresponding section in the PeptideMass instructions for the available enzymes and their cleavage rules). Cleavage sites that obey the cleavage rules at either end of a peptide are then highlighted in the results with a red slash, and peptides that obey them at both ends are displayed in a separate table. If you check the "Exclude masses that match specific cleavage by the enzyme" checkbox, the corresponding masses will not be taken into account for unspecific cleavage. This is recommended, since specific cleavage is usually much more likely than unspecific cleavage.

  25. Enzyme source
  26. The drop-down list next to the one used to specify the enzyme lists the most common sources and variants of the digestion enzymes that are listed in Swiss-Prot/TrEMBL. If you select one, its sequence will be automatically submitted to PeptideMass to check whether any user-supplied masses correspond to fragments obtained by the autolysis (i.e., self-digestion) of the protease with up to 5 missed cleavages, and such fragments will be displayed in a separate table. If you check the "Exclude masses that match autolytic cleavage of the enzyme" checkbox, the corresponding masses will not be taken into account for the cleavage of the studied protein.

  27. Send the result by e-mail
  28. FindPept results are displayed on-line in your browser window or can be sent by e-mail. If the results should be sent to you by e-mail, tick the « Send the result by e-mail » box. In the « Your e-mail: » text field, enter the e-mail address (e.g. name@example.com) to which the results should be sent. The e-mail option may be necessary for queries with a large number of peptide masses or post-translational modifications. It avoids the timeouts (« document contains no data » error messages) that may occur with the on-line option: the browser interrupts the connection with the program if the search has not finished after a certain time (usually about 3 minutes).
    Note that e-mail results are sent in the form of an HTML file, in exactly the same format as on-line, and that there is no loss of functionality compared to on-line display.

  29. Reset and Perform Buttons
  30. Once you have filled in the form according to your needs, press the "Start FindPept" button. If you have made a mistake and would like all fields to be reset to their default values, press the Reset button.

Intermediate page

If the user has submitted the accession number of a Swiss-Prot/TrEMBL entry that contains :

an intermediate page will appear, requiring the selection either of a single chain, of the uncleaved precursor, or of the sequence with the initiator methionine added. If you suspect that the protein you have used experimentally has not matured as expected in vivo, you may opt to consider the precursor sequence for the analysis. Also check this option if your sample may contain traces of a cleaved sequence, as peptides that span several chains will be indicated with a red asterisk in the result table to warn you that they can result only from the digestion of the uncleaved precursor.

The numbering of the residues will be the same as in the original Swiss-Prot/TrEMBL entry. If an initiator methionine is added before the beginning of the sequence, it will be attributed the position 0.


FindPept Output

The results from FindPept are divided into a header and up to seven tables. Each table is displayed only if matching peptides/PTMs have been found in the given category.

The header contains information about the submitted protein: a link to the Swiss-Prot/TrEMBL entry and the description line (if the protein is in Swiss-Prot/TrEMBL), pI and molecular weight. Then the input parameters are listed, followed by active links to PeptideMass, to perform a theoretical cleavage of the protein of interest; to FindMod, to predict possible post-translational modifications as an alternative cause of unexpected experimental peptide masses; and to the original FindPept form, to change some parameters.

The tables report the peptides whose experimental masses match unmodified or modified theoretical digest products of the protein of interest.


Post-Translational/Artefactual Modifications for the PROTEASE:
Lists the PTMs applied to the protease that can be found in the peptides listed in the table of peptides resulting from protease autolysis.

Post-Translational/Artefactual Modifications for the PROTEIN:
Lists the PTMs applied to the studied protein that can be found in the peptides resulting from the cleavage of the protein.

Masses resulting from possible contaminants
Lists the masses that correspond to the specific cleavage of a number of human keratins that are often encountered as contaminants in biological samples handled in the laboratory. The considered contaminants are :
P04264 K2C1_HUMAN Keratin, type II cytoskeletal 1 (Cytokeratin 1) [from skin]
P35908 K22E_HUMAN Keratin, type II cytoskeletal 2 epidermal (Cytokeratin 2e) [from dandruff]
P35527 K1C9_HUMAN Keratin, type I cytoskeletal 9 (Cytokeratin 9) [from skin]
P13645 K1C10_HUMAN Keratin, type I cytoskeletal 10 (Cytokeratin 10) [from dandruff]
Reference: Parker K.C. et al. Identification of yeast proteins from two-dimensional gels: working out spot cross-contamination. Electrophoresis 1998; 19(11):1920-1932. PubMed: 9740052.

Peptides resulting from protease autolysis :
Lists the peptides obtained by specific self-digestion of the protease that match the user masses.

Matching peptides for specific cleavage:
Lists the peptides obtained by digestion of the studied protein for which both ends are either sites of specific cleavage by the protease, or the extremities of the original peptide.

Matching peptides for unspecific cleavage:
Lists the other peptides obtained by digestion of the studied protein that match the user masses.

Unmatched masses
Lists the masses that could not be identified by the program.

Key to the tables

Post-Translational/Artefactual Modifications
Count : the number of residues potentially carrying the PTM. This is relevant for PTMs affecting all residues of a particular amino acid (see Position entry).
Nature : the nature of the PTM, i.e. its abbreviation in the code used by FindMod, or an artefactual modification on cysteine and methionine residues and carboxylate groups.
Position : if an integer, the position at which the PTM occurs. If a letter, the amino acid on all instances of which the PTM occurs (for artefactual modifications). C-ter and N-ter mean that the PTM is expected at the C- or N-terminus of each peptide.
Source

The source where the information about the PTM was obtained, any of :

  • Feature Table : the PTM was entered in the Feature Table of a UniProtKB/Swiss-Prot entry. Consult the particular entry for additional details and references.
  • User input : the PTM was entered between round brackets directly in a sequence that was cut and pasted in the sequence field.
  • Artefactual : the chemical modification was activated by checking the appropriate checkboxes pertaining to cysteine alkylation, methionine oxidation and carboxyl group esterification.
  • User defined : the PTM was defined in the table by means of an abbreviation, its atomic composition and its position(s) in the sequence.
Occurrence
  • Optional : the PTM can, but need not, occur. The program takes into account all possible combinations of optional PTMs to determine the maximal possible number of variants of the same sequence that only differ in their PTMs and match the user-input masses. This is the setting used for all PTMs read in the Feature Table or supplied by the user.
  • Mandatory : the PTM occurs on the position(s) reported in the Position field, except in case an Optional modification is defined on the same residue.

If two (or more) PTMs occur on the same residue, the program excludes the (chemically impossible) combination of several PTMs on that residue but takes all the PTMs, plus the unmodified residue, into account in the combinatorial calculation, unless one PTM is set to mandatory. In that case, the unmodified residue is not taken into account, but the mandatory PTM does not take precedence over optional PTMs.
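The combinatorial rule can be illustrated with a small Python sketch. It reflects one reading of the rules above and is not the program's own algorithm: the function names `site_choices` and `variant_shifts` and the input layout are hypothetical. Each residue carries at most one PTM; without a mandatory PTM the residue may also stay unmodified, and with one the unmodified state is replaced by the mandatory PTM.

```python
from itertools import product


def site_choices(optional_shifts, mandatory_shift=None):
    """Mass shifts that a single residue can carry (one PTM at most).

    Without a mandatory PTM the unmodified state (0.0) is allowed;
    with one, the mandatory shift takes the place of the unmodified
    state but does not exclude the optional PTMs.
    """
    choices = list(optional_shifts)
    choices.append(0.0 if mandatory_shift is None else mandatory_shift)
    return choices


def variant_shifts(sites):
    """Distinct total mass shifts over all combinations of PTMs.

    `sites` is a list of (optional_shifts, mandatory_shift_or_None)
    tuples, one per modifiable residue of the peptide.
    """
    per_site = [site_choices(opt, mand) for opt, mand in sites]
    return sorted({round(sum(combo), 5) for combo in product(*per_site)})
```

With k independent optional PTMs on different residues, this gives up to 2^k variants of one peptide, each of which has to be compared with the user masses.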

Mass : the mass of the PTM, i.e. the mass difference between the modified residue and the unmodified residue. It therefore does not include the mass of the amino acid part.
Description : for Feature Table entries, this is the content of the description field, which often contains useful information for deciding whether the PTM is plausible or not. For user-defined PTMs, the description is supplied by the user. For other PTMs, this is the chemical nature of the modification.

Matching Peptides
User mass : the user mass, supplied in the masses field or in the uploaded file, to which the DB mass corresponds within the tolerance bounds. If a peptide matches several user masses, it is reported in several rows.
DB mass : the mass of the peptide, including the post-translational and artefactual modifications, calculated using the masses database.
deltamass : the mass difference between the user mass and the DB mass, supplied in the same unit (Da or ppm) as the tolerance. One ppm is the millionth part of the DB mass.
peptide : the sequence of the peptide. The residues immediately before and after the sequence are indicated in gray and between round brackets but are not taken into account in the calculation. If either side of the peptide corresponds to a site of specific cleavage, it is denoted by a red slash. Residues carrying a PTM are underlined, with the exception of esterification, which is not indicated to reduce clutter.
position : the position of the peptide in the sequence, supplied as the first and last residue. This is also a link to the FT detail viewer if a Swiss-Prot/TrEMBL entry was supplied. The position 0 is the initiator methionine if the user indicated that it should be added.
modifications : the list of the PTMs applied to the peptide, in the order in which the underlined residues are indicated in the peptide sequence. For their description, refer to the PTM table above.
missed cleavages : the number of missed cleavage sites within the sequence.