FindPept - Documentation
FindPept allows the identification of peptides resulting from unspecific chemical or enzymatic cleavage of proteins. Although proteases usually display site specificity, under certain conditions the strict cleavage rules usually considered by tools such as PeptideMass can be violated. FindPept analyzes a series of masses obtained from mass spectrometry measurements and detects which masses do not correspond to peptides with predictable sequences dictated by cleavage rules. Those masses are then compared to those of the peptides that can theoretically be obtained by cleavage at any two positions of the protein sequence.
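The principle of unspecific matching can be illustrated with a minimal Python sketch. It is not FindPept's implementation. The function name `unspecific_matches`, the use of neutral monoisotopic masses, a tolerance in daltons and the absence of any modification are all simplifying assumptions. Every subsequence between two positions is considered, and its mass (the sum of its residue masses plus one water) is compared with each experimental mass.

```python
# Standard monoisotopic residue masses (Da) of the 20 amino acids.
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.01056  # added once per peptide for its free termini


def unspecific_matches(sequence, masses, tol_da):
    """Hypothetical sketch: return (user mass, first, last, peptide) for every
    subsequence whose neutral monoisotopic mass is within tol_da of a user
    mass. Positions are 1-based and inclusive; modifications are ignored."""
    # prefix[i] holds the summed residue mass of sequence[:i]
    prefix = [0.0]
    for aa in sequence:
        prefix.append(prefix[-1] + RESIDUE_MONO[aa])
    hits = []
    n = len(sequence)
    for start in range(n):
        for end in range(start + 1, n + 1):
            pep_mass = prefix[end] - prefix[start] + WATER_MONO
            for m in masses:
                if abs(pep_mass - m) <= tol_da:
                    hits.append((m, start + 1, end, sequence[start:end]))
    return hits
```

Because every pair of cleavage positions is considered, the number of candidate peptides grows quadratically with the sequence length, before any modification is taken into account.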
Furthermore, the program can take into account post-translational modifications of discrete mass (the same as in the FindMod tool), plus several chemical modifications resulting from the experimental treatment of the protein. FindPept therefore complements its companion program, FindMod, a tool used to predict post-translational modifications. As most of these modifications cannot be expected to be applied quantitatively, and thus can be present in any possible combination, the complexity of the problem addressed by FindPept can be staggering in computational terms.
The protein sequence can either be entered by the user or specified by the code name of a protein in the UniProt Knowledgebase (Swiss-Prot and TrEMBL). When the protein of interest is specified from UniProtKB/Swiss-Prot, the program considers the annotations for that protein in the UniProtKB/Swiss-Prot database and uses them to generate a table of post-translational modifications. Additionally, user-defined modifications can be applied to any position or residue of choice. Many protein post-translational modifications that affect the masses of peptides can thus be taken into consideration.
Input parameters
-
Protein Sequence to be characterized
You should specify the sequence of the protein you would like to characterize and for which you have determined a set of peptide masses. If this protein is known in the UniProt Knowledgebase, enter the ID code (e.g. TKN1_HUMAN) or the protein accession number (e.g. P20366). If the protein is not known in the UniProt Knowledgebase, you can enter the sequence of your protein of interest, in single letter amino acid code, in either upper or lower case. In this case you may indicate expected post-translational modifications by entering their standard abbreviation surrounded by round brackets.
Protein sequences from other sources (e.g. word processor programs or other Web pages) can be copied and pasted directly into this field. If there are spaces in your sequence, these will be ignored.
Note that the characters B, X and Z are not considered and will give an error message. However, the residue J will be treated as either Ile or Leu, which have the same average and monoisotopic masses.
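A minimal sketch of these input rules, assuming a plain sequence without bracketed PTM abbreviations (the function `normalize_sequence` is hypothetical):

```python
def normalize_sequence(raw):
    """Hypothetical sketch of the sequence rules described above.
    Assumption: the input contains no bracketed PTM abbreviations."""
    # Remove spaces, tabs and newlines; accept upper or lower case.
    seq = "".join(raw.split()).upper()
    for bad in "BXZ":
        if bad in seq:
            raise ValueError(f"residue {bad!r} is not supported")
    # J means Ile or Leu; both have the same masses, so L is used for J.
    return seq.replace("J", "L")
```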
-
Peptide Masses
Enter the experimentally measured peptide masses generated from the unknown protein in the « Enter a list of peptide masses... » text field, and separate them by spaces, tabs or new lines.
You can copy a list of peptides from a spreadsheet or another application and paste them directly into the text field.
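Splitting such a list on any whitespace is enough to recover the masses. A hypothetical helper might look like this:

```python
def parse_mass_list(text):
    """Hypothetical helper: masses separated by spaces, tabs or new lines."""
    # str.split() with no argument splits on any run of whitespace.
    return [float(token) for token in text.split()]
```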
If possible, avoid using peptide masses known to be from autodigestion of an enzyme (e.g. trypsin!), or other artefactual peaks (e.g. matrix peaks).
Upload a .pkm, .dta or text file
If the peptide mass fingerprinting data is stored in a file of one of the formats listed below, you can also upload the file directly from your computer:
(1) click on the « Browse... » button,
(2) select the file containing the relevant peptide mass data and
(3) click on the « Open » button.
The peptide masses will then be extracted automatically from this file.
Supported formats
-
Any user-created files can be uploaded if they correspond to the following rules:
- The first line does not contain any mass value (if it does, this mass value is ignored).
- Lines containing masses must start with the mass, and the first 20 characters must not contain any uppercase letters.
Example:
> my file
833.319
854.843
863.419
872.402
874.395
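The two rules above can be sketched as follows. This is an illustration only: `parse_generic` is a hypothetical name, and skipping lines whose first field is not a number is an assumption not stated in the rules.

```python
def parse_generic(text):
    """Hypothetical sketch of the generic-file rules."""
    masses = []
    # Rule 1: the first line never contributes a mass.
    for line in text.splitlines()[1:]:
        # Rule 2: an uppercase letter in the first 20 characters disqualifies the line.
        if any(ch.isupper() for ch in line[:20]):
            continue
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            masses.append(float(fields[0]))  # the line starts with the mass
        except ValueError:
            continue  # assumption: lines not starting with a number are skipped
    return masses
```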
-
Sequest format (.dta)
Example:
1.00 1
833.319 2189
844.333 0.0
854.843 5078
863.419 5108
872.402 12519
The first line is considered a comment and is ignored. All subsequent lines are interpreted as containing a mass and an intensity (if any), and a mass value is taken into account only if the corresponding intensity is > 0.
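The Sequest rule can be sketched in the same way. Keeping a line that has no intensity column is an assumption based on the "(if any)" wording, and `parse_dta` is a hypothetical name.

```python
def parse_dta(text):
    """Hypothetical sketch of the Sequest (.dta) rules."""
    masses = []
    for line in text.splitlines()[1:]:  # the first line is a comment
        fields = line.split()
        if not fields:
            continue
        mass = float(fields[0])
        # Assumption: a line without an intensity column is kept.
        if len(fields) == 1 or float(fields[1]) > 0:
            masses.append(mass)
    return masses
```

With the example above, 844.333 is dropped because its intensity is 0.0.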
-
pkm format, produced by the Voyager software of PerSeptive Biosystems or by the GRAMS software
Example:
OP=0
Center X Peak Y Left X Right X Time X Mass Difference Name STD.Misc Height Left Y Right Y
%Height,Width,%Area,%Quan,H/A
833.319 2189 833.260 833.378 0.016 0 0 C 0.? 0 762 762
854.843 5078 854.769 854.917 0.001 0 0 C 0.? 0 3453 3453
863.419 5108 863.064 863.775 0.001 0 0 C 0.? 0 3567 3567
872.402 12519 872.347 872.456 0.002 0 0 C 0.? 0 11417 11417
874.395 6730 874.331 874.460 0.002 0 0
All lines before the line containing 'H/A' are ignored. After that, only lines which do not contain any capital letters in the first 20 characters are retained. From the retained lines, the first column is interpreted as the mass.
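A sketch of the pkm rules, using the hypothetical name `parse_pkm`. Returning an empty list when no 'H/A' line is found is an assumption.

```python
def parse_pkm(text):
    """Hypothetical sketch of the pkm rules."""
    lines = text.splitlines()
    # Index of the first line after the one containing 'H/A'.
    start = next((i + 1 for i, line in enumerate(lines) if "H/A" in line), None)
    if start is None:
        return []  # assumption: without an 'H/A' line there is no usable data
    masses = []
    for line in lines[start:]:
        # Keep only lines without capital letters in the first 20 characters.
        if any(ch.isupper() for ch in line[:20]):
            continue
        fields = line.split()
        if fields:
            masses.append(float(fields[0]))  # first column is the mass
    return masses
```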
-
User-defined post-translational modifications
If you wish to take into account post-translational modifications other than those already considered by the program, you can enter them here. For each of these PTMs, specify its name, its atomic composition and the positions at which it can be observed. Several ways to indicate a position are shown in the table below (combinations are allowed):
Position | Meaning |
---|---|
18 | residue #18 only |
E D | all glutamate and aspartate residues |
< | N-terminus of each peptide |
> | C-terminus of each peptide |
K < | lysine and N-terminus of each peptide |
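The position syntax in the table can be interpreted with a short sketch. It assumes that the elements of a specification are separated by spaces, as in the table; `sites_in_peptide` is a hypothetical helper, not part of FindPept.

```python
def sites_in_peptide(spec, peptide, start):
    """Hypothetical sketch: return the protein positions (1-based) in a
    peptide selected by a specification such as "18", "E D" or "K <",
    plus "N-term"/"C-term" for the terminal symbols.
    `start` is the protein position of the peptide's first residue.
    Assumption: elements of the specification are space-separated."""
    tokens = spec.split()
    sites = []
    for offset, aa in enumerate(peptide):
        pos = start + offset
        # An integer selects one position; a letter selects every such residue.
        if str(pos) in tokens or aa in tokens:
            sites.append(pos)
    if "<" in tokens:
        sites.append("N-term")
    if ">" in tokens:
        sites.append("C-term")
    return sites
```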
-
Chemical treatment of cysteine
You can choose how you would like cysteines in a protein to be modified before the theoretical masses of peptides are calculated. Experimentally, proteins are usually reduced and then alkylated with one of several reagents before they are used to generate peptides. If you expect all cysteines to be unmodified in your peptides, select "nothing (in reduced form)". If you would like cysteines to be theoretically reduced and alkylated, specify the reagent you used for alkylation: iodoacetamide (--> carboxyamidomethyl cysteine, Cys_CAM), iodoacetic acid (--> carboxymethyl cysteine, Cys_CM) or 4-vinylpyridine (--> pyridylethyl cysteine, Cys_PE).
Note that unlike in other tools, chemical reagents (except acrylamide) affect all cysteine residues.
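Because the reagent is applied to every cysteine, its effect on a peptide mass is simply the number of cysteines times the reagent's mass shift. A sketch with standard monoisotopic shifts (the dictionary keys and function name are hypothetical):

```python
# Standard monoisotopic mass added per cysteine (Da). Keys are hypothetical labels.
CYS_DELTA = {
    "nothing": 0.0,
    "iodoacetamide": 57.02146,     # Cys_CAM, +C2H3NO
    "iodoacetic acid": 58.00548,   # Cys_CM, +C2H2O2
    "4-vinylpyridine": 105.05785,  # Cys_PE, +C7H7N
}


def apply_cys_treatment(peptide, mass, reagent):
    """Fixed modification: add the reagent's shift once per cysteine."""
    return mass + peptide.count("C") * CYS_DELTA[reagent]
```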
Acrylamide adducts
In proteins prepared by polyacrylamide gel electrophoresis, cysteines often react with free acrylamide monomers to form propionamide cysteine (Cys_PAM). If this option is selected, the program modifies the theoretical masses of Cys-containing peptides accordingly.
-
Oxidation state of methionine
You can request that all methionines in theoretical peptides be considered as possibly oxidized. If this option is selected, the program modifies the theoretical masses of Met-containing peptides accordingly and considers both peptides with unmodified methionines and peptides with oxidized methionines. Note that proteins prepared by gel electrophoresis often show this modification.
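Since every methionine carries the same shift, considering oxidized and unoxidized methionines amounts to one candidate mass per possible number of oxidized residues. The sketch below assumes that each methionine may be oxidized independently, in line with the combinatorial treatment of modifications described in the introduction; `met_oxidation_variants` is a hypothetical name.

```python
OXIDATION = 15.99491  # monoisotopic mass of one oxygen atom (Da)


def met_oxidation_variants(peptide, mass):
    """Return (number of oxidized Met, mass) for 0..n oxidized methionines.
    Assumption: each Met may be oxidized independently; since all shifts
    are equal, only the count of oxidized residues changes the mass."""
    n_met = peptide.count("M")
    return [(k, mass + k * OXIDATION) for k in range(n_met + 1)]
```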
-
Esterification of acidic and C-terminal residues
If your peptides have been treated with methanol to form methyl esters of the carboxylate groups of glutamate and aspartate side chains and of the C-terminus, check this option. The mass of a CH2 group will be added for each free carboxylate group in the peptide. However, this modification is not indicated in the table of matching peptides, to avoid clutter.
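The esterification shift therefore depends only on the number of Asp and Glu residues, plus one for the C-terminus. A sketch with standard CH2 masses (the function name is hypothetical):

```python
CH2_MONO = 14.01565  # monoisotopic mass of CH2 (Da)
CH2_AVG = 14.02658   # average mass of CH2 (Da)


def methyl_ester_shift(peptide, monoisotopic=True):
    """Hypothetical helper: one CH2 per free carboxylate, i.e. one per
    Asp/Glu side chain plus one for the peptide C-terminus."""
    n_carboxylates = peptide.count("D") + peptide.count("E") + 1
    return n_carboxylates * (CH2_MONO if monoisotopic else CH2_AVG)
```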
-
N-Acetylation and N-Formylation
If you expect the N-terminal residue of your protein to be acetylated or formylated in your biological sample, check this option. The program will then consider both the free amino-terminal group and the modified terminus as possibilities in the calculated peptides.
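A minimal sketch of how the terminus can be expanded into alternatives. The flag `starts_at_protein_n_terminus` and the function name are hypothetical, and the shifts are the standard monoisotopic masses of an acetyl (C2H2O) and a formyl (CO) group.

```python
ACETYL = 42.01057  # C2H2O, monoisotopic (Da)
FORMYL = 27.99491  # CO, monoisotopic (Da)


def n_terminal_variants(mass, starts_at_protein_n_terminus):
    """Only peptides that begin at the protein N-terminus get the
    alternatives; all others keep their free amino group."""
    if not starts_at_protein_n_terminus:
        return [mass]
    return [mass, mass + ACETYL, mass + FORMYL]
```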
-
Ion mode (Masses as [M], [M+H]+ or [M-H]-)
You can enter the masses of your peptides as [M], [M+H]+ or [M-H]-, but you must select the corresponding button. If you select [M+H]+, one proton mass (about 1 Da) is added to every peptide mass calculated from the database before matching with the user-specified masses. Conversely, selecting [M-H]- subtracts one proton mass from the database masses.
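The conversion can be sketched as below. The proton mass used here (1.007276 Da) is an assumption; the program may use a slightly different value, and `to_ion_mode` is a hypothetical name.

```python
PROTON = 1.007276  # assumed proton mass (Da)


def to_ion_mode(neutral_mass, mode):
    """Shift a calculated neutral mass [M] into the selected ion mode."""
    if mode == "[M]":
        return neutral_mass
    if mode == "[M+H]+":
        return neutral_mass + PROTON
    if mode == "[M-H]-":
        return neutral_mass - PROTON
    raise ValueError(f"unknown ion mode: {mode}")
```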
-
Isotopic resolution (average or monoisotopic masses)
Please select the mode used in the MS experiment, i.e. whether your measured masses are average or monoisotopic masses.
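The difference between the two modes comes from the element masses used. The sketch below computes the mass of an atomic composition, such as the one entered for a user-defined PTM, in either mode. The element table uses standard values, and `composition_mass` is a hypothetical name.

```python
# element: (monoisotopic mass, average mass) in Da
ELEMENT_MASSES = {
    "C": (12.000000, 12.0107),
    "H": (1.007825, 1.00794),
    "N": (14.003074, 14.0067),
    "O": (15.994915, 15.9994),
    "S": (31.972071, 32.065),
}


def composition_mass(composition, monoisotopic=True):
    """Mass of a composition given as {"C": 2, "H": 3, "N": 1, "O": 1}."""
    index = 0 if monoisotopic else 1
    return sum(ELEMENT_MASSES[el][index] * n for el, n in composition.items())
```

For example, water ({"H": 2, "O": 1}) gives 18.010565 Da in monoisotopic mode and 18.01528 Da in average mode.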
-
Mass tolerance
Tolerance (the maximum mass difference between the experimental and calculated masses used to assess a match) can be specified in ppm (parts per million) or in daltons.
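A sketch of the match criterion (the function name is hypothetical). In ppm mode the difference is expressed relative to the calculated (DB) mass, as in the results tables:

```python
def within_tolerance(user_mass, db_mass, tolerance, unit="Da"):
    """True if user_mass matches db_mass within the tolerance.
    The difference is user mass minus DB mass; in ppm mode it is
    divided by the DB mass and multiplied by one million."""
    delta = user_mass - db_mass
    if unit == "ppm":
        delta = delta / db_mass * 1e6
    elif unit != "Da":
        raise ValueError(f"unknown unit: {unit}")
    return abs(delta) <= tolerance
```

For example, 1000.05 against 1000.0 is a difference of 0.05 Da, i.e. 50 ppm: it matches with a 0.1 Da or 60 ppm tolerance, but not with 40 ppm.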
-
Sorting of peptides in the result tables
Here you can choose if you would like the peptides to be sorted by their mass (from smallest to largest) or by their positional order in the protein.
-
Digestion agent (enzyme)
Specify the enzyme or chemical reagent used to generate your peptides (see the corresponding section in the PeptideMass instructions for the available enzymes and their cleavage rules). Cleavage sites that obey the cleavage rules are then highlighted in the results with a red slash at either end of a peptide, and peptides that obey them at both ends are displayed in a separate table. If you check the Exclude masses that match specific cleavage by the enzyme checkbox, the corresponding masses are not taken into account for unspecific cleavage. This is recommended, since specific cleavage is usually much more likely than unspecific cleavage.
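As an illustration, the sketch below classifies a candidate peptide using the classic trypsin rule (cleavage after Lys or Arg, except before Pro). This rule is an assumed example; PeptideMass defines the actual rules for each enzyme. As in the results tables, the termini of the protein count as specific ends.

```python
def is_trypsin_site(sequence, i):
    """True if boundary i (0-based, between sequence[i-1] and sequence[i])
    is a valid specific end. Assumption: simplified trypsin rule."""
    if i == 0 or i == len(sequence):
        return True  # protein termini always count as specific ends
    return sequence[i - 1] in "KR" and sequence[i] != "P"


def classify_peptide(sequence, start, end):
    """Classify sequence[start:end] (0-based, end exclusive).
    Returns the table it belongs to and whether each end is specific
    (a specific end would be marked with a red slash)."""
    left = is_trypsin_site(sequence, start)
    right = is_trypsin_site(sequence, end)
    table = "specific" if left and right else "unspecific"
    return table, left, right
```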
-
Enzyme source
The drop-down list next to the one used to specify the enzyme lists the most common sources and variants, as found in Swiss-Prot/TrEMBL, of the enzymes used experimentally to digest proteins. If you select one, its sequence is automatically submitted to PeptideMass to check whether any user-supplied masses correspond to fragments obtained by autolysis (i.e. self-digestion) of the protease with up to 5 missed cleavages; such fragments are displayed in a separate table. If you check the Exclude masses that match autolytic cleavage of the enzyme checkbox, the corresponding masses are not taken into account for the cleavage of the studied protein.
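Generating the specific autolysis fragments with up to 5 missed cleavages can be sketched as follows, again assuming the simplified trypsin rule. The function `tryptic_peptides` is hypothetical and does not reflect how PeptideMass is called.

```python
def tryptic_peptides(sequence, max_missed=5):
    """Specific fragments with at most max_missed internal missed cleavages.
    Assumption: trypsin cuts after K or R unless the next residue is P."""
    # Cut points: both termini plus every valid trypsin site in between.
    cuts = [0] + [i for i in range(1, len(sequence))
                  if sequence[i - 1] in "KR" and sequence[i] != "P"] + [len(sequence)]
    peptides = []
    for a in range(len(cuts) - 1):
        # Joining cuts[a]..cuts[b] skips b - a - 1 internal sites.
        for b in range(a + 1, min(a + max_missed + 2, len(cuts))):
            peptides.append(sequence[cuts[a]:cuts[b]])
    return peptides
```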
-
Send the result by e-mail
FindPept results are displayed on-line in your browser window or can be sent by e-mail. If the results should be sent to you by e-mail, tick the "Send the result by e-mail" box and enter the address to which the results should be sent (e.g. name@example.com) in the "Your e-mail:" text field. The e-mail option may be necessary for queries with many peptide masses or post-translational modifications. It avoids timeouts ("document contains no data" error messages) that may occur with the on-line option: the browser interrupts the connection with the program if the search has not finished after a certain time (usually about 3 minutes).
Note that e-mail results are sent in the form of an HTML file, in exactly the same format as on-line, with no loss of functionality compared to the on-line display.
-
Reset and Perform Buttons
Once you have filled in the form according to your needs, press the "Start FindPept" button. If you have made a mistake and would like all fields to be reset to their default values, press the Reset button.
Intermediate page
If the user has submitted the accession number of a Swiss-Prot/TrEMBL entry that contains:
- several chains or polypeptides (e.g. P02765, P01189), and/or
- an initiator methionine residue at the beginning of the sequence (e.g. P01012),
an intermediate page is displayed before the search is run. The numbering of the residues is the same as in the original Swiss-Prot/TrEMBL entry. If an initiator methionine is added before the beginning of the sequence, it is attributed position 0.
FindPept Output
The results from FindPept are divided into a header and up to seven tables. Each table is displayed only if matching peptides/PTMs have been found in the given category.
The header contains information about the submitted protein: a link to the Swiss-Prot/TrEMBL entry and its description line (if the protein is in Swiss-Prot/TrEMBL), its pI and its molecular weight. The input parameters are then listed, followed by links to PeptideMass (to perform a theoretical cleavage of the protein of interest), to FindMod (to predict possible post-translational modifications as an alternative explanation for unexpected experimental peptide masses) and to the original FindPept form (to change some parameters).
The tables report the peptides whose experimental masses match unmodified or modified theoretical digest products of the protein of interest.
Post-Translational/Artefactual Modifications for the PROTEASE:
Lists the PTMs applied to the protease that can be found in the peptides of the table of peptides resulting from protease autolysis.
Post-Translational/Artefactual Modifications for the PROTEIN:
Lists the PTMs applied to the studied protein that can be found in the peptides resulting from the cleavage of the protein.
Masses resulting from possible contaminants:
Lists the masses that correspond to the specific cleavage of a number of human keratins that are often encountered as contaminants in biological samples handled in the laboratory. The considered contaminants are:
Peptides resulting from protease autolysis:
Lists the peptides obtained by specific self-digestion of the protease that match the user masses.
Matching peptides for specific cleavage:
Lists the peptides obtained by digestion of the studied protein for which both ends are either sites of specific cleavage by the protease or termini of the original protein sequence.
Matching peptides for unspecific cleavage:
Lists the other peptides obtained by digestion of the studied protein that match the user masses.
Unmatched masses:
Lists the masses that could not be identified by the program.
Key to the tables
Post-Translational/Artefactual Modifications | |
---|---|
Count | Lists the number of residues potentially carrying the PTM. This is relevant for PTMs affecting all residues of a particular amino acid (see the Position entry). |
Nature | The nature of the PTM: its abbreviation in the code used by FindMod, or an artefactual modification of cysteine and methionine residues and carboxylate groups. |
Position | If an integer, the position at which the PTM occurs. If a letter, the amino acid on all instances of which the PTM occurs (for artefactual modifications). C-ter and N-ter mean that the PTM is expected at the C- or N-terminus of each peptide. |
Source | The source from which the information about the PTM was obtained, any of: |
Occurrence | If two (or more) PTMs can occur on the same residue, the program excludes their (chemically impossible) combination, but takes into account each of the PTMs in the combinatorial calculation, plus the unmodified residue, unless one PTM is set to mandatory. In that case, the unmodified residue is not taken into account, but the mandatory PTM does not take precedence over optional PTMs. |
Mass | The mass of the PTM, i.e. the mass difference between the modified and the unmodified residue. It therefore does not include the mass of the amino acid itself. |
Description | For Feature table entries, this is the content of the description field and often contains interesting information to decide whether the PTM is plausible or not. For User-defined PTMs, the description is supplied by the user. For other PTMs, this is the chemical nature of the modification. |
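The handling of several PTMs on the same residue (Occurrence row above) can be sketched as a product of per-residue options. The data layout and the function names are hypothetical; only the rules stated in the table are modelled.

```python
from itertools import product


def residue_options(ptms):
    """ptms: list of (name, mass_shift, mandatory) targeting one residue.
    PTMs on the same residue are mutually exclusive; the unmodified
    residue is an extra option unless one of the PTMs is mandatory.
    A mandatory PTM does not exclude the other (optional) PTMs."""
    options = [(name, shift) for name, shift, _ in ptms]
    if not any(mandatory for _, _, mandatory in ptms):
        options.insert(0, ("unmodified", 0.0))
    return options


def enumerate_ptm_combinations(ptms_by_residue):
    """Return (chosen option per residue, total mass shift) for every
    combination of options across residues."""
    per_residue = [residue_options(p) for p in ptms_by_residue.values()]
    results = []
    for choice in product(*per_residue):
        results.append((list(choice), sum(shift for _, shift in choice)))
    return results
```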
Matching Peptides | |
---|---|
User mass | The user mass, supplied in the masses field or in the uploaded file, to which the DB mass corresponds within the tolerance bounds. If a peptide matches several user masses, it is reported in several rows. |
DB mass | The mass of the peptide, including the post-translational and artefactual modifications, calculated using the masses database. |
Δ mass | The mass difference between the user mass and the DB mass, expressed in the same unit (Da or ppm) as the tolerance. One ppm is one millionth of the DB mass. |
peptide | The sequence of the peptide. The residues immediately before and after the sequence are shown in gray between round brackets, but are not taken into account in the calculation. If either end of the peptide corresponds to a site of specific cleavage, it is marked by a red slash. Residues carrying a PTM are underlined, except for esterification, which is not indicated to reduce clutter. |
position | The position of the peptide in the sequence, given as its first and last residue. This is also a link to the FT detail viewer if a Swiss-Prot/TrEMBL entry was supplied. Position 0 is the initiator methionine if the user indicated that it should be added. |
modifications | The list of the PTMs applied to the peptide, in the order in which the underlined residues appear in the peptide sequence. For their description, refer to the PTM table above. |
missed cleavages | The number of missed cleavage sites within the sequence. |
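The missed cleavages column can be reproduced with a simple count, assuming again the simplified trypsin rule (this helper is hypothetical):

```python
def missed_cleavages(peptide):
    """Count internal trypsin sites (K or R not followed by P).
    Assumption: simplified trypsin rule. The last residue is not
    examined, because a cleavage there is the peptide's own end."""
    return sum(1 for i in range(len(peptide) - 1)
               if peptide[i] in "KR" and peptide[i + 1] != "P")
```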