Compute pI/Mw - Documentation

The following is an excerpt from the chapter
Protein Identification and Analysis Tools on the Expasy Server;
Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R., Appel R.D., Bairoch A.;
(In) John M. Walker (ed): The Proteomics Protocols Handbook, Humana Press (2005), pp. 571-607.
Full text - Copyright Humana Press.

This tool calculates the estimated pI and Mw of a specified Swiss-Prot/TrEMBL entry or a user-entered amino acid sequence. These parameters are useful if you want to know the approximate region of a 2-D gel where a protein may be found.

Using Compute pI / Mw


To use the program, enter one or more Swiss-Prot/TrEMBL identification names (e.g. LACB_BOVIN) or accession numbers (e.g. P02754) into the text field, and select the "click here to compute pI/Mw" button.

If one entry is specified, you will be asked to specify the protein's domain of interest for which the pI and mass should be computed. The domain can be selected from the hyperlinked list of features, if any are shown, or by entering the domain start and end positions numerically.

If more than one Swiss-Prot/TrEMBL identification name is entered, all proteins will automatically be processed to their mature forms, and pI and Mw values calculated for the chains or peptides. If only fragments of the protein of interest are available in the database, no result will be given and an error message will be shown to highlight that the pI and mass cannot be returned accurately. Where database entries have signal sequences or transit peptides of unknown length (e.g. A1CGA8), an average length signal sequence or transit peptide is removed before the pI and mass computation is done. In Swiss-Prot release 35, the average signal sequence length is 22 amino acids for eukaryotes and viruses, 26 amino acids for prokaryotes and bacteriophages, and 30 for archaebacteria. Transit peptides have an average length of 55 amino acids in chloroplasts, 34 for mitochondria, 29 for microbodies, and 51 for cyanelles.
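The average-length rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the server's code: the function name `strip_average_leader` and the category labels used as dictionary keys are hypothetical, while the lengths are the Swiss-Prot release 35 averages quoted in the text.

```python
# Average leader lengths (in residues) quoted above for Swiss-Prot release 35.
# The dictionary keys are hypothetical labels chosen for this sketch.
AVERAGE_SIGNAL_LENGTH = {
    "eukaryote": 22,
    "virus": 22,
    "prokaryote": 26,
    "bacteriophage": 26,
    "archaebacteria": 30,
}
AVERAGE_TRANSIT_LENGTH = {
    "chloroplast": 55,
    "mitochondrion": 34,
    "microbody": 29,
    "cyanelle": 51,
}


def strip_average_leader(sequence: str, kind: str, category: str) -> str:
    """Remove an average-length signal or transit peptide from the N-terminus.

    kind is "signal" or "transit"; category selects the average length.
    """
    if kind == "signal":
        table = AVERAGE_SIGNAL_LENGTH
    elif kind == "transit":
        table = AVERAGE_TRANSIT_LENGTH
    else:
        raise ValueError(f"unknown leader kind: {kind!r}")
    if category not in table:
        raise ValueError(f"unknown category for {kind} peptide: {category!r}")
    # Drop the first N residues; the remainder is treated as the mature form.
    return sequence[table[category]:]
```

For a 100-residue eukaryotic precursor with a signal sequence of unknown length, `strip_average_leader(seq, "signal", "eukaryote")` would keep the last 78 residues for the pI and mass computation.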

If your protein of interest is not in Swiss-Prot/TrEMBL, you can enter an amino acid sequence in standard single-letter code into the text field, and select the "click here to compute pI/Mw" button. The predicted pI and Mw of your sequence will then be displayed. A typical output from the program is shown in figure 2.

As an alternative to the verbose HTML output, the results for a list of Swiss-Prot/TrEMBL entries can be retrieved in a plain numerical format with minimal documentation. A file containing four columns,
ID   AC   pI   Mw
is generated and can be loaded into an external application, such as a spreadsheet program.

    ASNA_MOUSE_1    O54984        4.81    38691.61
    ARSA_MOUSE_1    P50428        5.50    52145.20
    ARSB_MOUSE_1    P50429        6.39    55458.91
    ARX_MOUSE_1     O35085        5.14    58490.34
    ARY1_MOUSE_1    P50294        5.10    33713.36
    ARY2_MOUSE_1    P50295        5.63    33701.41
    ARY3_MOUSE_1    P50296        6.07    33685.69
    ASAH1_MOUSE_1   Q9WV54        6.11    13797.05
    ASAH1_MOUSE_2   Q9WV54        8.87    29017.27
    ASCL1_MOUSE_1   Q02067        8.56    24740.54
    ASM_MOUSE_1     Q04519        6.48    65060.34
    ASNS_MOUSE_1    Q61024        6.12    64151.52
    ASPG_MOUSE_1    Q64191        5.15    19681.95
    ASPG_MOUSE_2    Q64191        7.61    14809.84
    ASPX_MOUSE_1    P50289        4.42    26147.04

            
If pI and Mw cannot be computed, a value of "0.00" appears in the Mw column, and a code in the pI column gives the reason. The codes have the following meanings:

FRAGMENT    Incomplete chain or peptide: pI/Mw cannot be computed
UNDEFINED   Unknown start or end points: pI/Mw cannot be computed
XXX         Sequence contains several consecutive undefined amino acids: pI/Mw cannot be computed

If a Swiss-Prot/TrEMBL entry has one or more mature chains or peptides documented, "_1", "_2", etc. is appended to the ID, indicating that the sequence considered is the first, second, etc. CHAIN or PEPTIDE documented in the feature table.
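A reader loading the four-column file into their own scripts might parse it along the following lines. This is a hedged sketch: it assumes the columns are separated by whitespace and that each error code is a single token in the pI column. The names `PiMwRow` and `parse_pi_mw_line` are hypothetical, and the FRAGMENT line in the usage note is made up for illustration.

```python
from typing import NamedTuple, Optional

# Codes that can appear in the pI column when no value could be computed.
ERROR_CODES = {"FRAGMENT", "UNDEFINED", "XXX"}


class PiMwRow(NamedTuple):
    entry_id: str               # ID without the "_1", "_2", ... suffix
    chain_index: Optional[int]  # 1 for "_1", 2 for "_2", None if no suffix
    accession: str
    pi: Optional[float]
    mw: Optional[float]
    error: Optional[str]        # one of ERROR_CODES, or None on success


def parse_pi_mw_line(line: str) -> PiMwRow:
    """Parse one 'ID AC pI Mw' line (assumed to be whitespace-separated)."""
    ident, accession, pi_field, mw_field = line.split()
    # Separate a trailing numeric chain/peptide index from the entry name.
    base, sep, suffix = ident.rpartition("_")
    if sep and suffix.isdigit():
        entry_id, chain_index = base, int(suffix)
    else:
        entry_id, chain_index = ident, None
    if pi_field in ERROR_CODES:
        # The Mw column holds 0.00 here; report the code, not a number.
        return PiMwRow(entry_id, chain_index, accession, None, None, pi_field)
    return PiMwRow(entry_id, chain_index, accession,
                   float(pi_field), float(mw_field), None)
```

Applied to the line `ASAH1_MOUSE_2   Q9WV54        8.87    29017.27`, this yields entry `ASAH1_MOUSE`, chain index 2, pI 8.87 and Mw 29017.27. A made-up line such as `EXAMPLE_MOUSE_1 P00000 FRAGMENT 0.00` would come back with `error == "FRAGMENT"` and no numeric values.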

Comments

  1. Protein pI is calculated using pK values of amino acids described in Bjellqvist et al., which were defined by examining polypeptide migration between pH 4.5 and 7.3 in an immobilised pH gradient gel environment with 9.2 M and 9.8 M urea at 15°C or 25°C. Prediction of pI for highly basic proteins has yet to be studied, and current Compute pI/Mw predictions may not be adequate for such proteins.
  2. The buffer capacity of a protein will affect the accuracy of its predicted pI, with poor buffer capacity leading to greater error in prediction (Bjellqvist et al.). Because of this, pI predictions for small proteins can be problematic.
  3. Protein Mw is calculated by the addition of average isotopic masses of amino acids in the protein and the average isotopic mass of one water molecule. Molecular weight values are given in daltons (Da).
  4. This program does not account for the effects of post-translational modifications, so modified proteins on a 2-D gel may migrate to a position quite different from that predicted. Protein glycosylation in particular can affect protein migration in both pI and Mw dimensions. Note however that the "GET REGION ON 2D PAGE" function in SWISS-2DPAGE (accessed by selecting a "GET REGION ON 2D PAGE" hypertext link from a Swiss-Prot entry) uses the Compute pI/Mw algorithm to highlight the region on a 2-D gel where an unmodified protein should run, and suggests a region where the modified protein might be found if it has modifications documented in the Swiss-Prot database.
  5. In addition to the standard one-letter codes for the 20 amino acids and for the 2 non-standard amino acids selenocysteine (U) and pyrrolysine (O), the characters B, Z and X are accepted:
    
        B   Asx   Aspartic acid or Asparagine
        Z   Glx   Glutamine or Glutamic acid
        X   Xaa   Any amino acid
    
                        
    The mass values used for these residues are mean values of the corresponding masses weighted by their respective frequencies observed in the whole Swiss-Prot protein sequences (check the frequencies observed in the current Swiss-Prot release).
    B, Z and X are presumed to have pK values of 0, i.e. a pI of 5.52.
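
The two calculations described in the comments above can be sketched in a few lines. This is a simplified illustration, not the Compute pI/Mw implementation. The residue masses are standard average masses for the 20 common amino acids. The pK values are illustrative assumptions that have not been checked against Bjellqvist et al. and should be replaced with the published set before any real use. The sketch also ignores any dependence of terminal pK values on the terminal residue. B, Z, X, U and O are left out of the mass table because the frequency-weighted values are not given here. In the charge calculation, residues without a pK entry simply contribute no charge. The function names are hypothetical.

```python
# Average residue masses in Da (amino acid minus one water molecule).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01524  # average mass of one water molecule, Da

# ASSUMPTION: illustrative pK values only, not verified against Bjellqvist et al.
POSITIVE_PK = {"Nterm": 7.5, "K": 10.0, "R": 12.0, "H": 5.98}
NEGATIVE_PK = {"Cterm": 3.55, "D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}


def compute_mw(sequence: str) -> float:
    """Sum the average residue masses and add one water molecule."""
    try:
        return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS
    except KeyError as exc:
        raise ValueError(f"no mass available for residue {exc.args[0]!r}") from None


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the chain at the given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - POSITIVE_PK["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PK["Cterm"] - ph))
    for aa in sequence:
        if aa in POSITIVE_PK:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PK[aa]))
        elif aa in NEGATIVE_PK:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PK[aa] - ph))
    return charge


def estimate_pi(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection on the interval 0 to 14."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Net charge falls as pH rises, so a positive charge means pI is higher.
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Bisection is used because the net charge decreases monotonically with pH, so there is a single crossing point. Results would typically be rounded to two decimal places for display, as in the tabular output shown earlier.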