ProtParam - Documentation
The following is an excerpt from the chapter:
Protein Identification and Analysis Tools on the Expasy Server;
Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R., Appel R.D., Bairoch A.;
(In) John M. Walker (ed): The Proteomics Protocols Handbook, Humana Press (2005).
pp. 571-607. Copyright Humana Press.
Using ProtParam
ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a UniProtKB accession number (AC) or ID, or in the form of a raw sequence, in which case spaces and numbers are ignored.
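As an illustration of the raw-sequence input described above, here is a minimal sketch of stripping spaces and numbers from a pasted sequence. The function name `clean_raw_sequence` is hypothetical, and the upper-casing step is an assumption made for convenience; this is not ProtParam's own code, and how ProtParam treats other characters is not covered here.

```python
def clean_raw_sequence(raw: str) -> str:
    """Return the sequence with whitespace and digits removed.

    Upper-casing is an assumption of this sketch, not documented behaviour.
    """
    kept = [ch for ch in raw if not ch.isspace() and not ch.isdigit()]
    return "".join(kept).upper()


if __name__ == "__main__":
    # A sequence pasted with position numbers and line breaks.
    print(clean_raw_sequence("  1 mkwvtfisll\n 11 LLFSSAYS\n"))  # MKWVTFISLLLLFSSAYS
```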
If you provide a UniProtKB AC/ID, you will be presented with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains compiled from the feature table of the UniProtKB entry (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty) the complete sequence will be analyzed.
Note: It is not possible to specify post-translational modification for your protein, nor will ProtParam know whether your mature protein forms dimers or multimers. If you do know that your protein forms a dimer, you may just duplicate your sequence (i.e. append a second copy of the sequence to the first), as all computations performed by ProtParam are based on either compositional data, or on the N-terminal amino acid.
The calculated parameters
The parameters computed by ProtParam include the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY). Molecular weight and theoretical pI are calculated as in Compute pI/Mw. The amino acid and atomic compositions are self-explanatory. All the other parameters will be explained below.
Extinction coefficients
The extinction coefficient indicates how much light a protein absorbs at a certain wavelength. An estimate of this coefficient is useful for following a protein with a spectrophotometer when purifying it.
It has been shown [1c] that it is possible to estimate the molar extinction coefficient of a protein from knowledge of its amino acid composition. From the molar extinction coefficient of tyrosine, tryptophan and cystine (cysteine does not absorb appreciably at wavelengths >260 nm, while cystine does) at a given wavelength, the extinction coefficient of the native protein in water can be computed using the following equation:
E(Prot) = Numb(Tyr)*Ext(Tyr) + Numb(Trp)*Ext(Trp) + Numb(Cystine)*Ext(Cystine)
where (for proteins in water measured at 280 nm): Ext(Tyr) = 1490, Ext(Trp) = 5500, Ext(Cystine) = 125.
The absorbance (optical density) can be calculated using the following formula:
Absorb(Prot) = E(Prot) / Molecular_weight
Two values are produced by ProtParam based on the above equations, both for proteins measured in water at 280 nm. The first value is computed on the assumption that all cysteine residues appear as half cystines (i.e. all pairs of Cys residues form cystines), and the second on the assumption that no cysteine appears as half cystine (i.e. all Cys residues are reduced). Experience shows that the computation is quite reliable for proteins containing Trp residues; however, there may be more than 10% error for proteins without Trp residues.
Note: Cystine is the amino acid formed when a pair of cysteine molecules is joined by a disulfide bond.
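The two equations above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ProtParam's implementation: the function names are hypothetical, the sequence is assumed to be in upper-case one-letter code, an odd Cys count is assumed to leave one Cys unpaired, and the molecular weight is supplied by the caller rather than computed.

```python
from typing import Tuple

# Molar extinction coefficients at 280 nm in water, as given in the text.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficients(sequence: str) -> Tuple[int, int]:
    """Return (all Cys pairs as cystines, all Cys reduced)."""
    n_tyr = sequence.count("Y")
    n_trp = sequence.count("W")
    # Assumption: with an odd number of Cys, one residue stays unpaired.
    n_cystine = sequence.count("C") // 2
    reduced = n_tyr * EXT_TYR + n_trp * EXT_TRP
    with_cystines = reduced + n_cystine * EXT_CYSTINE
    return with_cystines, reduced


def absorbance(e_prot: float, molecular_weight: float) -> float:
    """Absorb(Prot) = E(Prot) / Molecular_weight."""
    return e_prot / molecular_weight


if __name__ == "__main__":
    # One Trp, two Tyr, two Cys.
    print(extinction_coefficients("WYYCC"))  # (8605, 8480)
```

Returning both values in the same order as the text (cystines first, reduced second) keeps the sketch easy to compare with the two figures ProtParam reports.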
Important note for previous users of ProtParam: Changed algorithm for Extinction coefficient (2005)
We have chosen to calculate protein extinction coefficients using the Edelhoch method [1b], but with the extinction coefficients for Trp and Tyr determined by Pace et al. [1a]. Edelhoch determined extinction coefficients for Trp and Tyr by using blocked amino acid analogs as model substances to represent the situation in proteins. N-acetyl-L-tryptophanamide and glycyl-L-tyrosylglycine were used for Trp and Tyr, respectively, and the values were determined in pH 6.5, 6.0 M guanidinium hydrochloride, 0.02 M phosphate buffer.
Gill and von Hippel [1c] found that these values, which are valid for calculating the extinction coefficients of denatured proteins, could also be used to a good approximation to calculate the extinction coefficients of native proteins. The conclusion was based on calculations made on 18 globular proteins (44 values in total) for which the molar extinction coefficients were known. There was generally good agreement between the measured and the calculated values (as stated in the abstract: "(to ± 5% in most cases)"), but 6 of the values deviated by more than 10%.
To improve this accuracy, Pace et al. used 116 measured molar extinction coefficients from 80 proteins to calibrate the equation for the native protein in water. They arrived at recommended values that give an overall ave%dev=3.836% between the calculated and the measured coefficients, but for the 94 entries containing Trp the ave%dev=3.167%. Generally, the calculated values deviated much more (often >10%) from the measured values for proteins that do not contain Trp residues. This is because Trp contributes much more to the overall extinction coefficient than do Tyr and cystine, and because the Trp extinction coefficient is less sensitive to the environment than that of Tyr.
Note from Gill, S.C. and von Hippel P.H. discussion: We assume that the protein contains no [other] chromophores that absorb at 280 nm. This means the concentration of conjugated proteins (e.g., catalase, hemoglobin, or peroxidase) that contain prosthetic groups absorbing in the near UV and visible portions of the spectrum cannot be analyzed by this approach.
In vivo half-life
The half-life is a prediction of the time it takes for half of the amount of protein in a cell to disappear after its synthesis in the cell. ProtParam relies on the "N-end rule", which relates the half-life of a protein to the identity of its N-terminal residue; the prediction is given for 3 model organisms (human, yeast and E. coli). The N-end rule (for a review see [5], [6]) originated from the observation that the identity of the N-terminal residue of a protein plays an important role in determining its stability in vivo ([2], [3], [4]). The rule was established from experiments that explored the metabolic fate of artificial beta-galactosidase proteins with different N-terminal amino acids engineered by site-directed mutagenesis. The beta-gal proteins thus designed have strikingly different half-lives in vivo, from more than 100 hours to less than 2 minutes, depending on the nature of the amino acid at the amino terminus and on the experimental model (yeast in vivo; mammalian reticulocytes in vitro; Escherichia coli in vivo). In addition, it has been shown that in eukaryotes, the association of a destabilizing N-terminal residue and of an internal lysine targets the protein to ubiquitin-mediated proteolytic degradation [6]. Note that the program gives an estimate of the protein half-life and is not applicable to N-terminally modified proteins.
Table of the amino acids and the corresponding half-life
Amino acid   Mammalian   Yeast      E. coli
Ala          4.4 hour    >20 hour   >10 hour
Arg          1 hour      2 min      2 min
Asn          1.4 hour    3 min      >10 hour
Asp          1.1 hour    3 min      >10 hour
Cys          1.2 hour    >20 hour   >10 hour
Gln          0.8 hour    10 min     >10 hour
Glu          1 hour      30 min     >10 hour
Gly          30 hour     >20 hour   >10 hour
His          3.5 hour    10 min     >10 hour
Ile          20 hour     30 min     >10 hour
Leu          5.5 hour    3 min      2 min
Lys          1.3 hour    3 min      2 min
Met          30 hour     >20 hour   >10 hour
Phe          1.1 hour    3 min      2 min
Pro          >20 hour    >20 hour   ?
Ser          1.9 hour    >20 hour   >10 hour
Thr          7.2 hour    >20 hour   >10 hour
Trp          2.8 hour    3 min      2 min
Tyr          2.8 hour    10 min     2 min
Val          100 hour    >20 hour   >10 hour
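Because the estimate depends only on the N-terminal residue, it amounts to a table lookup. The sketch below transcribes the table above into a dictionary keyed by one-letter code. The names `HALF_LIFE` and `estimated_half_life` and the dictionary keys are hypothetical, and the values are kept as the strings shown in the table rather than parsed into numbers.

```python
from typing import Dict

# One-letter code -> (mammalian, yeast, E. coli), copied from the table above.
HALF_LIFE = {
    "A": ("4.4 hour", ">20 hour", ">10 hour"),
    "R": ("1 hour", "2 min", "2 min"),
    "N": ("1.4 hour", "3 min", ">10 hour"),
    "D": ("1.1 hour", "3 min", ">10 hour"),
    "C": ("1.2 hour", ">20 hour", ">10 hour"),
    "Q": ("0.8 hour", "10 min", ">10 hour"),
    "E": ("1 hour", "30 min", ">10 hour"),
    "G": ("30 hour", ">20 hour", ">10 hour"),
    "H": ("3.5 hour", "10 min", ">10 hour"),
    "I": ("20 hour", "30 min", ">10 hour"),
    "L": ("5.5 hour", "3 min", "2 min"),
    "K": ("1.3 hour", "3 min", "2 min"),
    "M": ("30 hour", ">20 hour", ">10 hour"),
    "F": ("1.1 hour", "3 min", "2 min"),
    "P": (">20 hour", ">20 hour", "?"),
    "S": ("1.9 hour", ">20 hour", ">10 hour"),
    "T": ("7.2 hour", ">20 hour", ">10 hour"),
    "W": ("2.8 hour", "3 min", "2 min"),
    "Y": ("2.8 hour", "10 min", "2 min"),
    "V": ("100 hour", ">20 hour", ">10 hour"),
}


def estimated_half_life(sequence: str) -> Dict[str, str]:
    """Look up the half-life estimates for the N-terminal residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    mammalian, yeast, e_coli = HALF_LIFE[sequence[0].upper()]
    return {"mammalian": mammalian, "yeast": yeast, "e_coli": e_coli}


if __name__ == "__main__":
    print(estimated_half_life("MKWVTF"))
    # {'mammalian': '30 hour', 'yeast': '>20 hour', 'e_coli': '>10 hour'}
```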
Instability index (II)
The instability index provides an estimate of the stability of your protein. Statistical analysis of 12 unstable and 32 stable proteins has revealed [7] that there are certain dipeptides, the occurrence of which is significantly different in the unstable proteins compared with those in the stable ones. The authors of this method have assigned a weight value of instability to each of the 400 different dipeptides (DIWV). Using these weight values it is possible to compute an instability index (II) which is defined as:
II = (10/L) * Sum(i=1 to L-1) DIWV(x[i]x[i+1])
where L is the length of the sequence and DIWV(x[i]x[i+1]) is the instability weight value for the dipeptide starting in position i.
A protein whose instability index is smaller than 40 is predicted to be stable; a value above 40 predicts that the protein may be unstable.
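The instability index formula can be sketched as follows. The 400 published DIWV values are not reproduced here, so the function takes the weight table as an argument, and the two weights in the usage example are made-up toy numbers, not values from [7]. The function names are hypothetical. The text does not say how a value of exactly 40 is classified; this sketch treats it as not stable.

```python
from typing import Mapping


def instability_index(sequence: str, diwv: Mapping[str, float]) -> float:
    """II = (10/L) * sum of DIWV over the L-1 overlapping dipeptides.

    `diwv` maps each two-letter dipeptide to its instability weight and
    must be supplied by the caller (e.g. transcribed from reference [7]).
    """
    length = len(sequence)
    if length < 2:
        raise ValueError("need at least two residues to form a dipeptide")
    total = sum(diwv[sequence[i:i + 2]] for i in range(length - 1))
    return (10.0 / length) * total


def is_predicted_stable(index: float) -> bool:
    """Below 40 is predicted stable; exactly 40 is an assumption here."""
    return index < 40


if __name__ == "__main__":
    # TOY weights for illustration only, NOT the published DIWV values.
    toy_diwv = {"AK": 6.0, "KA": 2.0}
    ii = instability_index("AKAK", toy_diwv)  # (10/4) * (6 + 2 + 6)
    print(ii, is_predicted_stable(ii))  # 35.0 True
```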
Aliphatic index
The aliphatic index of a protein is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine). It may be regarded as a positive factor for the increase of thermostability of globular proteins. The aliphatic index of a protein is calculated according to the following formula [8]:
Aliphatic index = X(Ala) + a * X(Val) + b * ( X(Ile) + X(Leu) )
where X(Ala), X(Val), X(Ile), and X(Leu) are the mole percent (100 × mole fraction) of alanine, valine, isoleucine, and leucine. The coefficients a and b are the relative volumes of the valine side chain (a = 2.9) and of the Leu/Ile side chains (b = 3.9) compared to the side chain of alanine.
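A minimal sketch of the aliphatic index formula follows, assuming an upper-case one-letter sequence. The function name `aliphatic_index` is hypothetical; the coefficients a = 2.9 and b = 3.9 are those given in the text.

```python
def aliphatic_index(sequence: str, a: float = 2.9, b: float = 3.9) -> float:
    """Aliphatic index = X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu))."""
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        # Mole percent = 100 x mole fraction of the residue.
        return 100.0 * sequence.count(residue) / length

    return (
        mole_percent("A")
        + a * mole_percent("V")
        + b * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    # Each of Ala, Val, Ile and Leu makes up 25 mole percent, so the index
    # is 25 + 2.9*25 + 3.9*50, up to floating-point rounding.
    print(aliphatic_index("AVIL"))
```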
GRAVY (Grand Average of Hydropathy)
The GRAVY value for a peptide or protein is calculated as the sum of hydropathy values [9] of all the amino acids, divided by the number of residues in the sequence.
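The GRAVY calculation can be sketched as below. The hydropathy values are transcribed from the Kyte and Doolittle scale cited as [9] and should be checked against that paper before being relied on. The names `KYTE_DOOLITTLE` and `gravy` are hypothetical, and the sequence is assumed to contain only the 20 standard residues in upper-case one-letter code.

```python
# Kyte-Doolittle hydropathy values by one-letter code (see reference [9]).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Sum of hydropathy values divided by the number of residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


if __name__ == "__main__":
    # Ile (4.5) and Arg (-4.5) cancel out.
    print(gravy("IR"))  # 0.0
```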
[1a] Pace, C.N., Vajdos, F., Fee, L., Grimsley, G., and Gray, T. (1995) How to measure and predict the molar absorption coefficient of a protein. Protein Sci. 4, 2411-2423. [PubMed: 8563639]
[1b] Edelhoch, H. (1967) Spectroscopic determination of tryptophan and tyrosine in proteins. Biochemistry 6, 1948-1954. [PubMed: 6049437]
[1c] Gill, S.C. and von Hippel, P.H. (1989) Calculation of protein extinction coefficients from amino acid sequence data. Anal. Biochem. 182, 319-326. [PubMed: 2610349]
[2] Bachmair, A., Finley, D. and Varshavsky, A. (1986) In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186. [PubMed: 3018930]
[3] Gonda, D.K., Bachmair, A., Wunning, I., Tobias, J.W., Lane, W.S. and Varshavsky, A. J. (1989) Universality and structure of the N-end rule. J. Biol. Chem. 264, 16700-16712. [PubMed: 2506181]
[4] Tobias, J.W., Shrader, T.E., Rocap, G. and Varshavsky, A. (1991) The N-end rule in bacteria. Science 254, 1374-1377. [PubMed: 1962196]
[5] Ciechanover, A. and Schwartz, A.L. (1989) How are substrates recognized by the ubiquitin-mediated proteolytic system? Trends Biochem. Sci. 14, 483-488. [PubMed: 2696178]
[6] Varshavsky, A. (1997) The N-end rule pathway of protein degradation. Genes Cells 2, 13-28. [PubMed: 9112437]
[7] Guruprasad, K., Reddy, B.V.B. and Pandit, M.W. (1990) Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng. 4, 155-161. [PubMed: 2075190]
[8] Ikai, A.J. (1980) Thermostability and aliphatic index of globular proteins. J. Biochem. 88, 1895-1898. [PubMed: 7462208]
[9] Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132. [PubMed: 7108955]