<!------------Discussion----------->
Proteins are charged due to amino acid side chains, secondary modifications, and unblocked 
terminal residues. The net charge swings from negative to positive according to 
the pH of the ambient water and the local environment of each contributing 
residue in the folded protein. The pI is the pH that results in a net charge
of zero. 
<P>
The pI is a useful property because it predicts solubility properties and
so suggests the cellular environment in which a protein will be found. 
By displaying an individual protein's pI over the histogram of genome-wide
pI distribution, the Protein Browser calls attention to unusual values of
pI that could be clues to function or location.
For example, human pepsin A has a pI of 3.4, consistent with its expression in the
highly acidic stomach; human cytochrome c has a pI of 9.6  appropriate to binding
negatively-charged membrane phospholipid.  At the isoelectric-focusing
stage of 2-D electrophoresis, each protein migrates to its isoelectric
point. A protein migrating anomalously may have an unsuspected covalent
modification.
<P>
The pI algorithm uses pK values for relevant amino acids (C, D, E, H, K, R, 
S, T, Y, and amino- and carboxy- termini) derived experimentally for model peptides in 
9 M urea at 25 degrees C. The prediction tool 
removes predicted signal peptides, but does not consider other modifications. It 
is least accurate on small proteins with limited intrinsic buffering capacity, 
with highly basic proteins where pI lacks a solid experimental foundation, and 
on protein domains buried in membrane. Raw sequence is used when no UniProtKB 
accession is available. 
<P>
Clearly, it is not feasible to precisely predict pI for native proteins
because this requires knowledge of how the local folded environment has affected 
ionization propensities (pK) of individual contributing residues. A notable
example is  histidine, the residue most sensitive to slight variations around neutral pH. 
Furthermore, biologically relevant pI is a property of mature proteins but not 
all processing steps can be reliably predicted.  Modification may cycle on and 
off <em>in vivo</em> according to regulatory status -- for example, many 
proteins are transiently phosphorylated. 
<P>
Genome-wide experimental determination of the pI of native proteins cannot be 
accomplished on a microarray because each protein would have to be correctly 
processed, folded, and modified. Some relevant modifications, such as sialylated 
glycosylation, are not yet achievable <em>in vitro</em> and can vary by cell type. 
Nonetheless, the predicted pI values are accurate enough for many purposes. 
<P>
It is likely that intrinsic membrane proteins are under-represented in the data set
because they are more difficult to purify than peripheral or cytosolic
proteins.  The mammalian pI distribution seems adequately modeled by an 
acidic Gaussian centered on pH 5.8, a pronounced trough at pH 7.4, and an 
alkaline Gaussian centered on pH 8.8. The near-neutral pI region has presumably 
been depleted by selective pressure -- cytoplasmic proteins are least soluble at 
their isoelectric point. Proteins may be evolutionarily trapped on one side of
the valley or the other. 
<P>
<CENTER>
<IMG height=264 width=276 src="pi.jpg">
</CENTER>
