N-glycosylation sites (along with disulfides) predict the cellular location of
proteins to some extent. Proteins are glycosylated on an internal asparagine
residue as they pass through the endoplasmic reticulum for export.
Glycosylation affects protein folding, trafficking, solubility,
antigenicity, half-life, localization, and cell-to-cell interactions.
<P>
Potential N-glycosylation sites are reliably predicted by a tripeptide
motif, NxT or NxS, where x is not proline and P seldom follows. Multiple
motifs in mature regions of the same protein and conservation in homologs
provide significant additional support. Glycoproteins are targeted to the
cell exterior, the lysosome or a similar compartment, and the extracellular
space, but not to the cytoplasm, nucleus, or mitochondria. Predictions can
sometimes be reinforced by the presence of a signal peptide or a
glycosylphosphatidylinositol (GPI) membrane terminal anchor.
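<P>
The motif scan described above can be sketched in a few lines of Python. This
is only an illustrative sketch: the function name <code>find_sequons</code> and
the returned tuple layout are assumptions made for this example, and a hit
marks a potential site, not a glycosylated one.

```python
import re

# Zero-width lookahead so that overlapping motifs (e.g. "NNTS") are all found.
# N, then any residue except P, then S or T.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return (position, motif, followed_by_P) for each NxS/NxT motif.

    position is the 0-based index of the asparagine. followed_by_P flags
    motifs whose next residue is proline, which the text notes is seldom
    seen; such hits are reported rather than discarded.
    """
    seq = protein.upper()
    hits = []
    for match in SEQUON.finditer(seq):
        i = match.start()
        followed_by_p = seq[i + 3:i + 4] == "P"
        hits.append((i, seq[i:i + 3], followed_by_p))
    return hits
```

For a made-up peptide such as <code>MANITNPSNGSP</code>, the scan reports NIT
and NGS, skips NPS because x is proline, and flags NGS as followed by P.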
<P>
Actual occupancy of N-glycosylation sites <em>in vivo</em> is hard to predict.
Specific sites may be glycosylated to varying degrees in
various tissues at different developmental stages, and complex carbohydrate
moieties are not always fully built out. The three amino acids involved are
very common, so many motifs occur by chance alone; the number of chance
occurrences can be estimated from the count of the reversed motif. Thus, the
number of glycosylation sites used genome-wide can be estimated as the excess,
(NxT + NxS) - (TxN + SxN), x not P. However, this estimate says nothing about
whether any specific motif is glycosylated. Some glycosylation sites are quite
ancient, but others have been gained or lost over moderate time scales in
paralog families.
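<P>
The excess estimate, (NxT + NxS) - (TxN + SxN) with x not P, can be computed
over a set of protein sequences roughly as follows. This is a sketch under
stated assumptions: the function name <code>motif_excess</code> is
hypothetical, overlapping motifs are each counted, and no attempt is made to
restrict counting to mature regions.

```python
import re

# Lookaheads count overlapping occurrences; x is any residue except P.
FORWARD = re.compile(r"(?=N[^P][ST])")   # NxS or NxT
REVERSED = re.compile(r"(?=[ST][^P]N)")  # SxN or TxN, the chance baseline


def motif_excess(proteins):
    """Return (forward, backward, excess) motif counts over all sequences.

    excess = (NxT + NxS) - (TxN + SxN), the source's estimate of how many
    motifs are present beyond chance expectation.
    """
    forward = 0
    backward = 0
    for protein in proteins:
        seq = protein.upper()
        forward += len(FORWARD.findall(seq))
        backward += len(REVERSED.findall(seq))
    return forward, backward, forward - backward
```

The result is an aggregate only; it does not indicate which individual motifs
account for the excess.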
<P>
Conservation of a potential motif in mammals provides only weak support for
its utilization. Because overall protein conservation between mouse and human,
for example, is 85%, the two fixed residues of the motif would both be
invariant about 72% of the time (0.85 * 0.85 = 0.72) even with no selection
acting on the motif itself. Even observed conservation between pufferfish and
human is not persuasive (50% chance). The motif might be conserved for catalytic or
structural reasons, not glycosylation. However, a persuasive case can
sometimes be made from integration of all available information on a given
protein.
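<P>
The chance-conservation arithmetic can be checked with a short calculation.
It assumes, as a simplification, that the two fixed motif positions are
conserved independently and at the protein-wide average rate; the function
names are hypothetical.

```python
def chance_conservation(identity, positions=2):
    """Probability that all fixed motif positions are unchanged by chance.

    identity is the overall fraction of conserved residues between two
    species; positions is the number of fixed residues (N and S/T).
    Assumes each position is conserved independently.
    """
    return identity ** positions


def identity_for_chance(chance, positions=2):
    """Per-residue identity that would give the stated chance conservation."""
    return chance ** (1.0 / positions)
```

With 85% identity, <code>chance_conservation(0.85)</code> gives 0.7225, the
72% quoted for mouse and human. Under the same assumption, a 50% chance would
correspond to roughly 71% per-residue identity; that figure is derived here
and is not stated in the text.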
