Bioinformatics tools for protein glycation are, as yet, poorly developed. Mature bioinformatics tools will ideally predict sites of protein glycation within the proteome and, in combination with experimental studies, will help to confirm the location of glycation adducts within proteins. Glycation is a nonenzymatic process, and so selectivity for sites of glycation is determined by the reactivity of the lysine, arginine, or N-terminal residue under consideration. This is linked to (i) the microscopic pKa of the residue being modified, (ii) surface exposure of the modification site, and (iii) a proximate conjugate base catalyzing the dehydration steps involved in FL and MG-H1 residue formation (Figure 8.3).
Figure 8.3 Activation of arginine residues in α-helix domains of proteins by neighboring group interactions with basic and acidic amino acid residues. Source: Rabbani, 2010. Reproduced with permission from Springer.
Microscopic pKa values of lysine, N-terminal, and arginine residues are an important determinant of glycation sites because the rate of glycation is determined by the basicity of the side chain residue: the reaction proceeds essentially through the usually minor, unprotonated proportion of the amino moieties of lysine side chains and N-terminal residues and of the guanidino side chain moieties of arginine. This has a profound influence on the site of glycation by glucose on N-terminal and lysine residues and on glycation by MG of arginine residues. Microscopic pKa values may be computed for proteins of known crystal structure, for example, by using the H++ automated system (http://biophysics.cs.vt.edu/H++). There is marked diversity of pKa values of Lys and Arg residues in proteins. For example, in HSA, microscopic pKa values of the 59 lysine residues vary from 7.9 to 14.0 and of the 24 arginine residues from 12.2 to 18.6, an expected reactivity range of >10^6 (cf. reactivity of N-terminal serine pKa of 7.9). The major sites of glycation by glucose in HSA are, in order of reactivity, N-terminal D1, K525, K199, and K439. This compares to a rank order by increasing pKa value of lysine side chain and N-terminal amino groups of first-equal, sixteenth, third, and fourteenth. Low pKa values are likely driving glycation of D1 and K199. Activating features of K525 and K439 may be deprotonation catalyzed by proximate E520/R521 and E442, respectively. In a study of the hotspot sites of glycation of HSA by MG, three of the five sites with MG-H1 residue formation had the lowest predicted microscopic pKa values: R218, pKa = 12.2; and R186 and R410, pKa = 12.5. However, the remaining two sites, R114 and R428, with predicted pKa values of 13.6 and 15.1, ranked 8th and 14th of 24 in order of increasing predicted microscopic pKa value. R114 has high surface exposure, which likely also facilitates MG modification. All activated arginine residues have a positively charged R or K residue 3 or 4 residues further along in the sequence, decreasing the microscopic pKa value; R428 only has a negatively charged residue, E425, preceding it in the sequence. A subsequent study confirmed these hotspot sites except for R114 and suggested R257, which has a relatively low pKa (12.9), as a further hotspot modification site. The proximity of a negatively charged D or E residue provides a conjugate base to promote the rate-limiting removal of a proton from the protein-glucose Schiff's base and arginyl-dihydroxyimidazolidine precursors of fructosamine and MG-H1 adducts. The combination of proximate cationic and anionic side chain residues for lysine and arginine residue activation was initially proposed to explain site specificity of lysine residue glycation by glucose and then applied to MG-H1 formation from arginine.
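The relationship between the pKa spread and the expected reactivity range can be illustrated with a minimal sketch. It assumes, as a simplification that is not stated in the source, that reactivity is proportional to the unprotonated fraction of the residue and that this fraction scales as 10^(-pKa) when the pKa lies well above physiological pH. The function name is hypothetical; the pKa values are those quoted above for HSA.

```python
def relative_reactivity_range(pka_low, pka_high):
    """Ratio of expected reactivities for two residues.

    Assumption (not from the source): reactivity is proportional to the
    unprotonated fraction, approximated as 10**(-pKa) for pKa >> pH.
    """
    return 10.0 ** (pka_high - pka_low)

# pKa extremes quoted in the text for HSA
lys_range = relative_reactivity_range(7.9, 14.0)   # 59 lysine residues
arg_range = relative_reactivity_range(12.2, 18.6)  # 24 arginine residues

# Predicted pKa values quoted in the text for MG-modified arginine residues
arg_pka = {"R218": 12.2, "R186": 12.5, "R410": 12.5,
           "R114": 13.6, "R428": 15.1, "R257": 12.9}

# Rank residues by increasing pKa (lowest pKa = highest expected reactivity)
ranked = sorted(arg_pka, key=lambda residue: arg_pka[residue])

print(f"Lys reactivity range: {lys_range:.2e}")
print(f"Arg reactivity range: {arg_range:.2e}")
print("Rank by pKa:", ranked)
```

Both ratios come out above 10^6 under this approximation. The ranking covers only the six residues listed here, not all 24 arginine residues, and it ignores surface exposure and neighboring group effects, which the text identifies as further determinants.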
The aforementioned considerations relate to the rate of formation of glycation adducts. FL and MG-H1 residues have half-lives of ca. 25 and 12 days, respectively [19, 106], which exceed the half-lives of most human proteins (median half-life of 1.9 days). Therefore, for many proteins, the steady-state extent of protein glycation is also influenced by the half-life of the protein. Hence, early studies found that the extent of glycation by glucose of several proteins in vivo was linked to the protein half-life. Since glycation leads to protein distortion and misfolding, it is also expected that glycated proteins are targeted for cellular proteolysis and have an unusually decreased half-life. This remains to be determined in robust unfocused proteome dynamics studies. The level of FL and N-terminal fructosamine residues in cellular proteins is also influenced by enzymatic removal and repair by F3PK. F3PK has different specific activity for FL residues at different sites in proteins. The FL residues detected at different sites in proteins therefore reflect a balance of the intrinsic reactivity for glycation and the reactivity of the FL residue site for repair by F3PK (see glycated hemoglobin, for example). There is no known enzymatic mechanism for repair of MG-H1 residues.
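The dependence of steady-state glycation on protein half-life can be sketched with a simple first-order model. This model is an illustrative assumption, not the authors' method: protein is synthesized unglycated, glycated with rate constant k_gly, degraded at the same rate whether glycated or not, and the adduct is lost spontaneously to regenerate unmodified protein. Under these assumptions the steady-state glycated fraction is k_gly / (k_gly + k_deg + k_loss). The value of k_gly is hypothetical, as is the 20-day half-life of the longer-lived protein; repair by F3PK and accelerated proteolysis of glycated protein are not modeled.

```python
import math

def rate_from_half_life(half_life_days):
    """First-order rate constant (per day) from a half-life in days."""
    return math.log(2) / half_life_days

def steady_state_glycated_fraction(k_gly, protein_half_life, adduct_half_life):
    """Steady-state fraction of protein carrying the adduct.

    Assumed model (not from the source): first-order glycation, equal
    degradation of glycated and unglycated protein, spontaneous adduct loss.
    """
    k_deg = rate_from_half_life(protein_half_life)
    k_loss = rate_from_half_life(adduct_half_life)
    return k_gly / (k_gly + k_deg + k_loss)

K_GLY = 0.001          # per day; hypothetical glycation rate constant
FL_HALF_LIFE = 25.0    # days; FL residue half-life quoted in the text

# Median human protein half-life from the text vs. a hypothetical long-lived protein
short_lived = steady_state_glycated_fraction(K_GLY, 1.9, FL_HALF_LIFE)
long_lived = steady_state_glycated_fraction(K_GLY, 20.0, FL_HALF_LIFE)

print(f"Protein half-life 1.9 days: {short_lived:.4f}")
print(f"Protein half-life 20 days:  {long_lived:.4f}")
```

With identical intrinsic reactivity, the longer-lived protein accumulates a higher steady-state extent of glycation in this model, which is the qualitative trend described in the text.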
An examination of protein motifs for glucose glycation forming FL was made empirically by compiling and combining peptide motifs from published peptide mapping studies. It was found that K and R residues dominate in the N-terminal region and D and E residues dominate in the C-terminal region of FL sites, but no clear motif for FL formation was found. A problem with this approach is the inclusion of data from studies that produced highly glycated proteins in vitro, with markedly greater than 2-3 molar equivalents of modification (which has been common in glycation research [85, 112]). Under these conditions, glycation occurs at many sites, both those favored and those less favored, since the glycation process is driven so strongly by the high glycating agent concentration. Information on site-specific and selective modification is thereby obscured. In a study of human plasma and red blood cells, detection and filtering for unique peptides with >5 spectrum counts gave 361 and 443 unique glycated peptide sequences from native human plasma and red blood cells, respectively. There was only limited evidence to support the hypothesis of N-terminal enrichment of K and R residues and C-terminal enrichment of D and E residues in the sequence motif for hotspot glycation by glucose. Overall, hydrophobic short-chain or uncharged side chain amino acids A, V, L, and S occurred most frequently close to the sites of glycation. E was also highly represented at some amino acid positions close to the glycation site. H, Y, M, W, C, and F were the least frequent amino acid residues close to the sites of glycation.
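The kind of empirical motif analysis described above can be sketched as a position-specific residue count around each glycation site. This is a minimal illustration rather than the pipeline used in the cited studies; the function name and the example peptides are hypothetical and carry no experimental meaning.

```python
from collections import Counter, defaultdict

def flanking_counts(sites, flank=3):
    """Count residues at each offset around glycation sites.

    sites: iterable of (peptide_sequence, zero_based_index_of_glycated_residue).
    Returns a mapping offset -> Counter of residues, for offsets
    -flank..+flank excluding 0 (the glycated residue itself).
    """
    counts = defaultdict(Counter)
    for sequence, index in sites:
        for offset in range(-flank, flank + 1):
            position = index + offset
            # Skip the modified residue and positions beyond the peptide ends
            if offset == 0 or position < 0 or position >= len(sequence):
                continue
            counts[offset][sequence[position]] += 1
    return counts

# Hypothetical peptides, for illustration only (glycated K at the given index)
example_sites = [("AVLKSEA", 3), ("RAEKDLV", 3), ("SLKEA", 2)]
counts = flanking_counts(example_sites, flank=2)

for offset in sorted(counts):
    print(f"{offset:+d}", counts[offset].most_common())
```

Negative offsets correspond to the N-terminal side of the site and positive offsets to the C-terminal side, so enrichment of K and R or of D and E on either side can be read directly from the counts. Restricting the input to sites detected in native samples, rather than in highly glycated in vitro preparations, avoids the bias discussed above.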
A further consideration for a nonenzymatic PTM such as glycation is to ask when the modification is of functional importance to the protein substrate. We addressed this question by considering that glycation will have a functional effect when it occurs in a domain where the protein interacts with other proteins, DNA, or substrates. A sequence-based predictor of such functional domains has been developed: the receptor-binding domain (RBD) analysis. By such analysis, we were able to show that MG modifications of albumin occur at functional sites (Figure 8.4). We have applied the RBD analysis proteome-wide, and this may now be linked to glycation proteomics to identify glycated proteins that are expected to be dysfunctional.
Figure 8.4 Receptor-binding domain (RBD) analysis. RBD plot of human serum albumin prepared by the method of Gallet et al. showing sites of hotspot modification by MG. The RBD area is the truncated trapezium in the top left of the chart. It was computed with a sequence amino acid interval of 5. Key: gray circles, amino acid residues in human serum albumin; R410, hotspot of MG modification; and R114, R186, R218, and R428, also modified by MG. Other arginine residues lie within the RBD but are not modified by MG. This figure was originally published in color. Source: Rabbani, 2014. Reproduced with permission from Portland Press.