Identification and Analysis of Protein Phosphorylation by Mass Spectrometry
Dean E. McNulty, Timothy W. Sikorski and Roland S. Annan
Proteomics and Biological Mass Spectrometry Laboratory, GlaxoSmithKline, Collegeville, PA, USA
Introduction to Protein Phosphorylation
Much of the activity in the cellular proteome is under the control of reversible protein phosphorylation. Phosphorylation-dependent signaling regulates differentiation of cells, triggers progression of the cell cycle, and controls metabolism, transcription, apoptosis, and cytoskeletal rearrangements. Signaling via reversible protein phosphorylation also plays a critical role in intracellular communication and immune response. Phosphorylation can function as a positive or negative switch, activating or inactivating enzymes. It can serve as a docking site to recruit other proteins into multiprotein complexes or serve as a recognition element to recruit other enzymes that add other post-translational modifications (PTMs) or additional phosphorylation sites. Phosphorylation can trigger a change in the three-dimensional structure of a protein or initiate translocation of the protein to another compartment of the cell. Disruption of normal cellular phosphorylation events is responsible for a large number of human diseases [1-3]. Since the discovery of the first functionally relevant phosphorylation site in 1955, the ability to analyze protein phosphorylation has grown, exploding in the last five years to the point where it is now possible to quantitate changes in tens of thousands of phosphorylation sites in response to a cell receiving an external stimulus or undergoing a normal change in its physiology. While phosphorylation is known to occur on histidine, aspartate, cysteine, lysine, and arginine residues, this chapter focuses on the more commonly modified and well-studied amino acids: serine, threonine, and tyrosine.
The first evidence for protein phosphorylation was uncovered in 1906 when Phoebus Levene identified phosphate in the amino acid composition of the egg yolk protein vitellin. While there was evidence in the 1920s to suggest the phosphate was on the amino acid serine, it was not until 1932 that Levene and Fritz Lipmann isolated phosphoserine from vitellin. Prior to the 1950s, research on phosphoproteins was focused mainly on abundant proteins found in egg yolk (such as vitellin) and milk (casein), and the biological function, if any, of the phosphorylation was unknown. But by the early 1950s, it was being shown that in tumor cells the phosphorus in phosphoproteins was being turned over rapidly and that tumors contained high levels of phosphoserine [9, 10], together suggesting that this modification must have some function. In 1954 Kennedy and Burnett, using labeled ATP, demonstrated that an enzyme from rat liver mitochondria was responsible for catalyzing the phosphorylation of serine on both alpha and beta casein. A year later Fischer and Krebs provided the first evidence that protein phosphorylation had a biological function. They demonstrated that inactive phosphorylase b could be converted to active phosphorylase a in the presence of ATP and Mg2+, and in the next few years they identified phosphorylase kinase as the enzyme responsible for the activation and showed that it phosphorylated a specific serine residue on phosphorylase b.
Analysis of Protein Post-Translational Modifications by Mass Spectrometry,
First Edition. Edited by John R. Griffiths and Richard D. Unwin.
© 2017 John Wiley & Sons, Inc. Published 2017 by John Wiley & Sons, Inc.
It is now widely recognized that cascades of protein phosphorylation transmit signals from the extracellular environment to trigger a biological response within the cell. The first evidence that kinases worked in series came in 1968 with the discovery of cAMP-dependent protein kinase A (PKA) and the fact that it phosphorylated and activated phosphorylase kinase. It quickly became clear that PKA had many substrates in multiple tissues, and the idea that protein phosphorylation was a widespread phenomenon began to take hold. Throughout the 1970s and 1980s many additional serine/threonine (S/T) protein kinases were discovered, and in 1980 Tony Hunter showed that the v-Src protein was a tyrosine kinase (TK). The difficulty in detecting phosphotyrosine in these early years arose from two facts: as we now know, it constitutes only a few percent of the total phosphoamino acid pool [5, 16], and it comigrated with the much more abundant phosphothreonine in the standard electrophoretic systems used in the late 1970s to detect 32P-labeled phosphoamino acids.
With the development by Hunter and Sefton of a two-dimensional (2D) separation method for phosphoamino acids, it quickly became clear that phosphorylation on tyrosine was also widespread. In 1981 the EGF receptor (EGFR) was shown to have TK activity, and stimulation of cells with EGF was shown to lead to rapid tyrosine phosphorylation on multiple proteins [18, 19]. By the end of the 1980s more than 10 receptor tyrosine kinases (RTKs) had been identified. The realization that growth factor receptors had intrinsic TK activity connected intracellular signaling through (largely) S/T kinases with external signals communicated via ligand binding to transmembrane receptors. In many cases, nonreceptor tyrosine kinases (NRTKs) constitute the next step in the signaling cascade, transmitting signals from the intracellular domains of the RTK to downstream S/T protein kinases [20, 21]. Vast amounts of research in the 1980s and 1990s encompassing all areas of cellular biology would discover many more kinases and their substrates and add much fine detail to the mechanism of phosphorylation-dependent signaling.
The identification of all human kinase genes was made possible with the complete sequencing of the human genome. Bioinformatic analysis has identified 478 protein kinases (see Figure 2.1, right), belonging to a large superfamily that shares a eukaryotic protein kinase (ePK) domain. There are an additional 40 atypical protein kinases (aPKs), which have been demonstrated to have protein kinase activity but do not share the ePK domain. Altogether the 518 protein kinases make up one of the largest families of eukaryotic genes (see Figure 2.1). All major kinase groups and most kinase families are shared across metazoans, and many are shared in yeast. Protein tyrosine kinases (PTKs), of which 90 have been identified, are found only in metazoans. More than half of these (58) are RTKs, involved in regulating the multicellular aspects of an organism via cell-to-cell communication. It is surprising how little is actually known about most of these 518 protein kinases (termed the “kinome”).
Figure 2.1 Protein phosphorylation is governed by two large superfamilies of enzymes. Protein kinases (right) add phosphate to (primarily) serine, threonine, and tyrosine residues. Protein phosphatases (left) remove the phosphate group. There are similar numbers of tyrosine kinases and phosphatases. The very small number of serine/threonine phosphatases achieves selectivity by forming combinatorial enzyme complexes with a large number of regulatory subunits.
More than 100 of the kinases have absolutely no known function, and 50% are largely uncharacterized. A very small percentage of the kinome accounts for most of the published literature. This lack of knowledge about most of the human kinome is reflected in the fact that the twenty approved kinase therapeutics address only nine different kinases as their primary targets. This is in spite of the fact that kinases are characterized as excellent drug targets in cancer and many other diseases. Kinase gene profiling shows distinct expression pattern differences between healthy and disease tissues for large clusters of the kinome.
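The kinome counts quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original analysis; the variable and function names are illustrative, and treating all 90 PTKs as members of the ePK superfamily is an assumption of the sketch (it happens to reproduce the figure of 388 S/T kinases quoted later in this section).

```python
# Counts quoted in the text for the human kinome.
EPK = 478               # kinases sharing the eukaryotic protein kinase (ePK) domain
APK = 40                # atypical protein kinases (aPKs)
PTK = 90                # protein tyrosine kinases
RTK = 58                # receptor tyrosine kinases (a subset of the PTKs)
PRIMARY_DRUG_TARGETS = 9  # kinases addressed by the twenty approved therapeutics


def kinome_summary():
    """Derive a few summary numbers from the quoted counts."""
    total = EPK + APK
    return {
        "total_kinases": total,
        "non_receptor_ptks": PTK - RTK,
        "rtk_fraction_of_ptks": RTK / PTK,
        # Assumption: every PTK carries the ePK domain.
        "epk_minus_ptk": EPK - PTK,
        "fraction_of_kinome_targeted": PRIMARY_DRUG_TARGETS / total,
    }


if __name__ == "__main__":
    for name, value in kinome_summary().items():
        print(f"{name}: {value}")
```

Under these numbers, the nine primary targets of approved kinase therapeutics correspond to less than 2% of the 518 kinases.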
Given the wide range of processes that are under the control of reversible protein phosphorylation and the large number of protein kinases in the metazoan genomes, it is not surprising that the extent of phosphorylation in higher-order organisms is massive. Current phosphosite databases [28, 29] list more than 150,000 sites on over 18,000 human proteins, many more than were previously predicted. The large majority of these sites have been identified in high-throughput phosphoproteomics studies utilizing MS. Large-scale phosphoproteome studies suggest that the overall phosphoamino acid composition of any cell is approximately 75-85% phosphoserine, 10-20% phosphothreonine, and 1-6% phosphotyrosine [5, 30-33]. This composition likely reflects the biology of the cell and not some bias of the mass spectrometer, as it has been shown using a large-scale synthetic phosphopeptide library that peptides containing all three types of phosphoamino acids are detected equally.
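As an illustration of how the quoted composition might be used as a sanity check on a site list, the sketch below tallies the modified residue for each identified site and flags any residue whose share falls outside the approximate ranges given above. This is a hedged example only: the function names and the input format (one-letter residue codes) are assumptions, and the site list in the demo is made up.

```python
from collections import Counter

# Approximate ranges quoted in the text, as fractions of all S/T/Y phosphosites.
EXPECTED_RANGES = {"S": (0.75, 0.85), "T": (0.10, 0.20), "Y": (0.01, 0.06)}


def composition(residues):
    """Return the fraction of pS, pT and pY among the supplied site residues."""
    counts = Counter(r for r in residues if r in EXPECTED_RANGES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no S/T/Y phosphosites supplied")
    return {aa: counts[aa] / total for aa in EXPECTED_RANGES}


def outside_expected(fractions):
    """List the residues whose fraction lies outside the quoted range."""
    return [aa for aa, (low, high) in EXPECTED_RANGES.items()
            if not low <= fractions[aa] <= high]


if __name__ == "__main__":
    sites = ["S"] * 80 + ["T"] * 16 + ["Y"] * 4  # made-up example data
    fractions = composition(sites)
    print(fractions, outside_expected(fractions))
```

A data set enriched for phosphotyrosine (for example after antibody-based enrichment) would be expected to fall outside these ranges, so a flag here is a prompt to look at the experimental design rather than evidence of an error.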
In 15-25% of phosphoproteins only a single site has been identified. In many cases these single sites act as a simple switch. Glycogen phosphorylase, for instance, contains only a single phosphoserine that drives it from the inactive to the active state. The majority of proteins, however, are phosphorylated on more than one site and by more than one kinase. The spliceosome protein Srrm2 was found to contain anywhere between 177 and 300 sites [30, 33]. As might be expected, a weak but significant correlation exists between a protein's abundance and the number of sites identified in an analysis. However, it is clear that multisite phosphorylation is the rule rather than the exception. It has been suggested that the multiplicity of phosphorylation on proteins might just be background noise. However, it is equally likely that given the wide variety of biological functions under the control of protein phosphorylation and the wide variety of mechanisms by which it occurs, the functional significance of most of the complex hyperphosphorylation that occurs on proteins is not yet understood. What is emerging, however, is just how intricately this multisite phosphorylation is coordinated. While some phosphorylation clusters share a common biological function, in many cases each site or a combination of sites has distinct and separable roles in that function.
The budding yeast transcription factor Pho4 controls the expression of genes needed by the organism to survive under conditions of phosphate starvation. In a normal phosphate-rich environment, Pho4 is phosphorylated on five cyclin/Cdk sites and exported out of the nucleus. When yeast cells are deprived of phosphate, these sites are unoccupied, and Pho4 accumulates in the nucleus and activates expression of phosphate-responsive genes. Four of the five cyclin/Cdk sites have distinct roles to play in the regulation of this function, with two being required for nuclear export, one for blocking nuclear import, and one for blocking promoter binding [35, 36]. To add complexity to this mechanism, under intermediate conditions of phosphate availability, Pho4 is phosphorylated on only one of the sites, allowing it to bind differentially to its target promoters and trigger expression of only a subset of the phosphate-responsive genes.
In contrast to Pho4, whose function is regulated by multisite phosphorylation via a single kinase, Sic1 is regulated by a multisite phosphorylation cascade that involves a complex dance of two different kinases. Sic1 controls the G1/S phase transition in budding yeast by inhibiting the S-phase Clb5-Cdk1 kinase. Ubiquitin-mediated destruction of Sic1 releases Clb5-Cdk1 and allows the cell to proceed to S phase (Figure 2.2a). In one of the first examples of how phosphorylation regulates ubiquitin-mediated proteolysis, Sic1 was shown to be phosphorylated on at least nine different sites and to require a combination of at least three of six to trigger degradation. In fact it was later shown that some phosphorylation on at least six of the nine Cdk sites is required for destruction. Five of the nine Cdk-dependent sites form three pairs of high-affinity recognition elements termed phosphodegrons (see Figure 2.2b), which are recognized by ubiquitin ligases. These nine sites are phosphorylated by two different cyclin/Cdks, with each showing preference for different sites. At the transition to S phase, Cln2-Cdk1 phosphorylates Sic1 on a subset of the nine sites, but with no fully formed degrons (Figure 2.2b, top). This cluster of phosphorylation sites, however, is an excellent docking platform for the slowly released Clb5-Cdk1 (Figure 2.2b, bottom), which goes on to complete phosphorylation of the residues critical for the formation of the degrons. The ordered phosphorylation by two different kinases imposes a tight regulation on the G1/S transition in which Cln2-Cdk1 is not allowed to trigger the change until sufficient levels of Clb5-Cdk1 accumulate.
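The two-wave logic described for Sic1 can be expressed as a toy model. The Python sketch below is illustrative only: the site labels, the assignment of sites to degron pairs, and the split of sites between Cln2-Cdk1 and Clb5-Cdk1 are all invented for the example (the text gives the counts but not the residue positions). The only numbers taken from the text are nine Cdk sites, three degron pairs built from five sites, and a threshold of at least six phosphorylated sites.

```python
# Hypothetical labels for the nine Cdk sites; real residue positions are not given here.
CDK_SITES = frozenset(f"site{i}" for i in range(1, 10))

# Five sites forming three pairs implies one shared site.
# This particular pairing is invented for illustration.
DEGRON_PAIRS = (("site1", "site2"), ("site2", "site3"), ("site4", "site5"))

# Invented split of sites between the two kinases.
CLN2_WAVE = frozenset({"site1", "site4", "site6", "site7"})  # priming sites
CLB5_WAVE = frozenset({"site2", "site3", "site5"})           # completes the degrons

MIN_SITES_FOR_DESTRUCTION = 6  # "at least six of the nine Cdk sites"


def formed_degrons(phosphorylated):
    """Return the degron pairs in which both sites are phosphorylated."""
    return [pair for pair in DEGRON_PAIRS if set(pair) <= phosphorylated]


def meets_site_threshold(phosphorylated):
    """True if enough Cdk sites are phosphorylated to permit destruction."""
    return len(phosphorylated & CDK_SITES) >= MIN_SITES_FOR_DESTRUCTION


if __name__ == "__main__":
    after_cln2 = CLN2_WAVE
    after_clb5 = CLN2_WAVE | CLB5_WAVE
    for label, state in (("after Cln2-Cdk1", after_cln2),
                         ("after Clb5-Cdk1", after_clb5)):
        print(label, len(formed_degrons(state)), meets_site_threshold(state))
```

In this toy assignment the first wave leaves every degron half-formed and stays below the six-site threshold, while the second wave completes all three degrons, mirroring the ordered dependence on Clb5-Cdk1 described above.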
Figure 2.2 Cascades of multisite phosphorylation regulate biological function. (a) Sic1 controls the G1/S phase transition in budding yeast by inhibiting the S-phase Clb5-Cdk1 kinase. Phosphorylation-dependent ubiquitin-mediated destruction of Sic1 releases Clb5-Cdk1 and allows the cell to proceed to S phase. (b) In the first wave of phosphorylation (top), a subset of required sites are sequentially modified, but no fully formed binding sites ( О ) for the ubiquitination machinery are formed. These initial sites act as priming sites for the second wave of phosphorylation (bottom), which is being carried out by the slowly released Clb5-Cdk1. The now fully formed phosphodegrons bind the ubiquitination machinery, initiating destruction of Sic1. Without further sequestration of Clb5-Cdk1, the cells can transition into S phase.
For both Pho4 and Sic1, phosphorylation drives the protein's biological function by regulating protein-protein interactions. In the case of Pho4, it blocks the interaction of Pho4 with nuclear import and export transport proteins and the transcriptional coactivator protein that allows promoter binding. In the case of Sic1, phosphorylation of the priming sites facilitates binding of cyclin/Cdk complexes through their regulatory subunit Cks1. Phosphorylated sites within the three degrons of Sic1 then serve as docking sites for the SCF ubiquitin ligase. Indeed while the earliest examples of the biological significance of protein phosphorylation were in the conformation-induced stimulation of enzymatic activity, it has since become clear that much of protein
phosphorylation serves to either recruit or block the recruitment of other proteins. The first example of this came with the discovery of SH2 domains. The search for TK substrates in the early 1980s revealed that growth factor receptor TKs preferred themselves as substrates. This raised the question “How do RTKs transmit signals to drive cellular behavior?” In 1986 Tony Pawson identified a region in the oncogenic NRTK v-Fes that was conserved in all cytoplasmic tyrosine kinases and influenced their kinase activity. Termed Src homology domain 2 (SH2), it was later shown that SH2 domain-containing proteins bind other proteins, including growth factor receptors, that are phosphorylated on tyrosine [43, 44]. The recruitment of SH2 domain-containing proteins to phosphotyrosine residues on growth factor receptors thus provides a mechanism by which RTKs can cascade signals into the cytoplasm. There are 120 SH2 domains on 115 proteins in the human genome. They occur on proteins that link tyrosine phosphorylation to intracellular signaling, including all NRTKs, some tyrosine phosphatases, some lipid kinases, and many adaptor proteins. While the SH2 domain remains the prototype for phosphorylation-mediated protein-protein interactions, other phosphosite-dependent binding domains have since been discovered, including the PTB domain that also binds phosphotyrosine. More than ten phosphoserine and phosphothreonine binding domains have also been discovered, including WD40 domains, which are part of the F-box proteins that act as the substrate recognition element of SCF E3 ubiquitin ligases, including the one that mediates the destruction of Sic1 as described earlier.
Along with the reality that multisite phosphorylation is the norm for eukaryotic proteins, it has also now become clear that most of this phosphorylation occurs in intrinsically disordered regions of proteins. Nearly all eukaryotic proteins contain disordered regions, and some proteins are predicted to be entirely disordered. Intrinsically disordered proteins (IDPs) play a central role in mediating protein-protein interactions and the assembly of complex protein interaction networks. The disordered regions contain multiple conserved sequence motifs that serve as docking sites for other proteins, including protein kinases. The flexibility of the disordered regions makes them accessible to PTM, including but not limited to phosphorylation. With the addition of these PTMs, it is estimated that perhaps a million sequence-specific interaction motifs exist within the disordered regions of the proteome. In addition to Sic1, two other well-studied examples of phosphorylation (and other PTM) clusters in disordered regions that control function are p53 and RNA polymerase II. The latter protein contains 52 YSPTSPS repeats in the disordered C-terminal tail that are phosphorylated on the second and fifth serines in the motif, recruiting splicing factors, chromatin modifiers, termination machinery, and other protein modules to the elongation machinery. Interestingly, the phosphorylation of intrinsically disordered regions often brings about a disordered-to-ordered transition in the protein structure that can either facilitate or inhibit protein-protein interactions [54, 55].
The massive amount of phosphorylation on proteins in the cell depends not only on the activity of protein kinases but also on the opposing activity of protein phosphatases. Just as there are specific kinases, which phosphorylate proteins on specific sites, there are specific phosphatases that remove phosphate from those sites. Phosphatases can be broadly divided according to their substrate specificity to include protein serine/threonine phosphatase (PSP), protein tyrosine phosphatase (PTP), and dual-specificity protein phosphatase families (Figure 2.1, left). Based on structure, rather than function, phosphatases group into several completely separate families that share no structural similarities. The human genome codes for about 40 classical PTPs [57, 58], approximately half the number of TKs. Surprisingly there are only approximately 30 different PSPs, based on distinct catalytic subunits, to balance the 388 S/T kinases. The specificity of PSPs comes from their interaction with a large number of regulatory subunits, which together account for many more S/T phosphatases than there are S/T kinases.
For a protein to be regulated by phosphorylation, the activities of specific kinases and phosphatases must be in balance. A shift in this balance triggers the regulation. The intrinsic activities of S/T kinases and phosphatases are approximately equivalent, as are their intracellular concentrations. In contrast to PSPs, there is compelling evidence that the activity of PTPs is several orders of magnitude higher than that of PTKs. This much higher activity accounts for the overall low stoichiometry and more transient nature of tyrosine phosphorylation. The half-life of tyrosine phosphorylation on EGFR was found to be on the order of 15 s. Clearly the maintenance of the balance between kinase and phosphatase goes beyond simple intrinsic enzyme activity and concentration. An equally complex system of checks and balances must regulate the regulators.
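To give a feel for what a 15 s half-life implies, the sketch below converts it into a rate constant and a fraction remaining over time. It assumes simple first-order decay with no ongoing kinase activity, which is a simplification introduced here for illustration and is not a claim from the cited work; the function names are likewise illustrative.

```python
import math

EGFR_PTYR_HALF_LIFE_S = 15.0  # "on the order of 15 s" in the text


def rate_constant(half_life_s=EGFR_PTYR_HALF_LIFE_S):
    """First-order rate constant k = ln(2) / t_half, in 1/s."""
    if half_life_s <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_s


def fraction_remaining(t_seconds, half_life_s=EGFR_PTYR_HALF_LIFE_S):
    """Fraction of phosphotyrosine left after t seconds of first-order decay."""
    if t_seconds < 0 or half_life_s <= 0:
        raise ValueError("time must be non-negative and half-life positive")
    return 0.5 ** (t_seconds / half_life_s)


if __name__ == "__main__":
    print(f"k = {rate_constant():.4f} per second")
    for t in (0, 15, 30, 60):
        print(f"t = {t:>2} s: {fraction_remaining(t):.4f} remaining")
```

Under this assumption only about 6% of the phosphotyrosine signal would remain one minute after kinase activity stops, which illustrates why rapid sample handling and phosphatase inhibition matter when tyrosine phosphorylation is to be measured.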
The complexity of protein phosphorylation in the cell is enormous, and the range of biological functions under the control of protein phosphorylation covers every aspect of cellular life. How all of this is regulated is also of enormous complexity. More than 1000 protein kinase and phosphatase complexes attach and remove phosphate from the side chains of serine, threonine, and tyrosine. In all cases where the mechanisms are well studied, the addition and removal of phosphorylation are ordered and controlled. Yet for most of the phosphorylation identified to date, neither a biological function nor a mode of regulation has been assigned. Gaining this understanding will require the analysis of changes in global phosphorylation patterns as well as an in-depth analysis of individual proteins. And this analysis will need to be quantitative. It is no longer sufficient to just catalog phosphorylation sites. MS has played a central role in contributing to what we have learned about protein phosphorylation, particularly in the last 10 years. It will be an indispensable tool for us going forward as we seek to understand the dynamic nature of protein phosphorylation and how it controls a cell's function. This chapter covers both the basic elements of analyzing phosphorylation on isolated proteins and the use of phosphoproteomics strategies to understand the global phosphorylation-dependent response of a cell to changes in its physiology or environment. Both approaches are covered more or less from the perspective of a discovery mode. The individual or multiplexed analysis of targeted phosphorylation events is covered elsewhere (also see Chapter 1).