Mass Spectrometry-Based Proteomics
It is becoming increasingly clear that citrullination is a widespread protein post-translational modification (PTM) that is essential in both health and disease. To better understand the role citrullination plays, there is a need to identify not only the proteins that are citrullinated but also the specific sites of citrullination. The technique best suited for this purpose is mass spectrometry-based proteomics. Proteomics is the large-scale, comprehensive study of a complement of proteins. In recent years, mass spectrometry-based proteomics techniques have been developed to identify PTMs [56-58], protein-protein interactions, and the complete proteome of whole systems [60, 61]. The term “citrullinome” refers to the large-scale proteomics analysis of citrullinated peptides and was first used by Tutturen et al.
Large-scale proteomics is usually performed on peptide mixtures, an approach termed bottom-up proteomics: a complex protein mixture (e.g., whole cell lysate, tissue sample, or biological fluid) is digested with a protease that cleaves the proteins at specific residues. In most cases the resulting peptide digest is analyzed using C18 reversed-phase (RP) liquid chromatography (LC) coupled with tandem mass spectrometry (LC-MS/MS), though additional prefractionation steps are often included (strong cation-exchange (SCX) chromatography or sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE)). Figure 7.3 shows a schematic workflow for a proteomics analysis. During RP LC separation, peptides bind to the C18 chains and are eluted from the column into the mass spectrometer as the organic content of the mobile phase is increased; that is, more hydrophobic peptides elute later than less hydrophobic peptides. The majority of proteomics experiments utilize electrospray ionization (ESI) to transfer peptides from solution to the gas phase and create ions. Once in the gas phase, the m/z of the peptide ions is measured. Peptide sequence information is obtained by fragmenting the precursor peptide ion to generate product ions (MS/MS).
Figure 7.3 A schematic workflow of a basic proteomics experiment. A protein sample is treated with a protease to produce peptides. These peptides are then separated by liquid chromatography and analyzed by tandem mass spectrometry followed by database search identification.
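To make the measured quantity concrete, the following minimal Python sketch computes the monoisotopic mass of an unmodified peptide and the m/z of its protonated ion at a given charge state, as would be produced by ESI. The function names and the example sequence are illustrative assumptions; the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of the charge carrier in positive-mode ESI


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def mz(seq, charge):
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(z, round(mz("PEPTIDE", z), 4))
```

The same peptide appears at a different m/z for each charge state, which is why the charge must be known before a precursor m/z can be converted back to a peptide mass.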
There are two main fragmentation methods used in proteomics: slow heating and electron-mediated methods. In both cases, the precursor ion of interest is first isolated from all other ions. The fragmentation method predominantly used in proteomics is collision-induced dissociation (CID), which is a slow heating approach. Peptide ions are accelerated and collided with a neutral gas (usually nitrogen, argon, or helium). The collisions result in the conversion of some of the kinetic energy to internal energy. When sufficient energy is internalized, the most labile bond will break. In peptides, this bond is predominantly the N-CO (amide) backbone bond, although there is preferential cleavage of labile PTMs such as serine phosphorylation. The backbone fragment ions are known as b- and y-type ions [72, 73] (b ions contain the N-terminus, and y ions contain the C-terminus; Figure 7.4) and are numbered from the terminus they contain.
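The b/y numbering described above can be illustrated with a short Python sketch that lists the singly charged b- and y-ion m/z values for an unmodified peptide. The function name and example peptide are assumptions for illustration; the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(seq):
    """Return (b, y): lists of singly charged m/z values.

    b[i-1] is b_i (first i residues, counted from the N-terminus);
    y[i-1] is y_i (last i residues, counted from the C-terminus).
    """
    n = len(seq)
    b = [sum(MONO[a] for a in seq[:i]) + PROTON for i in range(1, n)]
    y = [sum(MONO[a] for a in seq[-i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

Because each backbone cleavage splits the peptide into one b and one y fragment, the complementary pair b_i and y_(n-i) always sums to the peptide mass plus two protons, which is a useful sanity check when annotating spectra.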
Figure 7.4 A schematic of peptide fragmentation by collision-induced dissociation and electron capture/transfer dissociation. R groups represent amino acid functional groups.
CID is a rapid and efficient process, and as such the majority of proteomics experiments make use of this approach. The instrumentation used to generate CID fragments affects both the time for fragmentation and the resulting mass spectrum. Linear ion-trap CID takes between 1 and 100 ms (slow CID), whereas beam-type quadrupole CID (sometimes termed higher-energy collisional dissociation (HCD) or fast CID) takes between 0.5 and 1 ms [74, 75]. Linear ion traps are unable to trap fragment ions with m/z values below approximately 28% of the precursor ion m/z. This “one-third” low mass cutoff can cause problems when trying to identify low mass product ions from glycosylation or mass tags for quantitation. Beam-type CID is unaffected by the one-third rule and can be used to detect low mass product ions [76, 78] and mass tags. One potential drawback of fast CID is that fragmentation often results in additional ions, including a- and x-type ions (fragmentation of the C-CO bond) and immonium ions, which can interfere with sequence assignment.
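The effect of the low mass cutoff is simple arithmetic, sketched below in Python. The 28% fraction is taken from the text; the function names and the example m/z values (a low-mass product ion at m/z 126.1 and two precursors) are purely illustrative assumptions.

```python
def low_mass_cutoff(precursor_mz, fraction=0.28):
    """Approximate lowest m/z retained during linear ion-trap CID."""
    return precursor_mz * fraction


def is_trapped(fragment_mz, precursor_mz, fraction=0.28):
    """True if a fragment lies at or above the cutoff for this precursor."""
    return fragment_mz >= low_mass_cutoff(precursor_mz, fraction)


if __name__ == "__main__":
    fragment = 126.1  # hypothetical low-mass product ion
    for precursor in (400.0, 600.0):
        cutoff = low_mass_cutoff(precursor)
        print(f"precursor {precursor}: cutoff {cutoff:.1f}, "
              f"fragment trapped: {is_trapped(fragment, precursor)}")
```

In this sketch the same fragment is retained for the lighter precursor but lost for the heavier one, which is why low mass product ions are detected more reliably with beam-type CID.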
There are two main electron-mediated fragmentation methods used in proteomics: electron capture dissociation (ECD) and electron transfer dissociation (ETD). We will only discuss ETD, as ECD requires a highly specialized FT-ICR mass spectrometer. In ETD, analyte (peptide) ions are allowed to react with a radical anion. The radical anion transfers an electron to the multiply charged peptide ion, resulting in a charge-reduced radical cation. Transfer of the electron induces a fragmentation cascade in the peptide, predominantly cleaving the N-Cα bond on the peptide backbone and resulting in c- and z-type ions (Figure 7.4). One of the advantages of ETD is the retention of PTMs on the fragment ions. ETD fragmentation is less efficient than CID, and it therefore takes longer to produce mass spectra of sufficient quality for peptide identification. It is possible to improve the fragmentation efficiency in ETD by collisionally activating the ions postfragmentation (supplemental activation). This process breaks any noncovalent bonds holding pairs of fragments together (but no further covalent bonds). The activation time for linear ion-trap CID is usually 10-20 ms, whereas ETD in the same ion trap will often be 80-150 ms.
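For comparison with the b/y series from CID, the sketch below computes singly charged c- and z-type ion m/z values in Python. It assumes the commonly used mass offsets (a c ion is the corresponding b ion plus NH3, and the radical z ion is the corresponding y ion minus NH2); the function name and the example peptide are illustrative, and special cases such as cleavage N-terminal to proline are not handled.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
NH3 = 17.026549   # added to a b ion to give the c ion
NH2 = 16.018724   # removed from a y ion to give the radical z ion


def etd_ions(seq):
    """Return (c, z): singly charged c_i and radical z_i m/z values."""
    n = len(seq)
    c = [sum(MONO[a] for a in seq[:i]) + PROTON + NH3 for i in range(1, n)]
    z = [sum(MONO[a] for a in seq[-i:]) + WATER + PROTON - NH2
         for i in range(1, n)]
    return c, z


if __name__ == "__main__":
    c, z = etd_ions("AGR")
    for i, (ci, zi) in enumerate(zip(c, z), start=1):
        print(f"c{i} = {ci:.4f}   z{i} = {zi:.4f}")
```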
A typical LC-MS/MS analysis of a proteome will result in several thousand MS/MS spectra. It would be impossible to manually assign each of these to a peptide (and subsequently to a protein), and therefore there are multiple search algorithms available that perform this task. There are two main types of spectral analysis: de novo sequencing and protein database searching. De novo sequencing does not require any prior knowledge of the sample and identifies peptide sequences by matching the differences in mass between ions in the fragmentation spectrum to combinations of amino acids. Combined with the accurate mass of the precursor ion, these mass differences are used to identify the most likely amino acid sequence. Protein database search algorithms such as Mascot or SEQUEST work via a multistep process. First, the program produces an in silico digest of the proteins within the database using the same enzyme as the experiment. The masses of these in silico peptides are then matched against the precursor masses from the proteomics analysis. The top n closest mass matches between the in silico and the precursor masses are retained. The program then produces a synthetic MS/MS spectrum for each retained peptide, which it matches against the experimental MS/MS spectrum. This match and the mass accuracy of the intact peptide are given a probability score in order to assess whether it is a likely true match or a random match. In both database searching and de novo sequencing, it is possible to further inform the search with various parameters. For example, the potential number of missed cleavages in the digestion is often used in searches of samples that are post-translationally modified, since PTMs can hinder full proteolysis of a protein. The mass accuracy of the mass spectrometer is also an important parameter: the closer the experimental and theoretical masses, the higher the score. Another parameter in all search algorithms is the possibility of PTMs on the peptide.
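The first two steps of a database search (in silico digestion and precursor mass matching) can be sketched in Python as follows. This is a simplified illustration, not how Mascot or SEQUEST is implemented. It assumes trypsin as the enzyme (cleavage after K or R, but not before P), a parts-per-million mass tolerance, and a toy one-protein database; all function names, parameter names, and sequences are hypothetical. The zero-width `re.split` pattern requires Python 3.7 or later.

```python
import re

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(seq):
    return sum(MONO[aa] for aa in seq) + WATER


def tryptic_digest(protein, missed_cleavages=0):
    """Split after K/R unless followed by P, allowing missed cleavages."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def precursor_candidates(observed_mass, protein_db, tol_ppm=10.0,
                         missed_cleavages=1, top_n=5):
    """Return up to top_n (protein, peptide, ppm error), closest mass first."""
    hits = []
    for name, sequence in protein_db.items():
        for pep in tryptic_digest(sequence, missed_cleavages):
            theoretical = peptide_mass(pep)
            err = (observed_mass - theoretical) / theoretical * 1e6
            if abs(err) <= tol_ppm:
                hits.append((name, pep, err))
    return sorted(hits, key=lambda h: abs(h[2]))[:top_n]


if __name__ == "__main__":
    db = {"toy_protein": "MKRPAGKSTR"}  # hypothetical sequence
    print(tryptic_digest(db["toy_protein"], missed_cleavages=1))
    print(precursor_candidates(peptide_mass("STR"), db))
```

In a full search, each retained candidate would then be scored by comparing its theoretical fragment ions with the experimental MS/MS spectrum; that scoring step is omitted here.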
The addition of a PTM results in a mass increase or decrease in both the precursor ion and any fragment ions that contain the modified residue. This possibility is taken into account in both de novo and database searches. The software creates a modified candidate peptide for each occurrence of the specified amino acid in a sequence. The mass increase or decrease observed on the fragment ions is then used to localize the site of modification.
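Site localization from fragment mass shifts can be illustrated with a minimal Python sketch. It assumes citrullination of arginine with the standard monoisotopic shift of +0.984016 Da, considers only singly charged b ions, and scores each candidate site by counting matched ions; the function names, tolerance, and example peptide are hypothetical, and real search engines use considerably more elaborate scoring.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
CITRULLINATION = 0.984016  # arginine -> citrulline mass shift (Da)


def b_ions(seq, mod_site=None, delta=0.0):
    """Singly charged b ions; ions containing mod_site carry the shift."""
    ions, running = [], PROTON
    for i, aa in enumerate(seq[:-1]):
        running += MONO[aa]
        if mod_site is not None and i == mod_site:
            running += delta  # stays in the running sum for later ions
        ions.append(running)
    return ions


def localize(seq, observed_mz, residue="R", delta=CITRULLINATION, tol=0.01):
    """Score each candidate site by the number of matched b ions."""
    scores = {}
    for site, aa in enumerate(seq):
        if aa != residue:
            continue
        theoretical = b_ions(seq, site, delta)
        scores[site] = sum(
            any(abs(obs - theo) <= tol for obs in observed_mz)
            for theo in theoretical
        )
    return scores


if __name__ == "__main__":
    peptide = "ARGRK"                                # hypothetical peptide
    observed = b_ions(peptide, 3, CITRULLINATION)    # simulated spectrum
    scores = localize(peptide, observed)
    print(scores, "best site (0-based):", max(scores, key=scores.get))
```

Only the fragments between the two candidate arginines distinguish the sites, since fragments that contain both residues (or neither) have the same m/z regardless of which one is modified.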