Data on sites of acetylation and methylation have been accrued and continue to be generated. Key challenges in this area are the provision of database search tools for confident site and PTM assignment from mass spectrometry data and the accurate in silico prediction of PTM status. For biological research, a key requirement for taking this forward is to use this information to understand the functions of these PTM sites, both individually and in combination. The challenges are to mine proteomic data effectively and to build and interrogate multispecies PTM catalogs, including analysis of quantitative changes over time or across biological states where such data are available. These analyses will inform (i) follow-up experiments on selected proteins and (ii) computational prediction models that complement MS-based approaches for full definition of the acetylomes and methylomes.
Assigning Acetylation and Methylation Status
Database search engines process product ion spectra for peptide identification and were originally designed for data acquired via DDA, in which individual tandem mass spectra are interpreted. Peptides are identified by comparing experimental tandem mass spectra with theoretical tandem mass spectra generated by in silico digestion of a protein database, using one of a number of different search algorithms (as reviewed by Noble and MacCoss). Protein inference then assigns the identified peptide sequences to proteins, grouping proteins according to whether their assigned peptides are unique to one protein or shared with others. Example search engines are Mascot, SEQUEST, PEAKS DB, ProteinPilot, pFind, Andromeda in MaxQuant, OMSSA in COMPASS, and X!Tandem. Assigning PTM status and localization poses specific challenges, particularly for peptides with missed cleavages, as is typical of acetylated and methylated peptides because trypsin cleaves poorly after modified lysine. These challenges include a combinatorial increase in the search space, which affects the false discovery rate (FDR) for site assignment, an issue discussed in detail by Fu and Wong using phosphorylation, carbamylation, and acetylation as reference PTMs.
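To make the search-space point concrete, the following is a minimal Python sketch of database searching under stated simplifications: an in silico tryptic digest that permits missed cleavages, enumeration of site-localized acetyl/methyl variants on K and R, singly charged b/y fragment generation, a precursor mass filter, and a naive shared-peak count as the score. All function names, the 10 ppm precursor and 0.02 Da fragment tolerances, and the short histone H3-like test sequence are illustrative assumptions; production engines such as Mascot or SEQUEST use their own, more sophisticated scoring and statistics.

```python
from itertools import combinations, product

# Monoisotopic residue masses (Da) of the standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.007276, 18.010565
ACETYL, METHYL = 42.010565, 14.01565  # mass shifts (Da): acetylation, monomethylation

def tryptic_digest(protein, max_missed=2):
    """Cleave C-terminal to K/R (not before P), allowing up to max_missed missed cleavages."""
    sites = [i + 1 for i, aa in enumerate(protein[:-1])
             if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = set()
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.add(protein[bounds[i]:bounds[j]])
    return peptides

def modified_forms(peptide, max_mods=2):
    """Enumerate site-localized variants: K may be acetylated or methylated, R methylated."""
    options = {"K": (ACETYL, METHYL), "R": (METHYL,)}
    sites = [i for i, aa in enumerate(peptide) if aa in options]
    forms = [{}]  # the unmodified peptide
    for n in range(1, max_mods + 1):
        for chosen in combinations(sites, n):
            for shifts in product(*(options[peptide[p]] for p in chosen)):
                forms.append(dict(zip(chosen, shifts)))
    return forms

def peptide_mass(peptide, mods):
    """Neutral monoisotopic mass of a peptide carrying {position: mass shift} mods."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + sum(mods.values()) + WATER

def fragment_ions(peptide, mods):
    """Singly charged b and y ion m/z values (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0) for i, aa in enumerate(peptide)]
    b_ions = [sum(masses[:k]) + PROTON for k in range(1, len(masses))]
    y_ions = [sum(masses[-k:]) + WATER + PROTON for k in range(1, len(masses))]
    return b_ions + y_ions

def shared_peaks(observed, theoretical, tol=0.02):
    """Count theoretical fragments matched by any observed peak within tol (Da)."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)

def search(precursor_mass, observed, protein, max_mods=2, ppm=10.0):
    """Return (score, peptide, mods) for the best candidate within ppm of the precursor."""
    best = None
    for pep in sorted(tryptic_digest(protein)):
        for mods in modified_forms(pep, max_mods):
            if abs(peptide_mass(pep, mods) - precursor_mass) > precursor_mass * ppm * 1e-6:
                continue  # precursor mass filter
            score = shared_peaks(observed, fragment_ions(pep, mods))
            if best is None or score > best[0]:
                best = (score, pep, mods)
    return best

# Usage: an idealized, noise-free spectrum of KSTGGKAPR acetylated on its second K.
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAAR"  # illustrative histone H3-like tail sequence
true_mods = {5: ACETYL}
spectrum = fragment_ions("KSTGGKAPR", true_mods)
print(search(peptide_mass("KSTGGKAPR", true_mods), spectrum, H3_TAIL))
print(len(modified_forms("KSTGGKAPR", max_mods=1)), len(modified_forms("KSTGGKAPR", max_mods=2)))
```

Even with only three modifiable residues, allowing two modifications turns one peptide into 14 candidate forms, compared with 6 when only one is allowed. The count grows combinatorially with peptide length, the number of modification types, and the allowed missed cleavages, which is what inflates the search space and complicates FDR control for site assignment.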
Error-tolerant searching applies a “two-round search” and is used in the Mascot and X!Tandem software [115, 116] to first identify proteins from unmodified peptides and then identify post-translationally modified peptides. PTMTreeSearch (a plug-in to X!Tandem) employs an analogous two-round peptide identification strategy, in which the first round reduces the search space to likely solutions and the second round performs a more exhaustive, error-tolerant search. A computational tree is created for each peptide, in which the path from the root to the leaves is labeled with the amino acids of the peptide and branches represent PTMs. The error-tolerant approach is limited by the requirement that the post-translationally modified protein be assigned at least one high-scoring peptide in the first pass. A pragmatic approach, using iterative searching for unmodified, singly modified, and multiply modified peptides, has been exemplified for histone tails. In this study, confident identification was achieved by accepting only identifications obtained in common from multiple search engines and applying an FDR of <1%.
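The two-round logic and the multi-engine consensus filter can be sketched as follows. This is a minimal illustration, not the implementation used by Mascot, X!Tandem, PTMTreeSearch, or the histone study: the search engine is abstracted as a hypothetical callable `search_fn`, the `PSM` record and all function names are invented for the example, and the FDR is estimated with a simple target-decoy ratio, a common approach that is assumed here rather than stated in the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PSM:
    """A peptide-spectrum match reported by a (hypothetical) search engine."""
    spectrum_id: str
    peptide: str
    protein: str
    score: float
    is_decoy: bool = False

def two_round_search(spectra, database, search_fn, base_mods, extra_mods, min_score):
    """Round 1: restricted search of the full database.
    Round 2: error-tolerant search (extra mods) of round-1 proteins only."""
    first = search_fn(spectra, database, base_mods)
    confident = {p.protein for p in first if not p.is_decoy and p.score >= min_score}
    reduced_db = {acc: seq for acc, seq in database.items() if acc in confident}
    second = search_fn(spectra, reduced_db, base_mods + extra_mods)
    # Proteins without a high-scoring round-1 peptide are never revisited:
    # this is the limitation of the error-tolerant strategy noted in the text.
    return confident, second

def fdr_filter(psms, max_fdr=0.01):
    """Target-decoy filter: keep target PSMs scoring at or above the lowest score
    at which the estimated FDR (decoys / targets) is still below max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = None
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets < max_fdr:
            cutoff = p.score
    if cutoff is None:
        return set()
    return {(p.spectrum_id, p.peptide) for p in ranked
            if not p.is_decoy and p.score >= cutoff}

def consensus(results_by_engine, max_fdr=0.01):
    """Accept only identifications that pass the FDR filter in every engine."""
    passing = [fdr_filter(psms, max_fdr) for psms in results_by_engine.values()]
    return set.intersection(*passing) if passing else set()
```

In a toy run where one protein is detectable only through an acetylated peptide, `two_round_search` never reports that protein, which is exactly the first-pass limitation; `consensus` trades sensitivity for confidence by discarding any identification that a single engine fails to confirm.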
Analysis of spectra acquired via DIA is inherently limited because MS/MS fragments are not directly assigned to precursor ions. Fragments can instead be grouped and assigned to a specific precursor on the basis of LC retention time alignment. This is challenging for coeluting peptides, a problem that can be ameliorated by enhanced-resolution IMS in high-definition MS analysis (HDMSe) or by a SWATH-like approach that reduces complexity through narrow mass windows. Software developments in this area include both new algorithms and modifications to existing packages (see the review by Szabo and Janaky). Of note, the DIA-Umpire software package enables DIA data to be processed with traditional database search tools. A peptide-centric approach to DIA data, using a library search strategy, provides a conservative test of the presence or absence of specific query peptides.
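The DIA steps above can be sketched in Python under simplifying assumptions. Fixed-width, SWATH-like isolation windows show how narrowing the window reduces the number of co-isolated precursors; fragments are assigned to a precursor by correlating extracted-ion elution profiles across retention time (loosely modeled on the precursor-fragment grouping that DIA-Umpire uses to build pseudo-MS/MS spectra, but not its actual algorithm); and a peptide-centric query asks what fraction of a library peptide's fragment ions appears in a DIA spectrum. All function names, the 0.9 correlation threshold, the 0.02 Da tolerance, and the 60% matched-fragment cutoff are hypothetical choices.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length elution profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def isolation_windows(start, end, width):
    """Contiguous fixed-width precursor isolation windows (SWATH-like)."""
    windows, lo = [], start
    while lo < end:
        windows.append((lo, min(lo + width, end)))
        lo += width
    return windows

def coisolated(precursor_mzs, windows):
    """Precursors in each window, i.e. those whose fragments share one DIA spectrum."""
    return {w: [mz for mz in precursor_mzs if w[0] <= mz < w[1]] for w in windows}

def group_fragments(precursor_xics, fragment_xics, min_r=0.9):
    """Assign each fragment to the co-isolated precursor whose elution profile
    correlates best with it (r >= min_r); poorly correlated fragments stay unassigned."""
    groups = {p: [] for p in precursor_xics}
    for frag, fxic in fragment_xics.items():
        best_p, best_r = None, min_r
        for p, pxic in precursor_xics.items():
            r = pearson(pxic, fxic)
            if r >= best_r:
                best_p, best_r = p, r
        if best_p is not None:
            groups[best_p].append(frag)
    return groups

def library_query(library_mzs, observed_mzs, tol=0.02, min_fraction=0.6):
    """Peptide-centric presence test: fraction of library fragment ions
    found (within tol, Da) in the observed DIA spectrum."""
    hits = sum(any(abs(lib - obs) <= tol for obs in observed_mzs) for lib in library_mzs)
    return hits / len(library_mzs) >= min_fraction

# Usage: two coeluting precursors in one window, resolved by elution-profile shape.
precursors = {"Pa": [0, 1, 5, 9, 5, 1, 0, 0], "Pb": [0, 0, 0, 1, 4, 9, 4, 1]}
fragments = {"f1": [0, 2, 10, 18, 10, 2, 0, 0],   # tracks Pa
             "f2": [0, 0, 0, 2, 8, 18, 8, 2],     # tracks Pb
             "f3": [3, 3, 3, 3, 3, 3, 3, 3]}      # flat background
print(group_fragments(precursors, fragments))
```

Real peptide-centric tools score many more features (co-elution, agreement with library intensities, retention time) and estimate error rates with decoys; the single fraction threshold here only illustrates the presence/absence question.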