Posted by John Cottrell (December 12, 2017)

A quarter-century in 2018

That a protein could be identified from the masses of the peptides obtained on its digestion with a specific protease was recognised semi-independently by several research groups. An example of morphic resonance? Or, an idea whose time had come? Most likely, it was just one of many ideas that floated around within the small community studying proteins and peptides by mass spectrometry in the late 1980s and early 1990s. The upshot was five publications in the first half of 1993. In order of submission date:

  • Mann, M., P. Hojrup, et al. (1993). "Use of mass spectrometric molecular weight information to identify proteins in sequence databases." Biol Mass Spectrom 22(6): 338-345 (January 29th)
  • Henzel, W. J., T. M. Billeci, et al. (1993). "Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases." Proc Natl Acad Sci U S A 90(11): 5011-5015 (February 23rd)
  • Yates, J. R., 3rd, S. Speicher, et al. (1993). "Peptide mass maps: a highly informative approach to protein identification." Anal Biochem 214(2): 397-408 (March 26th)
  • Pappin, D. J. C., P. Hojrup, et al. (1993). "Rapid identification of proteins by peptide-mass fingerprinting." Curr. Biol. 3(6): 327-332 (April 20th)
  • James, P., M. Quadroni, et al. (1993). "Protein identification by mass profile fingerprinting." Biochem Biophys Res Commun 195(1): 58-64 (June 21st)

Of course, all these groups must have been conducting experiments and writing code well in advance of submitting a manuscript, so these dates do not tell us who was first with the idea. The earliest unambiguous description of the method seems to be a poster presentation from the Genentech group at the 3rd Symposium of the Protein Society in July 1989 (M179: A novel approach for identifying proteins: molecular ion searching of protein databases). This was cited in three of the papers, while a fourth mentioned ‘W. Henzel, personal communication’. Some of the Genentech researchers later wrote a nice history of protein identification which cited a 1977 paper from Laemmli and colleagues, describing gel-based fingerprinting of partially digested proteins. Some similarity, perhaps, but missing the breakthrough innovation of PMF, which was that the experimental data were being compared with values calculated from a sequence database. If peptide mass fingerprinting had relied on matching an experimental mass spectrum to a library spectrum, it would never have got off the ground.
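For readers who have never seen the idea written down, the comparison of experimental masses with values calculated from a sequence database can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the algorithm of any of the 1993 papers or of Mowse: it assumes trypsin (cleavage after K or R, but not before P), no missed cleavages, no modifications, neutral monoisotopic masses, and a naive score that simply counts matched masses. The function names, the toy sequences, and the 0.2 Da tolerance are all hypothetical.

```python
import re

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(sequence):
    """In silico digest: cleave after K or R unless the next residue is P.

    Assumes complete digestion (no missed cleavages).
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tol=0.2):
    """Number of observed masses within tol (Da) of any calculated mass."""
    calculated = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(m - c) <= tol for c in calculated) for m in observed)


def rank_proteins(observed, database, tol=0.2):
    """Rank database entries by matched-mass count, best first."""
    scores = {name: count_matches(observed, seq, tol)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy database of made-up sequences, purely for illustration.
toy_database = {
    "protA": "MKTAYIAKQR",
    "protB": "GASPVKLLNDR",
}

# Pretend these masses were measured from a digest of protA.
observed_masses = [peptide_mass(p)
                   for p in tryptic_peptides(toy_database["protA"])]

if __name__ == "__main__":
    for name, score in rank_proteins(observed_masses, toy_database):
        print(name, score)
```

Counting matches is the crudest possible score. A real search engine has to allow for missed cleavages, modifications, and the fact that a large protein will match more masses by chance than a small one.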

As is often the case, the importance of PMF was not recognised immediately. The Genentech 1989 poster was not one of those selected for the published proceedings and there doesn’t appear to have been a patent application. In fact, there is little evidence of activity prior to the 1993 papers other than another conference presentation, from the Yates lab at the 5th Symposium of the Protein Society in 1991 (T2: Protein database searching with mass spectral information), which was also not selected for the published proceedings.

Peer-reviewed publications describing the use of MS/MS for peptide identification followed soon after those for PMF. June 1994 saw the landmark paper from Eng, McCormack and Yates that described Sequest. This had been preceded by a patent application in March 1994. In September 1994 came a paper from Mann and Wilm describing the sequence tag approach, a half-way house between de novo interpretation and database search. Since this appeared after the Sequest paper, the authors emphasised the error-tolerant aspect of tag-based searching, which allows modified or variant peptides to be matched.

The Mann and Wilm paper claims that the concept of searching sequence databases by MS/MS data was first described by Mann in a presentation at the 7th Symposium of the Protein Society in 1993. Again (spot the pattern), this presentation was not selected for the published proceedings. However, a paper from Roepstorff and Richter, submitted in 1991 when Mann was in the Roepstorff lab, teeters on the brink of a description: "In a similar way [to peptide mass fingerprinting] it may be possible to relate partial sequence information derived for example from MS-MS experiments to database information …"

If we count from the 1993 publications, next year will be the 25th anniversary of database search. A good opportunity for us greybeards to reminisce about the days before Marc Wilkins coined the term proteome; when SwissProt contained only 30k entries and no genomes had been sequenced; when MALDI and electrospray were becoming mainstream but many were still using Fast Atom Bombardment on magnetic sector instruments or Plasma Desorption with Time-of-Flight.

Matrix Science is (slightly) more youthful. 2018 will be the 20th anniversary of the company and this web site, which went live in 1998. Mascot has its origins in Mowse, the search engine developed in the Pappin lab, as described here.

