Mascot: The trusted reference standard for protein identification by mass spectrometry for 25 years

A history of Mascot and Mowse

One of the first programs for identifying proteins by peptide mass fingerprinting, MOWSE, grew out of a collaboration between Darryl Pappin (Imperial Cancer Research Fund, UK) and Alan Bleasby (SERC Daresbury Laboratory, UK) [Pappin, 1993]. The name is an acronym of Molecular Weight Search. The MOWSE databases were fully indexed to allow very rapid searching and retrieval of sequence data. This approach was inspired by the speed of Alan Bleasby’s DELPHOS package, a structured query language developed alongside the OWL non-redundant protein sequence database [Bleasby, 1990]. A second important feature of MOWSE was the scoring algorithm. Although several groups independently developed peptide mass fingerprint packages at around the same time, MOWSE was the first to take account of the non-uniform distribution of peptide sizes that results from digestion by an enzyme. Searches were originally submitted to MOWSE through an email server.

The next major development, MOWSE II, was the addition of amino acid sequence and composition qualifiers [Pappin, DJC, Rahman, D, Hansen, HF, Bartlet-Jones, M, Jeffery, W and Bleasby, AJ, Chemistry, mass spectrometry and peptide-mass databases: Evolution of methods for the rapid identification and mapping of cellular proteins, Mass Spectrom. Biol. Sci., 135-150 (1996)]. This was made available in 1994 as a CGI program on the UK Human Genome Mapping Project web server, but has since disappeared. MOWSE II was fast and offered some unique functionality. However, it still required indexed molecular weight databases to be constructed prior to searching. The drawback of this approach was that a database had to be built for each new enzyme and for each set of amino acid residue masses. This made it difficult to support searching proteins in which residues had been chemically or post-translationally modified, since a new database was required for each combination of modifications.

In 1997, the decision was made to restructure MOWSE so as to compute mass values directly from FASTA sequence databases, "on the fly". This removed the limitations on modified residues, but necessitated a complete rewrite of the computer code. David Perkins, working in Darryl’s group at ICRF, started this work in late 1997. Like Darryl and Alan, David came from Professor John Findlay’s group at Leeds University, where he worked on the continuing development of OWL and new methods for protein sequence analysis and structure visualisation.
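The idea of computing peptide masses directly from FASTA sequences, rather than from a prebuilt index, can be illustrated with a minimal Python sketch. This is not the MOWSE or MASCOT code. The function names, the trypsin-like cleavage rule (cut after K or R unless the next residue is P), and the `mods` argument are all assumptions made for illustration. The residue masses are standard monoisotopic values.

```python
# Illustrative sketch only: not the MOWSE/MASCOT implementation.
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def digest(sequence, cleave_after="KR", blocked_by="P"):
    """Split a sequence after each cleavage residue (hypothetical rule:
    trypsin-like, no cut when the following residue is in blocked_by)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence[:-1]):
        if residue in cleave_after and sequence[i + 1] not in blocked_by:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide, mods=None):
    """Monoisotopic mass of a peptide; mods maps residue -> mass delta."""
    mods = mods or {}
    return sum(MONO[r] + mods.get(r, 0.0) for r in peptide) + WATER


def peptide_masses(fasta_lines, mods=None):
    """Compute peptide masses on the fly for every FASTA entry."""
    for header, sequence in read_fasta(fasta_lines):
        for peptide in digest(sequence):
            yield header, peptide, peptide_mass(peptide, mods)
```

Because the masses are computed at search time, a different enzyme rule or a modified residue mass (for example `mods={"C": 57.02146}` for carbamidomethylated cysteine) is only a change of arguments, where the indexed approach needed a new database for each combination.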

David Perkins (left) and Darryl Pappin (right) outside ICRF in May 1999

From the outset, the new algorithms were coded for parallel execution on multiprocessor platforms. One additional feature of the new code was the facility to specify selected MS/MS fragment ion masses as an "ions" qualifier to a peptide mass value. This worked very well, and it became clear that only a small additional step was needed to support searching raw MS/MS peak lists, something that had previously been possible only with the Sequest program from John Yates and Jimmy Eng [Eng, 1994]. At this stage, the rewritten program, MOWSE III, was available only within ICRF. It supported all the proven methods of protein identification: peptide mass fingerprint, MS/MS fragment ion search, and searches that combined mass data with amino acid sequence or composition. It performed all the necessary calculations on the fly, so it could search any FASTA format database. It also used a powerful scoring algorithm based on true enzyme kinetics.

In mid-1998 it was decided that a collaboration with an external bioinformatics company was the fastest route to distributing MOWSE to a wider audience. Matrix Science secured a licence from ICRF to develop and distribute MOWSE, although a name change was suggested to avoid confusion with the earlier public domain versions. The name chosen was MASCOT, and considerable work went into porting MASCOT to a variety of platforms (SGI, SUN, DEC, and Windows NT), structuring for fully automated, high throughput protein identification, and documentation. Free and unrestricted access to MASCOT has been available on the Matrix Science web site since early 1999. The company only seeks a licence fee from users who want to run MASCOT on their own server. Matrix Science works closely with Darryl and David to continue adding new functionality to MASCOT.