Mascot search overview
Mascot is a powerful search engine which uses mass spectrometry data to identify proteins from primary sequence databases.
While a number of similar programs are available, Mascot is unique in that it integrates all of the proven methods of searching. These different search methods can be categorised as follows:
- Peptide Mass Fingerprint, in which the only experimental data are peptide mass values.
- Sequence Query, in which peptide mass data are combined with amino acid sequence and composition information. This is a superset of a sequence tag query.
- MS/MS Ion Search, using uninterpreted MS/MS data from one or more peptides.
The general approach for all types of search is to take a small sample of the protein of interest and digest it with a proteolytic enzyme, such as trypsin. The resulting digest mixture is analysed by mass spectrometry.
Different types of mass spectrometer have different capabilities. A simple instrument will measure a set of molecular weights for the intact mixture of peptides. An instrument with MS/MS capability can additionally provide structural information by recording the fragment ion spectrum of a peptide. Usually, the digest mixture will be separated by chromatography prior to MS/MS analysis, so that MS/MS spectra from individual peptides can be measured.
The experimental mass values are then compared with calculated peptide mass or fragment ion mass values, obtained by applying cleavage rules to the entries in a comprehensive primary sequence database. By using an appropriate scoring algorithm, the closest match or matches can be identified. If the "unknown" protein is present in the sequence database, then the aim is to pull out that precise entry. If the sequence database does not contain the unknown protein, then the aim is to pull out those entries which exhibit the closest homology, often equivalent proteins from related species.
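The comparison described above can be illustrated with a minimal Python sketch of a peptide mass fingerprint match. It is only an illustration under stated assumptions, not Mascot's implementation: the function names, the 0.2 Da tolerance, and the toy database are hypothetical, the cleavage rule is the commonly used one for trypsin (cleave after K or R unless the next residue is P), missed cleavages and modifications are ignored, and a simple count of matching masses stands in for a real scoring algorithm.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the sum of residue masses


def tryptic_digest(sequence):
    """Apply a simple trypsin rule: cleave after K or R, but not before P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tol=0.2):
    """Count observed masses lying within tol (Da) of any calculated mass.

    Assumption: a plain match count is used here in place of a proper
    scoring algorithm.
    """
    calculated = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - c) <= tol for c in calculated) for m in observed)


def rank_entries(observed, database, tol=0.2):
    """Return (entry name, match count) pairs, best match first."""
    scores = {name: count_matches(observed, seq, tol)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-entry database and "experimental" masses.
    database = {"protA": "AKGRPLKM", "protB": "GGGGK"}
    observed = [peptide_mass(p) for p in tryptic_digest(database["protA"])]
    print(rank_entries(observed, database))
```

A real search would also have to handle missed cleavages, modifications, and the chance of random matches, which is why a proper scoring algorithm is needed rather than a raw count.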
The sequence databases that can be searched on the Matrix Science free, public Mascot server are:
- SwissProt is a high quality, curated protein database. Sequences are non-redundant, rather than non-identical, so you may get fewer matches for an MS/MS search than you would from a comprehensive database, such as NCBInr. SwissProt is ideal for peptide mass fingerprint searches and MS/MS searches of well characterised organisms where it isn’t essential to match every single spectrum.
- NCBInr is a comprehensive, non-identical protein database maintained by NCBI for use with their search tools BLAST and Entrez. The entries have been compiled from GenBank CDS translations, PIR, SWISS-PROT, PRF, and PDB.
- EMBL EST divisions contain "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. During a Mascot search, the nucleic acid sequences are translated in all six reading frames. There are 10 divisions: Environmental_EST, Fungi_EST, Human_EST, Invertebrates_EST, Mammals_EST, Mus_EST, Plants_EST, Prokaryotes_EST, Rodents_EST, and Vertebrates_EST.
- contaminants is a database of common contaminants compiled by the Max Planck Institute of Biochemistry, Martinsried.
- cRAP is a database of common contaminants compiled by the Global Proteome Machine Organization.
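The six-frame translation mentioned for the EST divisions can be sketched in Python as follows. This is an illustrative sketch rather than Mascot's own code: the function names and frame labels are hypothetical, the standard genetic code is assumed, and any codon containing an ambiguous base is reported as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Translate three offsets on the forward strand and three on the reverse.

    Assumption: frames are labelled "+1".."+3" and "-1".."-3" for illustration.
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six translated sequences can then be treated like a protein entry, with "*" marking stop codons.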