Sequence database setup: Contaminants

The configuration information on this page is maintained as a service to users of Mascot 2.3 and earlier. In Mascot 2.4, both contaminants collections are predefined databases, meaning up-to-date configuration information can be downloaded automatically by Mascot Database Manager.


If you search a single-organism database, it's usually a good idea to include sequences for common contaminants, such as keratins, BSA, and trypsin.

Two groups make their collections available for download. The Max Planck Institute of Biochemistry (MPI), Martinsried, maintains a file of some 248 proteins selected from various sources. The Global Proteome Machine Organization (GPM) common Repository of Adventitious Proteins (cRAP) contains some 112 proteins selected from Swiss-Prot. (Numbers as of October 2011.)

In Mascot 2.3 and later, you can simply select the contaminants database in the search form, along with the target database. For Mascot 2.2 and earlier, you need to append the contaminant sequences to the end of the target database fasta file. This can be complicated by the requirement to have a uniform syntax for all the title lines. One database may have Swiss-Prot style accessions and the other NCBI-style accessions. If so, you either have to find a parse rule that works with both or modify the title lines of one database using a script or text editor. If both target and contaminants databases have accessions drawn from the same pool, remember to watch for duplicates. It may be safer to add a prefix to the accessions of the contaminants entries so as to avoid possible collisions.
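For Mascot 2.2 and earlier, the append step might be scripted along the following lines. This is a minimal sketch, not a supported tool. It assumes the accession is the first whitespace-delimited token after the ">" on each title line, which may not hold for every source database. The prefix CON_ and the function names are hypothetical. The script adds the prefix to every contaminant accession and stops if a collision remains.

```python
def accession(title_line):
    # Assumption: the accession is the first whitespace-delimited
    # token after '>'. Adjust this if your title lines differ.
    tokens = title_line[1:].split(None, 1)
    return tokens[0] if tokens else ""


def merge_fasta(target_lines, contaminant_lines, prefix="CON_"):
    """Return the target entries followed by the contaminant entries,
    with the (hypothetical) prefix added to each contaminant accession."""
    seen = set()
    merged = []

    # Copy the target database unchanged, recording its accessions.
    for raw in target_lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            seen.add(accession(line))
        merged.append(line)

    # Append the contaminants, prefixing each accession.
    for raw in contaminant_lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            new_acc = prefix + accession(line)
            if new_acc in seen:
                raise ValueError("duplicate accession: " + new_acc)
            seen.add(new_acc)
            line = ">" + prefix + line[1:].lstrip()
        merged.append(line)

    return merged
```

The function works on any iterable of lines, so open file objects for the target and contaminants files can be passed in directly, and the returned lines written to the combined fasta file. Prefixing only changes the accessions; the title lines of both databases must still match a single parse rule.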

Taxonomy is not appropriate for a contaminants database, because you want to include all contaminants in every search.

Parse Rules

Fasta title lines in the MPI collection vary according to the source database. Use standard rule 4 for the accession and standard rule 5 for the description.

Fasta title lines in the GPM collection contain a Swiss-Prot ID and no description. Use standard rule 4 for both accession and description.

Configuration (Mascot 2.3 and earlier)

The MPI collection was downloaded to C:\inetpub\mascot\sequence\contaminants\current, decompressed using gzip, and renamed to contaminants_20100513.fasta.
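If gzip is not available on the server, the decompress-and-rename step could be done with a short script instead. The following is a minimal sketch using only the Python standard library. The function name is hypothetical, and the name of the downloaded archive in the usage comment is an assumption, since the original download name is not given here.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gzip(src_gz, dest_fasta):
    """Decompress the gzip file src_gz into dest_fasta and return the
    destination path. The source file is left in place."""
    src_gz, dest_fasta = Path(src_gz), Path(dest_fasta)
    with gzip.open(src_gz, "rb") as fin, open(dest_fasta, "wb") as fout:
        # Stream the data so large databases are not held in memory.
        shutil.copyfileobj(fin, fout)
    return dest_fasta


# Example call (the archive name contaminants.fasta.gz is an assumption):
# folder = Path(r"C:\inetpub\mascot\sequence\contaminants\current")
# decompress_gzip(folder / "contaminants.fasta.gz",
#                 folder / "contaminants_20100513.fasta")
```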

The GPM collection was downloaded to C:\inetpub\mascot\sequence\cRAP\current and renamed to cRAP_20100324.fasta.

Always test a new definition before applying the changes to mascot.dat.