Matrix Science
 
 

Sequence Database Setup: Contaminants

Overview

If you search a single-organism database, it's usually a good idea to include sequences for common contaminants, such as keratins, BSA, and trypsin.

Two groups make their collections available for download. The Max Planck Institute of Biochemistry, Martinsried, maintains a file of some 248 proteins selected from various sources. The Global Proteome Machine Organization's common Repository of Adventitious Proteins (cRAP) contains some 112 proteins selected from Swiss-Prot. (Numbers as of October 2011.)

In Mascot 2.3, you simply select the contaminants database in the search form, along with the target database. For Mascot 2.2 and earlier, you need to append the contaminant sequences to the end of the target database FASTA file. This can be complicated by the requirement for a uniform syntax in all the title lines: one database may have Swiss-Prot-style accessions and the other NCBI-style accessions. If so, you either have to find a parse rule that works with both, or modify the title lines of one database using a script or text editor. If the target and contaminants databases draw their accessions from the same pool, watch for duplicates. It may be safer to leave the CON_ prefix in place for the MPI collection, or to add a prefix to the GPM collection.
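For Mascot 2.2 and earlier, the appending step can be scripted. The sketch below is a minimal illustration, not a Matrix Science tool. It assumes the accession is the text between `>` and the first space, and the `merge` function, its `CON_` default prefix, and its choice to skip duplicate accessions are all illustrative assumptions.

```python
from typing import Iterator, List, Optional, Set, Tuple


def read_fasta(lines: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (title_line, sequence_lines) pairs from the lines of a FASTA file."""
    title: Optional[str] = None
    seq: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if title is not None:
                yield title, seq
            title, seq = line, []
        elif title is not None and line:
            seq.append(line)
    if title is not None:
        yield title, seq


def accession(title: str) -> str:
    """Assumed accession: everything after '>' up to the first space."""
    return title[1:].split(" ", 1)[0]


def merge(target: List[str], contaminants: List[str], prefix: str = "CON_") -> List[str]:
    """Return target entries followed by prefixed contaminant entries.

    Contaminant titles that already start with `prefix` are left alone;
    any entry whose (prefixed) accession was already seen is skipped.
    """
    out: List[str] = []
    seen: Set[str] = set()
    for title, seq in read_fasta(target):
        seen.add(accession(title))
        out.append(title)
        out.extend(seq)
    for title, seq in read_fasta(contaminants):
        if not title[1:].startswith(prefix):
            title = ">" + prefix + title[1:]
        acc = accession(title)
        if acc in seen:
            continue  # duplicate accession: leave it out rather than repeat it
        seen.add(acc)
        out.append(title)
        out.extend(seq)
    return out
```

To use it on real files, read both files with `readlines()`, pass the two lists to `merge`, and write the returned lines to a new FASTA file, one per line. Titles that already carry the prefix are left untouched, so the MPI `CON_` entries keep their original form. Passing `prefix=""` disables prefixing and only drops accessions that already occur in the target.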

Download

http://maxquant.org/contaminants.zip for contaminants from MPI
ftp://ftp.thegpm.org/fasta/cRAP/crap.fasta for cRAP from GPM

Taxonomy

Taxonomy is not appropriate. You want to include all contaminants in every search.

Parse Rules

FASTA title lines in the MPI collection vary according to the source database. Use standard rule 4 for the accession and standard rule 5 for the description.

FASTA title lines in the GPM collection contain only a Swiss-Prot accession. Use standard rule 4 for both the accession and the description.
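The effect of the two rules can be checked outside Mascot with ordinary regular expressions. This is a sketch under an assumption: it takes rule 4 to capture everything after `>` up to the first space and rule 5 to capture everything after the first space, which is how the two rules are used here. Check the actual definitions in the Parse section of your own mascot.dat, and note that Mascot's regex syntax is not Python's. The title lines in the demo are hypothetical.

```python
import re

# Assumed Python translations of the two standard rules as used in the text:
# rule 4 = everything after '>' up to the first space,
# rule 5 = everything after the first space.
RULE_4 = re.compile(r">([^ ]*)")
RULE_5 = re.compile(r">[^ ]* (.*)")


def apply_rule(rule: "re.Pattern[str]", title: str) -> str:
    """Return what the rule captures from a title line, or '' if it fails to match."""
    match = rule.match(title)
    return match.group(1) if match else ""


if __name__ == "__main__":
    # Hypothetical title lines, one per style described above.
    with_description = ">P02769 Serum albumin precursor"
    accession_only = ">P02769"
    print(apply_rule(RULE_4, with_description), "|", apply_rule(RULE_5, with_description))
    # prints: P02769 | Serum albumin precursor
    print(repr(apply_rule(RULE_5, accession_only)))
    # prints: ''
```

The second call shows why, in this sketch, rule 5 is a poor fit for accession-only titles: with no space in the line it captures nothing, so rule 4 is the sensible choice for the description as well.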

Configuration

The MPI collection was downloaded to C:\inetpub\mascot\sequence\contaminants\current, decompressed using gzip, and the extracted file renamed to contaminants_20100513.fasta.

[Figure: Mascot database maintenance utility]

The GPM collection was downloaded to C:\inetpub\mascot\sequence\cRAP\current, and renamed to cRAP_20100324.fasta.

[Figure: Mascot database maintenance utility]
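The download-and-rename steps can also be scripted. This is a hypothetical helper with several assumptions: it uses Python's `zipfile` module in place of gzip, it expects the archive to hold exactly one `.fasta` member, and the `install_fasta` name and its `<stem>_<YYYYMMDD>.fasta` naming scheme are modelled on the file names above rather than taken from Mascot.

```python
import shutil
import zipfile
from datetime import date
from pathlib import Path
from typing import Optional


def install_fasta(download: Path, dest_dir: Path, stem: str,
                  stamp: Optional[date] = None) -> Path:
    """Copy or unzip a downloaded FASTA file into dest_dir as <stem>_<YYYYMMDD>.fasta."""
    stamp = stamp or date.today()  # caller may pass the release date instead
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{stem}_{stamp:%Y%m%d}.fasta"
    if zipfile.is_zipfile(download):
        with zipfile.ZipFile(download) as zf:
            # Assumption: the archive contains a single .fasta member.
            members = [n for n in zf.namelist() if n.lower().endswith(".fasta")]
            if len(members) != 1:
                raise ValueError(f"expected one .fasta member, found {members}")
            with zf.open(members[0]) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
    else:
        # A plain FASTA download (such as crap.fasta) is simply copied.
        shutil.copyfile(download, target)
    return target
```

For example, `install_fasta(Path("contaminants.zip"), Path(r"C:\inetpub\mascot\sequence\contaminants\current"), "contaminants", date(2010, 5, 13))` would write `contaminants_20100513.fasta` into that folder. Copying rather than moving leaves the original download in place in case it needs to be fetched again or compared later.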

Always test a new definition before applying the changes to mascot.dat.
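Before testing the definition, a quick local check can catch title lines that would yield an empty accession, and accessions that occur more than once. This is a hypothetical pre-check, not part of Mascot, and it complements the recommended test rather than replacing it. It again assumes the accession is everything after `>` up to the first space.

```python
from collections import Counter
from typing import Dict, List


def check_titles(lines: List[str]) -> Dict[str, List[str]]:
    """Find title lines with an empty accession and accessions seen more than once.

    Hypothetical pre-check; the accession is assumed to be everything after
    '>' up to the first space.
    """
    titles = [ln.rstrip("\r\n") for ln in lines if ln.startswith(">")]
    accessions = [t[1:].split(" ", 1)[0] for t in titles]
    counts = Counter(accessions)
    return {
        "empty": [t for t, a in zip(titles, accessions) if not a],
        "duplicates": sorted(a for a, n in counts.items() if a and n > 1),
    }
```

For instance, running `check_titles` over the lines of `contaminants_20100513.fasta` should return two empty lists before the new definition is tested.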

 
 
Copyright © 2011 Matrix Science Ltd. All Rights Reserved.