Sequence Database Setup: Contaminants
Overview
If you search a single-organism database, it's usually a good idea to include sequences for common
contaminants, such as keratins, BSA, and trypsin.
Two groups make their collections available for download. The Max Planck Institute of Biochemistry,
Martinsried, maintains a file of some 248 proteins selected from various sources. The Global Proteome
Machine Organization's common Repository of Adventitious Proteins (cRAP) contains some 112 proteins
selected from Swiss-Prot. (Numbers as of October 2011.)
In Mascot 2.3, you simply select the contaminants database in the search form, along with the target
database. For Mascot 2.2
and earlier, you need to append the contaminant sequences to the end of the target database fasta file.
This can be complicated by the requirement to have a uniform syntax for all the title lines. One database may
have Swiss-Prot style accessions and the other NCBI-style accessions. If so, you either have to find a
parse rule that works with both or modify the title lines of one database using a script or text editor.
If both target and contaminants databases have accessions drawn from the same pool, remember to watch for
duplicates. It may be safer to leave the CON_ prefix in place for the MPI collection, or add a prefix for the
GPM collection.
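For the Mascot 2.2 approach of appending the contaminants to the target file, the following is a minimal sketch rather than a supported tool. It assumes the accession is the title-line text up to the first whitespace, which may not hold for every database, and the function names and the default CON_ prefix argument are illustrative. It adds the prefix to any contaminant title that lacks it and reports accessions that also occur in the target database.

```python
def accession(title_line):
    """Return the text between '>' and the first whitespace (an assumption)."""
    fields = title_line[1:].split(None, 1)
    return fields[0] if fields else ""


def append_contaminants(target_lines, contaminant_lines, prefix="CON_"):
    """Append contaminant entries to the target entries.

    Returns (merged_lines, duplicates), where duplicates lists contaminant
    accessions that were already present in the target before prefixing.
    """
    merged = []
    target_accessions = set()

    # Copy the target database unchanged, remembering its accessions.
    for line in target_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            target_accessions.add(accession(line))
        merged.append(line)

    duplicates = []
    for line in contaminant_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            acc = accession(line)
            if acc in target_accessions:
                duplicates.append(acc)
            # Leave an existing prefix in place; otherwise add one.
            if not acc.startswith(prefix):
                line = ">" + prefix + line[1:]
        merged.append(line)

    return merged, duplicates
```

With real files, the two inputs could be open file objects and the merged lines written out joined by newlines. The prefix only avoids clashes if the chosen parse rule still extracts a unique accession from the modified title lines, so test the result before using it.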
Download
http://maxquant.org/contaminants.zip for contaminants from MPI
ftp://ftp.thegpm.org/fasta/cRAP/crap.fasta for cRAP from GPM
Taxonomy
Taxonomy is not appropriate for a contaminants database, because you want to include all contaminants in every search.
Parse Rules
Fasta title lines in the MPI collection vary according to the source database.
Use standard rule 4 for the accession and standard rule 5 for the description.
Fasta title lines in the GPM collection contain only a Swiss-Prot accession. Use
standard rule 4 for both accession and description.
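Because title-line syntax differs between collections, it can help to check a candidate pattern against every title line before settling on a rule. The sketch below is only an illustration: it uses Python regular expressions, not Mascot parse rule syntax, and the function name and example pattern are hypothetical and are not claimed to correspond to standard rule 4 or 5.

```python
import re


def unmatched_titles(lines, pattern):
    """Return the FASTA title lines that the given Python regex fails to match."""
    rule = re.compile(pattern)
    failures = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Only title lines are checked; sequence lines are ignored.
        if line.startswith(">") and not rule.match(line):
            failures.append(line)
    return failures


# Hypothetical pattern: an accession, whitespace, then a description.
ACCESSION_AND_DESCRIPTION = r">(\S+)\s+(\S.*)"
```

A title line that holds nothing but an accession would be reported by this example pattern, which illustrates why a collection with accession-only titles needs the same rule for both accession and description.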
Configuration
The MPI collection was downloaded to C:\inetpub\mascot\sequence\contaminants\current,
decompressed using gzip, and renamed to contaminants_20100513.fasta.
The GPM collection was downloaded to C:\inetpub\mascot\sequence\cRAP\current
and renamed to cRAP_20100324.fasta.
Always test a new definition before applying the changes to mascot.dat.