Posted by John Cottrell (June 27, 2014)

Spring clean your mascot.dat

Updates to Mascot 2.5 will be going out shortly. When an existing installation is updated, settings in the options section of mascot.dat are only modified by the installer in exceptional cases. The assumption is that any changes have been made with good reason and should be honoured. The downside of this approach is that values may persist that are no longer optimal, or that were mistakes in the first place. The list below covers a few settings that can have a major impact on performance or reliability. Note that some of these settings first appeared in Mascot 2.3. For additional information on each setting, refer to your Mascot Setup & Installation Manual, which is a PDF linked from your local Mascot home page.

Percolator: Setting this to 1 means reports display Percolated scores by default for all eligible searches. Since there is no guarantee that Percolator will improve every search, it is usually better to leave this at the default of 0 and choose to Percolate individual reports only when it is of interest.

ExecAfterSearch_n: ExecAfterSearch_1 executes ms-createpip.exe and ExecAfterSearch_2 executes percolator.exe, to create the input and output files required to display a report with Percolated scores. This can be time consuming, so you may want to comment out these lines to avoid waiting for the files to be created after every eligible search, as there is no guarantee that anyone will ever want to view the report with Percolated scores. You should not comment out these lines if any of the following apply:

Your Mascot Server is associated with Mascot Insight.
You ever submit searches from Daemon and have Percolator set to 1 in mascot.dat.
You ever want to work with Percolated scores in Mascot Distiller.

ForkForUnixApache: This should never be set to 1 on a Windows system.

IgnoreDupeAccessions: This should be followed by a list of Mascot database names for very large, public databases, such as NCBInr and UniRef100. It is very risky to include any home-brew databases in the list because accidents will happen. If the database includes duplicate accessions, or appears to include them because of a parse rule error, and the error is not detected, great confusion can result.

IgnoreIonsScoreBelow: Values between 0 and 1 correspond to expect values and values of 1 or greater correspond to scores. Setting this to any non-zero value as a global default can be confusing for users who don’t realise that matches are being filtered from reports. Safer to allow users to set their own cut-off in the format controls at the top of the report.
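
The rule for reading this value can be written down in a few lines. This is only a sketch of the interpretation described above, not Mascot's own code. The function name is made up for illustration, and treating 0 as "no filtering" is inferred from the remark about non-zero values.

```python
def describe_ions_score_cutoff(value: float) -> str:
    """Say how an IgnoreIonsScoreBelow value would be read, per the post.

    Hypothetical helper for illustration; not part of Mascot.
    """
    if value < 0:
        raise ValueError("cut-off cannot be negative")
    if value == 0:
        # Assumption: zero means the filter is switched off.
        return "no filtering"
    if value < 1:
        # Values between 0 and 1 correspond to expect values.
        return f"expect value cut-off of {value}"
    # Values of 1 or greater correspond to scores.
    return f"ions score cut-off of {value}"


for setting in (0, 0.05, 1, 20):
    print(setting, "->", describe_ions_score_cutoff(setting))
```

A value such as 0.05 and a value such as 20 therefore mean quite different things, which is one more reason a silent global default can surprise people.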

MaxQueries: Defines the maximum number of spectra in an MS/MS search and, the higher the value, the more memory overhead applies to each search. Very few people run searches larger than 1 million spectra, so best not to set this higher ‘just in case’.

MaxSequenceLen: Defines the maximum length of an individual Fasta sequence and, the higher the value, the more memory overhead applies to each search. If you think you need to set this higher than the default of 50000, think again. Much more efficient to split long sequences into protein-sized, overlapping chunks, as explained in this help page.
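
As a rough illustration of the chunking idea, the Python sketch below splits one long sequence into overlapping pieces. The function name, the chunk length of 10000 and the overlap of 100 residues are assumptions chosen for the example, not values taken from the help page. The overlap presumably needs to be at least as long as the longest peptide you expect to match, so that no peptide is lost at a chunk boundary.

```python
def split_sequence(seq, chunk_len=10000, overlap=100):
    """Split seq into overlapping chunks; returns (start_offset, chunk) pairs.

    chunk_len and overlap are illustrative defaults, not recommendations.
    """
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be >= 0 and smaller than chunk_len")
    if not seq:
        return []
    step = chunk_len - overlap  # each new chunk starts this far after the last
    chunks = []
    start = 0
    while True:
        chunks.append((start, seq[start:start + chunk_len]))
        if start + chunk_len >= len(seq):
            break  # this chunk already reaches the end of the sequence
        start += step
    return chunks


# Tiny demonstration with a 25-letter stand-in for a sequence
demo = "ABCDEFGHIJKLMNOPQRSTUVWXY"
for offset, chunk in split_sequence(demo, chunk_len=10, overlap=2):
    print(offset, chunk)
```

Keeping the start offset with each chunk makes it possible to map a match back to its position in the original sequence, for example by writing the offset into the accession or description of each new Fasta entry.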

MinPepLenInPepSummary, MinPepLenInSearch: Very short peptides cannot get high scores and are more likely to be false positives. Having a low value for both these settings makes the search take longer, makes the result file larger, can reduce sensitivity, and may mess up protein family grouping by over-clustering unrelated proteins into one huge family. The current installation defaults for both settings are 7, and you should not reduce MinPepLenInSearch or set the two to different values without fully understanding the consequences.

MudpitSwitch: Controls whether the default protein scoring for a search is standard or mudpit. Standard protein scoring is only suitable for very small searches, and can give false proteins for large searches. For a Peptide Summary or a Select Summary, you can always change this in the format controls, so there is little reason to change the global default of 0.001. (The Protein Family Summary is always mudpit scoring.)

ProteinFamilySwitch, ResultsPerlScript_2: A successful search calls ResultsPerlScript if the number of queries is less than ProteinFamilySwitch, otherwise ResultsPerlScript_2. If you prefer to view results for large searches using the earlier Select Summary, you can achieve this by setting ProteinFamilySwitch to a high value or by pointing ResultsPerlScript_2 to master_results.pl rather than master_results_2.pl. However, note that very large searches will fail to load in Select Summary format, or may take up huge amounts of memory on both the Mascot Server and the client where the browser is running. To avoid problems, don’t change ResultsPerlScript_2 and set ProteinFamilySwitch to a high but reasonable value, e.g. 50000, so that you don’t try to load a Select Summary after a really huge search.

SelectSwitch: For searches that call ResultsPerlScript, if the number of queries in an MS/MS search is less than or equal to this number, the default report is the Peptide Summary. If it is greater than this number, the default report is the Select Summary. The Peptide Summary is even more memory hungry than the Select Summary, so be very careful about setting this to a high value if you have also increased ProteinFamilySwitch.
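
The way ProteinFamilySwitch and SelectSwitch combine can be summarised as a small decision function. This is a sketch of the logic as described in the two paragraphs above, not the actual Mascot implementation. The function name is hypothetical, the threshold values in the demonstration are made up, and the assumption that ResultsPerlScript_2 leads to the Protein Family Summary is inferred from the text.

```python
def default_report(num_queries, protein_family_switch, select_switch):
    """Return (script setting, default report) for an MS/MS search.

    Hypothetical helper that mirrors the rules described in the post.
    """
    if num_queries >= protein_family_switch:
        # Not less than ProteinFamilySwitch: ResultsPerlScript_2 is called.
        return ("ResultsPerlScript_2", "Protein Family Summary")
    if num_queries <= select_switch:
        # Small search handled by ResultsPerlScript.
        return ("ResultsPerlScript", "Peptide Summary")
    return ("ResultsPerlScript", "Select Summary")


# Made-up thresholds, purely to exercise each branch
for n in (500, 1000, 1001, 49999, 50000):
    print(n, default_report(n, protein_family_switch=50000, select_switch=1000))
```

Note the asymmetry: ProteinFamilySwitch is a strict "less than" test, whereas SelectSwitch is "less than or equal to".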

SplitDataFileSize, SplitNumberOfQueries: Large searches are split into chunks for searching to reduce memory requirements, which is essential if the server is 32-bit. For a 64-bit server, even if you have large amounts of RAM, splitting can be important when multiple searches are running simultaneously. Best not to change these settings from the defaults of 10000000 and 1000.
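
If you would rather not check each line by eye, something like the following Python sketch could flag values that differ from the defaults quoted in this post. It is not a Matrix Science tool. It assumes the options section begins with a line reading Options, ends with a line reading end, holds one "Name value" pair per line, and uses # for comments; check these assumptions against your own file and the manual before relying on it. The sample text and all function names are invented for the example.

```python
# Defaults quoted in the post; settings not listed here are not checked.
POST_DEFAULTS = {
    "Percolator": "0",
    "MaxSequenceLen": "50000",
    "MinPepLenInPepSummary": "7",
    "MinPepLenInSearch": "7",
    "MudpitSwitch": "0.001",
    "SplitDataFileSize": "10000000",
    "SplitNumberOfQueries": "1000",
}


def parse_options(text):
    """Collect 'Name value' pairs from the Options section (assumed layout)."""
    options = {}
    in_options = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out settings
        if not in_options:
            in_options = line.lower() == "options"
            continue
        if line.lower() == "end":
            break
        parts = line.split(None, 1)
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


def same_value(a, b):
    """Compare numerically when possible, otherwise as plain strings."""
    try:
        return float(a) == float(b)
    except ValueError:
        return a == b


def audit(options):
    """Return warnings for settings that differ from the quoted defaults."""
    warnings = []
    for name, default in POST_DEFAULTS.items():
        if name in options and not same_value(options[name], default):
            warnings.append(f"{name} is {options[name]}, post quotes {default}")
    summary = options.get("MinPepLenInPepSummary")
    search = options.get("MinPepLenInSearch")
    if summary is not None and search is not None and not same_value(summary, search):
        warnings.append("MinPepLenInPepSummary and MinPepLenInSearch differ")
    return warnings


# Invented sample, not a real mascot.dat
SAMPLE = """\
Options
# hypothetical values for illustration
Percolator 1
MaxSequenceLen 50000
MinPepLenInSearch 5
MinPepLenInPepSummary 7
end
"""

for warning in audit(parse_options(SAMPLE)):
    print(warning)
```

A warning is only a prompt to look at the setting; as noted at the top, a non-default value may have been chosen for good reason.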

Keywords: configuration files, sysadmin
