Mascot search parameters

Your name; Email

With an in-house Mascot Server, use of these fields is optional.

On the free, public Mascot server, your name and email address must be entered in these fields. This information will not be used by us or anyone else to send you "spam" or junk mail. The reason for requiring this information is to allow the results of a search to be returned by email. Usually, search results are returned promptly to your browser window. However, if your connection to the web site is broken before the search is complete, they will be emailed to the supplied address. If you become disconnected from the site after submitting a search, please do not resubmit the search, just check your email. This facility also means that you don’t have to wait for search results if you don’t want to, particularly during peak hours when the response may be slower than normal.

To save you having to type in this information for every search, your browser will attempt to save it as a local “cookie”. If you refuse to accept this cookie, or your browser doesn’t support cookies, the information cannot be saved and you will have to type it in for every search. If you change the contents of either of these fields, the new values will be saved when the search is submitted.

Search title

A text string which will be printed at the top of results report pages. Can be left blank.

Database

Select the database(s) to be searched. These can be Fasta files containing amino acid (AA) or nucleic acid (NA) sequences or spectral library (SL) files.

Descriptions of the databases available on the free, public Mascot Server can be found under Mascot search overview.

For a Peptide Mass Fingerprint search, only AA databases are available. It makes no sense to search a set of peptide masses against an EST database because the entries are just short stretches of sequence, not complete proteins.

For a Sequence Query or an MS/MS Ions Search, on the free, public Mascot Server, you must search one of the protein databases before searching an EST database. If the protein database search fails to produce a positive match, the master results page will allow you to repeat the search against an EST database.

You can multi-select more than one database for a search. This is useful when you want to search a single organism database and include sequences of common contaminants, such as BSA and trypsin.

Taxonomy

The Taxonomy parameter allows searches to be limited to entries from particular species or groups of species. This can speed up a search, and ensures that the hit list will only contain entries from the selected species. If the search data are marginal, and you are completely confident of the origin of the protein, this can help bring a weak match to the top of the list.

The top level classification, All entries, is self-explanatory. Beneath this are a number of classifications representing taxa or species, such as Rodentia (Rodents). The three classifications below Rodentia are Mus, Rattus, and Other rodentia. Selecting Other rodentia would limit a search to Rodentia excluding Mus and Rattus.

The unclassified level contains database entries for which the species is undefined or is a species which doesn’t fit into any current classification. There are about 50,000 such sequences in the NCBIprot database.

The Species information unavailable level contains those database entries from which Mascot was unable to extract taxonomy information. Taxonomy information may be present in the entry, but Mascot was unable to find it. Thus, if a search limited to a more selective classification than All entries fails to give a result, it may be a wise precaution to repeat it against Species information unavailable.

For non-redundant databases, a single entry may represent identical sequences from multiple species. The accession string and title text from the FASTA entry, listed on the master results page, will usually describe just one of these entries. To see the equivalent entries, and to explore their taxonomy, follow the accession number link in the results list to the Protein View. If the hit is from a non-redundant database, and represents multiple entries with identical sequences, the Protein View will include links to NCBI Entrez and the NCBI Taxonomy Browser for all equivalent entries.

Monoisotopic or Average

Earlier versions of Mascot allowed specifying whether the experimental mass values are average or monoisotopic. The current version defaults to Monoisotopic, as every modern instrument and peak picking software defaults to monoisotopic. The mass accuracy help page explains the difference.

Modifications

Select any known or suspected modifications.

Mascot supports two types of modification. Fixed modifications are applied universally, to every instance of the specified residue(s) or terminus. A fixed modification carries no computational overhead; it is simply equivalent to using a different mass for the modified residue(s) or terminus. For example, selecting Carboxymethyl (C) means that all calculations will use 161 Da as the mass of cysteine.
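The Carboxymethyl (C) example can be reproduced with a short calculation. This is a minimal sketch, not Mascot's implementation: the residue masses and the modification delta are standard monoisotopic values supplied here as assumptions, and the function name apply_fixed_mod is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values; only two residues shown).
RESIDUE_MASS = {"C": 103.00919, "G": 57.02146}

# Monoisotopic mass shift of Carboxymethyl in Da (standard value, assumed here).
CARBOXYMETHYL_DELTA = 58.00548


def apply_fixed_mod(residue_masses, residue, delta):
    """Return a copy of the mass table with delta added to one residue."""
    updated = dict(residue_masses)
    updated[residue] = updated[residue] + delta
    return updated


masses = apply_fixed_mod(RESIDUE_MASS, "C", CARBOXYMETHYL_DELTA)
print(round(masses["C"], 2))  # modified cysteine residue mass, about 161 Da
```

Because the table is changed once, before any peptide masses are calculated, nothing extra has to be enumerated during the search.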

Variable modifications are those which may or may not be present. Mascot tests all possible arrangements of variable modifications to find the best match. For example, if Oxidation (M) is selected, and a peptide contains 3 methionines, Mascot will test for a match with the experimental data for that peptide containing 0, 1, 2, or 3 oxidised methionine residues.

Variable modifications can be a very powerful means of finding a match, but there are also dangers to be aware of. Even a single variable modification will generate many possible additional peptides to be tested. More than one variable modification causes the number of arrangements to increase geometrically. This means that a search can take dramatically longer than the same search with fixed modifications. More importantly, testing all possible arrangements of modifications generates many more random matches, so that discrimination can be sharply reduced.
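The growth in the number of arrangements can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the peptide sequence is made up, the function name is hypothetical, and the enumeration treats each methionine as an independent site, which may not be how Mascot enumerates candidates internally.

```python
from itertools import product


def variable_mod_arrangements(peptide, residue):
    """Yield each arrangement as a tuple of the modified positions."""
    sites = [i for i, aa in enumerate(peptide) if aa == residue]
    # Each candidate site is either unmodified (False) or modified (True).
    for flags in product([False, True], repeat=len(sites)):
        yield tuple(pos for pos, on in zip(sites, flags) if on)


# Hypothetical peptide containing 3 methionines.
arrangements = list(variable_mod_arrangements("AMLMGMK", "M"))
print(len(arrangements))                       # site-specific arrangements
print(sorted({len(a) for a in arrangements}))  # numbers of oxidised residues
```

With n candidate sites there are 2 to the power n site-specific arrangements, covering 0 to n modified residues, and every further variable modification multiplies that count.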

The best advice is to use variable modifications sparingly; never select a large number "just in case". Mascot has a soft limit of up to 9 variable modifications but, in most cases, a better approach is to do a first pass search with a small number of variable modifications followed by an error tolerant second pass search to pick up additional matches to peptides containing unusual modifications.

If chemically inconsistent fixed modifications are combined, an error message will be generated by the search engine.

The ‘Show all mods.’ checkbox switches between a short list of the most common modifications and a complete list of all available modifications. The default state for this checkbox, and all search form fields, is set using the search form defaults page.

Automatic second pass search of selected modification classes

Enable this setting to perform an automatic error tolerant search, which is a two-pass search. A standard, first pass search is performed using the search parameters specified in the form. From the results of the first pass search, all of the database entries that contain one or more significant peptide matches are selected for an error tolerant, second pass search.

In the second pass, Mascot searches selected database entries with relaxed enzyme specificity, while iterating through a comprehensive list of chemical and post-translational modifications, together with a residue substitution matrix. At the completion of the second pass search, a single report is generated, combining the results from both passes.

If the search is submitted as an automatic target-decoy search, the target FDR determines the set of proteins selected for the second pass search. The target is applied independently to the results from each pass, since the significance thresholds may be very different. If target FDR is unset, the default significance threshold 0.05 is used.

By default, the second pass iterates through every modification in the Unimod database, which is well over 2000 modifications. The search space can be restricted to selected modification classes. For example, if you are only looking for post-translational modifications, deselect all other modification classes and keep only “Post-translational”, which is fewer than 300 modifications. The second pass search will run faster and the search will be more sensitive, as the search space is a lot smaller.

Precursor

This field is only shown if you submit a search using one of the obsolete data file formats. By default, these formats are no longer selectable in the search form. If you have an in-house Mascot licence, they can be re-enabled by changing a configuration option.

Certain data file formats, SCIEX API III, PerSeptive (.PKS), and Bruker (.XML), do not include m/z information for the precursor peptide. For these formats only, the Precursor field is used to specify the m/z value of the parent peptide. The charge state is defined by the setting of the Peptide Charge field.

Protein Mass

For Peptide Mass Fingerprint searches only.

The mass of the intact protein in Da applied as a sliding window. That is, the mass of the contiguous stretch of sequence which contains all of the matched peptide mass values. This will generally be less than the mass of the entire sequence entry. If this field is left blank, there is no restriction on protein mass.

Peptide tol. ±

The error window on experimental peptide mass values. This is not the error window for MS/MS fragment ion mass values, which is set using the MS/MS tol. ± parameter.

Units can be selected from:

%	fraction expressed as a percentage
mmu	absolute milli-mass units, i.e. units of 0.001 Da
ppm	fraction expressed as parts per million
Da	absolute units of Da
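The four units can be compared by converting each to an absolute window in Da. This is an illustrative sketch: the function name is hypothetical, and it assumes the fractional units (% and ppm) are taken relative to the mass value passed in, which the text above does not specify.

```python
def tolerance_in_da(mass, tol, unit):
    """Convert a tolerance given in %, mmu, ppm, or Da to a window in Da."""
    if unit == "Da":
        return tol
    if unit == "mmu":
        return tol * 0.001            # 1 mmu = 0.001 Da
    if unit == "%":
        return mass * tol / 100.0
    if unit == "ppm":
        return mass * tol / 1_000_000.0
    raise ValueError(f"unknown unit: {unit}")


# A 10 ppm tolerance on a 2000 Da peptide, expressed in Da.
window = tolerance_in_da(2000.0, 10, "ppm")
print(window)
```

Fractional units scale with mass, so the same ppm setting gives a wider absolute window for larger peptides, while Da and mmu windows stay fixed.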

# ¹³C

Sometimes, peak detection chooses the ¹³C peak rather than the ¹²C. In extreme cases, it may pick the ¹³C₂ peak. The normal test for a precursor match is:
TOL > absolute(exp – calc)
Assuming the mass values and tolerance are in Da, if this field is set to 1, the test will also succeed for
TOL > absolute(exp – calc – 1)
If this field is set to 2, the test will succeed for the above two conditions, plus:
TOL > absolute(exp – calc – 2)

This means that you can use a tight mass tolerance and still get a match to a ¹³C peak. If you are using a very high accuracy instrument, note that the precise shifts are the carbon isotope spacings of 1.00335 and 2.00670, rather than 1 and 2.
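The three tests can be folded into one loop. The sketch below is an assumption-laden illustration rather than Mascot's code: the function and parameter names are hypothetical, all values are taken to be in Da, and the isotope offset uses the precise 1.00335 spacing mentioned above instead of 1.

```python
C13_SPACING = 1.00335  # Da, carbon isotope spacing quoted in the text


def precursor_matches(exp, calc, tol, max_13c=0):
    """Test TOL > |exp - calc - n * spacing| for n = 0 .. max_13c."""
    for n in range(max_13c + 1):
        if abs(exp - calc - n * C13_SPACING) < tol:
            return True
    return False


# An experimental mass picked from the 13C peak of a 1000.0 Da peptide.
print(precursor_matches(1001.0034, 1000.0, 0.01, max_13c=0))
print(precursor_matches(1001.0034, 1000.0, 0.01, max_13c=1))
```

With the field set to 0 the tight tolerance rejects the 13C peak; setting it to 1 accepts the same peak without widening the tolerance.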

MS/MS tol. ±

Error window for MS/MS fragment ion mass values. Units can be ppm, Da or mmu, as above.

Mass values

Specifies whether experimental peptide mass values in a peptide mass fingerprint search include the mass of the charge carrier, MH⁺ or M-H⁻, or whether they correspond to neutral, Mᵣ values.

Peptide charge

This field is only relevant with a sequence query. For an MS/MS search, the field is only shown if you submit a search using one of the obsolete data file formats. By default, these formats are no longer selectable in the search form. If you have an in-house Mascot licence, they can be re-enabled by changing a configuration option.

When the data file format does not specify peptide charge states, the peptide charge parameter is used to specify the precursor charge state in a sequence query or an MS/MS ions search. The peptide mass value supplied in an MS/MS data file is usually an observed m/z value. The charge state field is used to calculate the relative molecular mass (Mᵣ) of the precursor from the observed m/z unless the data file explicitly specifies a different charge state.

N.B. The notation "1+", "2+", etc. is used to save space and because some HTML form fields do not support the use of superscripts and subscripts. "1+" always means MH⁺, "1-" always means M-H⁻, "2+" always means MH₂²⁺, etc.

For electrospray data, select "2+" if the peptide m/z data are known to be doubly charged. If the charge state is uncertain, select "2+ and 3+" to include both charge states in the search and see which most clearly discriminates the score of the top matched protein.

For MALDI-PSD, the precursor peptides will generally be MH⁺, so the charge state should be set to "1+".
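The conversion from an observed m/z to a relative molecular mass can be sketched as follows. This is an illustration only: the function names are hypothetical, the charge carrier is assumed to be a proton with a mass of 1.007276 Da, and a signed integer charge is used so that "1-" is handled as M-H.

```python
PROTON_MASS = 1.007276  # Da, assumed mass of the charge carrier


def neutral_mass(mz, charge):
    """Relative molecular mass from m/z for a signed, non-zero charge."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    # Positive ions carry extra protons; negative ions have lost protons.
    return mz * abs(charge) - charge * PROTON_MASS


def candidate_masses(mz, charges):
    """Neutral mass for each candidate charge state, e.g. '2+ and 3+'."""
    return {z: neutral_mass(mz, z) for z in charges}


candidates = candidate_masses(800.0, (2, 3))
for z, mr in candidates.items():
    print(z, round(mr, 4))
```

Selecting "2+ and 3+" amounts to searching both of these candidate masses for the same spectrum, which is why the two charge states can give very different matches.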

Missed Cleavages

Setting the number of allowed missed cleavage sites to zero simulates a limit digest. If you are confident that your digest is perfect, with no partial fragments present, this will give maximum discrimination and the highest score.

If experience shows that your digest mixtures usually include some partials, that is, peptides with missed cleavage sites, you should choose a setting of 1, or maybe 2 missed cleavage sites. Don’t specify a higher number without good reason, because each additional level of missed cleavages increases the number of calculated peptide masses to be matched against the experimental data. If the actual digest does not contain extended partials, this simply increases the number of random matches, and so reduces discrimination.
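The effect of each additional missed cleavage on the number of candidate peptides can be seen with a small in silico digest. This is a sketch under stated assumptions: it uses a simple trypsin rule (cleave after K or R, but not before P), a made-up sequence, and a hypothetical function name; it is not Mascot's enzyme handling.

```python
import re


def tryptic_peptides(sequence, missed_cleavages=0):
    """List peptides from a simple tryptic digest with partials allowed."""
    # Zero-width split after K or R unless the next residue is P
    # (splitting on empty matches requires Python 3.7 or later).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            # Joining j - i + 1 limit fragments leaves j - i missed sites.
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


sequence = "AAKCCRDDKPEEK"  # hypothetical sequence
for n in (0, 1, 2):
    print(n, len(tryptic_peptides(sequence, n)))
```

Even for this short sequence, the candidate list grows with every extra level of missed cleavage, and each added candidate is another chance for a random match.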

Data file

Browse to a peak list file which will be uploaded when the search is submitted. Details of supported file formats can be found here.

Data file URL

Enter the URL to a peak list file. This option is only available if Mascot security is enabled, when the security settings specify which protocols are available (http, ftp, file) and the maximum permitted file size (MB).

Query

For a Peptide Mass Fingerprint, unless a peak list file is specified, the query window must contain a list of peptide mass values, one per line. An intensity value after the mass value is optional. Anything after the second numeric value on each line is ignored.

If intensity information is available, values will be selected according to their intensity so as to get the best score. This can be disabled by setting IteratePMFIntensities to 0 in mascot.dat.
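The layout of a Peptide Mass Fingerprint query can be illustrated with a small parser. This is a sketch, not Mascot's parser: the function name is hypothetical, and it assumes values are separated by whitespace and that blank lines can be skipped.

```python
def parse_pmf_query(text):
    """Return (mass, intensity) pairs; intensity is None when absent."""
    peaks = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        mass = float(tokens[0])
        intensity = None
        if len(tokens) > 1:
            try:
                intensity = float(tokens[1])
            except ValueError:
                intensity = None  # second token is not numeric
        # Anything after the second value is ignored.
        peaks.append((mass, intensity))
    return peaks


query = "1234.56 100 trailing text\n2345.67\n"
peaks = parse_pmf_query(query)
print(peaks)
```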

For a Sequence Query, each line entered into the query window must consist of one experimental peptide mass value, optionally followed by qualifiers for that peptide:

M seq(…) comp(…) ions(…) tag(…) etag(…)

M is an experimental mass value, seq(…) is AA sequence information, comp(…) is AA composition information, ions(…) contains MS/MS fragment mass and (optionally) intensity values, tag(…) is a sequence tag, etag(…) is an error tolerant sequence tag.

A line may contain zero, one, or many qualifiers. If there are multiple sequence tag qualifiers, and one or more is error tolerant, then all tags are treated as error tolerant.
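The line structure and the error tolerant tag rule can be sketched with a regular expression. This is an illustration under assumptions: the function name is hypothetical, the qualifier contents in the example are made up and treated as opaque strings, and nested parentheses inside a qualifier are not handled.

```python
import re

# Qualifier name followed by its parenthesised contents.
QUALIFIER = re.compile(r"\b(etag|tag|seq|comp|ions)\(([^)]*)\)")


def parse_sequence_query(line):
    """Split a query line into its mass and a list of (name, contents)."""
    parts = line.split(None, 1)
    mass = float(parts[0])
    rest = parts[1] if len(parts) > 1 else ""
    qualifiers = QUALIFIER.findall(rest)
    tag_names = [name for name, _ in qualifiers if name in ("tag", "etag")]
    # Multiple tags with at least one etag: all are treated as error tolerant.
    if len(tag_names) > 1 and "etag" in tag_names:
        qualifiers = [("etag" if name == "tag" else name, body)
                      for name, body in qualifiers]
    return mass, qualifiers


# Hypothetical query line; the qualifier contents are illustrative only.
mass, qualifiers = parse_sequence_query(
    "1890.2 tag(650.2,GWSV,1079.2) etag(1079.2,LT,1293.3)")
print(mass, qualifiers)
```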

N.B. ions(…), tag(…), and etag(…) qualifiers are scored probabilistically. That is, the more qualifiers that match, the higher the score, but not every qualifier is required to match. In contrast, seq(…) and comp(…) are treated as filters. If a seq(…) or comp(…) qualifier fails to match, then the entire query is discarded. Hence, only include seq(…) or comp(…) qualifiers which are known with a high degree of confidence. Note that using a seq(…) qualifier in a Mascot search is not equivalent to performing a BLAST search.

If you re-Search a Sequence Query from the results page, you may notice two additional qualifiers which are used by Mascot internally: from(…) and title(…).

Target FDR

This parameter specifies the target false discovery rate for searches where automatic decoy has been selected. A setting of (no target) corresponds to a value of 0 and means that search results are reported using the default significance threshold setting, usually 0.05. If a target value is selected, the significance threshold in the result report will be adjusted to achieve the target. If it is not possible to get within a factor of 2 of the target, a warning will appear in the report.

If Error tolerant has also been selected, the target FDR determines the set of proteins selected for the second pass search. The target is applied independently to the results from each pass, since the significance thresholds may be very different.

Refine results with machine learning (Percolator)

When you enable this setting, Mascot runs Percolator at the end of the database search. Percolator is an algorithm that uses semi-supervised machine learning to improve the discrimination between correct and incorrect spectrum identifications. The matches from searching a decoy database provide the negative examples for the classifier, and a subset of the high-scoring matches from the target database provide the positive examples.

Percolator will usually give a worthwhile improvement in sensitivity. Mascot calculates several features (metrics) from peptide matches that provide context in addition to MS/MS fragmentation evidence. Examples include precursor mass error, fragment mass error, and the number of variable modifications.

Mascot also ships with MS2Rescore, a modular and user-friendly platform for AI-assisted rescoring of peptide identifications. MS2Rescore includes two prediction systems, DeepLC and MS2PIP.

Select a DeepLC model for retention times based on your LC and enzyme. At the end of the database search, Mascot uses DeepLC to predict the retention times of target and decoy peptide matches based on sequence, fixed/variable modifications and charge state. The difference between observed and predicted retention time is used as a Percolator feature. Predicted retention time provides information about peptides that is orthogonal to the other metrics.

Select an MS2PIP model for spectral similarity based on your instrument type and enzyme. At the end of the database search, Mascot uses MS2PIP to predict the MS/MS fragmentation spectra of target and decoy peptide matches based on sequence, fixed/variable modifications and charge state. Several correlation metrics are calculated between the observed and predicted spectra and these are used as Percolator features. Spectral correlation typically enhances sensitivity, because it makes greater use of peak intensity than the Mascot ions score.

Precursor removal

The precursor peak can often have very high intensity relative to the fragment peaks, which may give rise to spurious fragment ion matches. It is usually best if the precursor is removed before the search.

With the default arguments of -1,-1, a smart filter is created. This removes peaks within the fragment ion tolerance window about each of the precursor isotope peaks. The number of isotopes is assumed to be as follows:

Mr	Number
< 1000	3
1000 – 1999	4
2000 – 2999	5
3000 – 3999	6
4000 – 4999	7
5000 – 5999	8
6000 – 6999	9
≥ 7000	10

So, if the precursor m/z was 800, the charge was 2 (giving an Mr just under 1600, hence 4 isotope peaks), and the fragment ion tolerance was +/- 0.1 Da, the filter would remove 4 notches, each 0.2 Da wide:

m/z 800.0 +/- 0.1
m/z 800.5 +/- 0.1
m/z 801.0 +/- 0.1
m/z 801.5 +/- 0.1

At first sight, this may seem a strange mix of m/z and Da. The reason is that we need to avoid matches from 1+ fragment ions, whatever the charge on the precursor.

If the arguments are anything other than -1,-1, a single notch is used, where the first argument is the mass offset of the beginning of the notch and the second argument is the mass offset of the end of the notch. For the precursor in the last example, if the arguments were -1,4 then the notch would run from m/z 799.5 to m/z 802.0. However, if the precursor charge was 1, then the notch would be from m/z 799 to m/z 804.
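Both filter modes can be reproduced with a short calculation. This is a sketch, not Mascot's implementation: the function names are hypothetical, a positive precursor charge and a proton mass of 1.007276 Da are assumed, the isotope spacing is taken as 1 divided by the charge as in the worked example, and the Mr bins are treated as half-open with 7000 and above mapped to 10.

```python
PROTON_MASS = 1.007276  # Da, assumed charge carrier mass

# (upper Mr bound, number of isotope peaks), following the table above.
ISOTOPE_TABLE = [(1000, 3), (2000, 4), (3000, 5), (4000, 6),
                 (5000, 7), (6000, 8), (7000, 9)]


def isotope_count(mr):
    """Number of precursor isotope peaks assumed for a given Mr."""
    for upper, count in ISOTOPE_TABLE:
        if mr < upper:
            return count
    return 10


def precursor_notches(mz, charge, frag_tol, args=(-1, -1)):
    """Return the (low, high) m/z ranges that the filter would remove."""
    if args == (-1, -1):
        # Smart filter: one notch per isotope peak, widened by the tolerance.
        mr = mz * charge - charge * PROTON_MASS
        centres = [mz + i / charge for i in range(isotope_count(mr))]
        return [(c - frag_tol, c + frag_tol) for c in centres]
    # Single notch: the arguments are mass offsets, divided by the charge.
    start, end = args
    return [(mz + start / charge, mz + end / charge)]


smart = precursor_notches(800.0, 2, 0.1)
print([(round(lo, 1), round(hi, 1)) for lo, hi in smart])
print(precursor_notches(800.0, 2, 0.1, args=(-1, 4)))
print(precursor_notches(800.0, 1, 0.1, args=(-1, 4)))
```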

Instrument

For an MS/MS Ions Search, choose the description which best matches the type of instrument used to acquire the data. This setting determines which fragment ion series will be used for scoring, according to the following table. "Default" corresponds to the configuration used in Mascot version 1.7 and earlier.

	Default	ESI QUAD TOF	MALDI TOF PSD	ESI TRAP	ESI QUAD	ESI FTICR	MALDI TOF TOF	ESI 4 SECT	FTMS ECD	ETD TRAP	MALDI QUAD TOF	MALDI QIT TOF	MALDI ISD	CID+ETD	ETchD	EAD
1⁺	X	X	X	X	X	X	X	X	X	X	X	X	X	X	X	X
2⁺ (precursor>2⁺)	X	X		X	X	X		X	X	X	X			X	X	X
2⁺ (precursor>3⁺)
imm.			X				X	X			X	X
a	X		X				X	X				X	X		X	X
a*	X		X				X					X			X
a0			X				X					X			X
b	X	X	X	X	X	X	X	X			X	X		X	X	X
b*	X	X	X	X	X	X	X	X			X	X		X	X
b0		X	X	X	X	X	X	X			X	X		X	X
c									X	X			X	X	X	X
x
y	X	X	X	X	X	X	X	X	X	X	X	X	X	X	X	X
y*	X	X		X	X	X	X				X	X		X	X
y0		X		X	X	X	X				X	X		X	X
z								X
z+1									X	X				X	X	X
z+2									X	X			X	X	X	X
yb							X	X			X	X
ya							X	X			X	X
y must be sig.
y must be highest
d							X
v							X
w							X							X	X	X

Other Parameters

There are a number of other search parameters, but their default settings should not be changed under normal circumstances. For this reason, they are not accessible from the browser interface. The defaults can be overridden by using embedded parameters, either in a data file or in the query window. Be warned, though, that you change them at your own risk!

Matrix Science