There is a wide choice of software to convert Xcalibur data into peak lists and submit these to a Mascot Server for searching. This page lists some of the more widely used options. Remember that the spectra in Xcalibur raw files may contain centroid data, rather than profile data. With hybrid instruments, it is common for the high resolution survey scans to be profile and lower resolution MS/MS scans to be centroid.
Mascot Distiller can be used to browse Xcalibur raw files, and process them into high quality peak lists that can be saved or submitted direct to a Mascot Server for searching. With the appropriate Distiller Toolboxes, the search results can be imported back into Distiller for further examination or used as the basis for quantitation. If the optional Mascot Daemon Toolbox is installed, these processes can be automated using Mascot Daemon.
If your MS/MS data is centroided, you can choose to create a peak list direct from the centroid values already present in the raw file. This is extremely fast and the peak list is fine for most purposes. Choose extract_msn.opt as the processing options when opening the raw file as a new project.
With high resolution data from an FT or Orbitrap, you may wish to take a little longer and peak pick the survey scans, so as to obtain more reliable detection of the 12C peaks. For high charge state data, you may wish to peak pick the MS/MS scans so that the peaks can be de-isotoped and de-charged. Mascot only tries to match 1+ and 2+ fragments, so de-charging to 1+ becomes important when the precursor is 4+ or higher. Full details of how to select and modify the processing options can be found in the Distiller help file (see especially the ‘More about peak picking’ topic in the Reference chapter).
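To illustrate what de-charging to 1+ means arithmetically, here is a minimal Python sketch. It is not Distiller's implementation; the function name and the example values are hypothetical, and it assumes the charge state of the peak has already been determined from its isotope spacing.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def decharge_to_singly_charged(mz: float, charge: int) -> float:
    """Return the m/z the same ion would have if it carried a single charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Neutral mass is charge * (m/z - proton); the 1+ ion is neutral mass + proton.
    return mz * charge - (charge - 1) * PROTON_MASS


# Hypothetical example: a 3+ fragment observed at m/z 500.0.
print(round(decharge_to_singly_charged(500.0, 3), 4))
```

A peak that is already 1+ is returned unchanged, so the function can be applied to every peak in a de-isotoped spectrum.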
Besides the quality of the peak lists, the other advantages of using Mascot Distiller are that it provides a universal interface to other raw file formats, and it is fully integrated with Mascot Server and Mascot Daemon. You can, for example, use Mascot Daemon to process batches of files automatically, saving Distiller project files that contain both the peak lists and the search results.
Mascot Daemon can be used to process batches of RAW files by choosing either Mascot Distiller or Thermo ExtractMSn as the data import filter.
Mascot Distiller is the more powerful option, and this is the required route if you intend to use Distiller for quantitation. Distiller requires the optional Mascot Daemon Toolbox to allow the Distiller libraries to be called from Mascot Daemon. When this toolbox is active, Mascot Distiller will appear automatically on the list of data import filters in Daemon.
If you choose Thermo ExtractMSn, Daemon executes ExtractMSn to convert each raw file into a set of DTA files, then merges these into an MGF file. Unlike the ExtractMSn Shell web browser form, Daemon executes ExtractMSn on the Daemon PC, so this option is available even if your Mascot server is on a Unix platform.
If you have a set of DTA format peak lists, but no raw file, you can also use Daemon to merge the DTAs into an MGF file for searching. Select the DTA files in Windows Explorer and drag and drop them into the Daemon data files list box on the Task Editor tab. Then, check the box for Merge MS/MS files into a single search.
By running Mascot Daemon in real-time monitor mode, each RAW file can be searched automatically, as soon as acquisition is complete. First, create a suitable parameter set for the task:
(Note that the file format is Mascot Generic, not DTA, because Daemon data import filters always create MGF files.) Second, create a real-time monitor task to monitor the directory where the RAW files are being created. Remember to select the correct parameter file, and choose either Mascot Distiller or Thermo ExtractMSn as the data import filter.
The data import filter processing options are specified by choosing the Options button next to the data import filter list box. For Distiller, you may have something like this:
For Thermo ExtractMSn, these would be typical settings:
In real-time monitor mode, it is important that Mascot Daemon waits until acquisition is complete before processing the RAW file into peak lists. To avoid taking a file while it is still being written, Daemon checks the file size at intervals, and waits until it has stopped increasing. The default interval is 60 seconds, which may not be long enough when the file size grows only slowly. If Daemon tries to process a RAW file before acquisition is complete, increase this interval by going to the Timer Settings tab of the Preferences dialog. Increase the value of ‘File size must be constant for at least’ until the problem disappears.
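The file size check described above can be sketched in a few lines of Python. This is an illustration of the idea only, not Daemon's code; the function name, parameter names and polling interval are assumptions.

```python
import os
import time


def wait_until_size_stable(path, stable_seconds=60.0, poll_seconds=5.0):
    """Block until the size of `path` has been constant for `stable_seconds`.

    Returns the final file size in bytes.
    """
    last_size = os.path.getsize(path)
    last_change = time.monotonic()
    while time.monotonic() - last_change < stable_seconds:
        time.sleep(poll_seconds)
        size = os.path.getsize(path)
        if size != last_size:
            # The file is still growing, so restart the waiting period.
            last_size = size
            last_change = time.monotonic()
    return last_size
```

With slow acquisitions, a larger `stable_seconds` plays the same role as increasing ‘File size must be constant for at least’ in the Daemon preferences: the file must go longer without growing before it is considered complete.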
If Mascot Daemon reports "No output from ExtractMSn" or the ExtractMSn Shell web browser form returns "Must choose at least one query for repeat search", this means that no DTA files were produced. The most common causes are (i) the ExtractMSn parameters are too restrictive, (ii) the data file does not contain MS/MS scans, (iii) the version of ExtractMSn is older than the version of Xcalibur used to create the data file. The easiest way to investigate and debug this problem is to execute ExtractMSn at a command prompt, using identical processing parameters.
If your Mascot server runs under Windows XP, and you get the message "cannot create temporary directory" when you try to use the ExtractMSn Shell web browser form, this may be because the security settings do not allow CGI programs to execute the command processor. A fix is described on the Support page, in the Windows XP section.
If you have a Windows-based Mascot server in-house, you can use the ‘Process Thermo Xcalibur raw file’ form to upload and process the RAW file. When this form is submitted, the processing options are passed to ExtractMSn running on the Mascot server. The RAW file is processed into DTA files which are automatically merged into a single file, pre-loaded into a Mascot search form.
When Mascot is first installed, you need to edit the underlying Perl script (lcq_dta_shell.pl) to specify the locations of a workspace directory and the directory containing the ExtractMSn executable. These are defined by two variables near the top of the script:
# local name of temp directory on Mascot server
my $tempDir = "c:\\temp";
# local path to lcq_dta.exe or extract_msn.exe on Mascot server
my $lcqExe = "c:\\Program Files\\Thermo\\ExtractMSn\\ExtractMSn.exe";
Note the use of double backslashes in the path names.
Support for submitting searches direct to a Mascot Server was added to Thermo’s Bioworks in version 3.2, but we advise using Bioworks 3.3 SP1 to avoid some known issues with the first release. Mascot Server must be version 2.1 or later. In the Bioworks browser, choose Configuration from the Options menu. In the dialog, select Mascot Search and enter the Mascot Server URL in the form http://ec-vm2/mascot/cgi where ec-vm2 is replaced by the hostname of your local server.
When a data file is loaded, you can choose Mascot from the Actions menu to submit a search. Bioworks creates and saves an mzData format peak list for submission to Mascot.
When the search is complete, you can load the Mascot results report in a web browser or download the results file to the Bioworks PC. Note that Bioworks has been superseded by Proteome Discoverer, and is no longer available.
Thermo’s Proteome Discoverer provides fully automated raw file processing and search submission. Peak picking and search parameters are selected in a workflow wizard. When the search is complete, the results are imported into Proteome Discoverer, where they can be filtered and inspected.
(Note that the local Mascot Server URL must be entered in the form http://ec-vm2/mascot/ where ec-vm2 is replaced by the hostname of your local server.)
Mascot supports the Sequest DTA peak list format. However, if the data are from an LC-MS/MS experiment, searching individual DTA files is inefficient, and doesn’t allow Mascot to generate a proper results summary. You can concatenate a set of DTA files into an MGF peak list using one of these utilities:
- merge.pl, a Perl script (any platform)
- merge.bat, a DOS batch file (Windows)
- merge.sh, a shell script (Unix)
If possible, you should choose the Perl script, because this creates a Mascot Generic Format (MGF) file in which each DTA file name is preserved as a spectrum title. This makes it easier to compare the Mascot search results with the original data, because you can identify the scan range represented by each spectrum. It also enables the origin of each DTA file to be tracked when data from multiple RAW files from a MudPIT experiment are merged together.
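For readers who want to see what such a merge involves, the following Python sketch concatenates DTA files into an MGF file and keeps each DTA file name as the spectrum title. It is not a port of merge.pl; the function names are hypothetical, and it assumes the usual DTA layout in which the first line holds the singly protonated precursor mass (MH+) and the charge, followed by one m/z and intensity pair per line.

```python
from pathlib import Path

PROTON_MASS = 1.007276  # mass of a proton in Da


def dta_to_mgf_block(title: str, dta_text: str) -> str:
    """Convert the text of one DTA file into one MGF spectrum block."""
    lines = [ln.strip() for ln in dta_text.splitlines() if ln.strip()]
    # First DTA line: precursor MH+ and charge state.
    mh_plus_str, charge_str = lines[0].split()[:2]
    mh_plus, charge = float(mh_plus_str), int(charge_str)
    # MGF PEPMASS is the observed precursor m/z, so convert MH+ back to m/z.
    pepmass = (mh_plus + (charge - 1) * PROTON_MASS) / charge
    block = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={pepmass:.5f}",
        f"CHARGE={charge}+",
    ]
    block.extend(lines[1:])  # fragment m/z and intensity pairs, unchanged
    block.append("END IONS")
    return "\n".join(block) + "\n"


def merge_dta_files(dta_dir: str, mgf_path: str) -> int:
    """Write every *.dta file in dta_dir to one MGF file; return the file count."""
    dta_files = sorted(Path(dta_dir).glob("*.dta"))
    with open(mgf_path, "w") as out:
        for dta in dta_files:
            out.write(dta_to_mgf_block(dta.name, dta.read_text()))
            out.write("\n")  # blank line between spectra for readability
    return len(dta_files)
```

Because the title is the DTA file name, the scan range and originating RAW file encoded in that name remain visible in the search results.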
Most Unix systems will already have Perl installed. If your Windows system doesn’t have Perl, it can be downloaded free from ActiveState. (Quote from Bugzilla: "Any machine that doesn’t have Perl on it is a sad machine indeed.")
Note: ExtractMSn is no longer developed, supported or distributed by Thermo. The final version was released in January 2011 and, in general, you cannot process raw files from Xcalibur releases much later than this date.
ExtractMSn does not perform centroiding of profile data. If you generate DTA files from a RAW file containing profile data, the DTA files are themselves profile data. Zero intensity values are dropped, and non-zero intensities are output at 0.1 Da intervals. Mascot deals with this as best it can by performing simple peak detection, but this is less than ideal. The other problem of working with profile data is that the DTA files will be very large, and you may occasionally get a Mascot error message that there are more than 10,000 data points in a single spectrum.
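As an illustration of why profile data in a DTA file needs peak detection, here is a deliberately simple Python sketch that collapses each run of closely spaced points into a single centroid. This is not the algorithm Mascot uses; the function names and the 0.15 Da gap threshold are assumptions chosen to suit points output at 0.1 Da intervals with zero intensities dropped.

```python
def _centroid(run):
    """Return (intensity-weighted mean m/z, summed intensity) for one run."""
    total = sum(intensity for _, intensity in run)
    mz = sum(m * intensity for m, intensity in run) / total
    return (mz, total)


def centroid_profile(points, max_gap=0.15):
    """Collapse runs of closely spaced profile points into single peaks.

    points: list of (mz, intensity) tuples sorted by m/z, all intensities > 0.
    Returns a list of (centroid_mz, summed_intensity) tuples.
    """
    peaks = []
    run = []
    for mz, intensity in points:
        # A jump larger than max_gap implies zero-intensity points were
        # dropped, so the previous run is treated as one complete peak.
        if run and mz - run[-1][0] > max_gap:
            peaks.append(_centroid(run))
            run = []
        run.append((mz, intensity))
    if run:
        peaks.append(_centroid(run))
    return peaks
```

Overlapping peaks with no zero-intensity gap between them would be merged by this approach, which is one reason proper peak picking of the raw data is preferable.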
The following are worth noting:
- Intermediate scans (-S): Although it looks like it should be OK to set S to zero, this can sometimes result in no output.
- Min. Peaks in DTA (-I): The default is 0, but this should always be set to a sensible number, say 10, to remove empty or near empty scans, since these can never give significant matches in Mascot.
- Precursor Charge (-C): With triple-play data, precursor charge state determination is fairly sophisticated, and the default settings should not be changed. If your data don’t include zoom scans, the code attempts to recognise singly charged precursors, while precursors with higher charge states are output twice, with 2+ and 3+ charge states.
- TIC Threshold (-E): Not described in the Usage information
- Extract MSn (-P): Not described in the Usage information
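If DTA files have already been generated without a minimum peak count, the same effect can be approximated afterwards. The Python sketch below is only an illustration with hypothetical function names; it assumes each DTA file has one precursor line followed by one line per peak, and uses the threshold of 10 suggested above.

```python
def count_peaks(dta_text: str) -> int:
    """Count fragment peaks: every non-blank line after the precursor line."""
    lines = [ln for ln in dta_text.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)


def keep_spectrum(dta_text: str, min_peaks: int = 10) -> bool:
    """Return True if the spectrum has at least min_peaks fragment peaks."""
    return count_peaks(dta_text) >= min_peaks
```

Applying `keep_spectrum` before merging removes empty or near empty scans that could never give a significant match.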
If you have a set of DTA files, it will usually be best to merge them into a single file. If you have Mascot in-house, you can have Mascot Daemon take care of this automatically.
MSFileReader (download requires registration) is Thermo’s raw file access library. It is required by both Mascot Distiller and ExtractMSn, and is installed automatically as part of a Mascot Distiller installation.
DeconMSn has been developed at Pacific Northwest National Laboratory. It requires Xcalibur and Microsoft .NET 1.1 or later to be installed. It is not clear whether it can be made to run stand-alone, on a system without a full installation of Xcalibur.
DeconMSn can output either DTA or MGF peak lists. With high resolution data, parent monoisotopic mass is calculated using a modified THRASH approach. For low-resolution data, DeconMSn uses a support-vector machine based charge-detection algorithm to determine parent mass.
MSConvert is a component of ProteoWizard that converts between various file formats. It has a large number of options and can be configured as a data import filter in Mascot Daemon. Note that the input file format is not specified; msconvert auto-detects the format.
Sequest is a registered trademark of the University of Washington. Xcalibur is a registered trademark and Bioworks and Proteome Discoverer are trademarks of Thermo Electron Corporation.