Matrix Science header

Accessing Distiller Quantitation Results

Note: This describes the structure of a single-file project. Projects with multiple files have a different, more complex structure; see Accessing Distiller Multi-file Quantitation Results.

Single-file Quantitation Overview

Distiller stores its parameters and results in a Distiller project (.rov) file.

The project file has the same internal format as a .ZIP file. You can view its contents with any .ZIP file viewer (for example, by renaming the file to have a .zip extension and opening it as usual).
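Because the project file is a ZIP container, one quick check before trying to extract anything is to look for the ZIP local file header signature, the bytes "PK\x03\x04", at the start of the file. The C++ sketch below does only that. The function name looksLikeZipContainer is hypothetical and is not part of Mascot Parser; actually extracting the streams still needs a ZIP library or the MDRO SDK.

```cpp
#include <array>
#include <fstream>
#include <string>

// Returns true if the file starts with the ZIP local file header
// signature "PK\x03\x04". This only checks that a .rov project file
// looks like a ZIP container; it does not validate the archive or
// the streams inside it.
bool looksLikeZipContainer(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::array<char, 4> magic{};
    if (!in.read(magic.data(), magic.size())) {
        return false; // file missing, unreadable, or shorter than 4 bytes
    }
    return magic[0] == 'P' && magic[1] == 'K' &&
           magic[2] == '\x03' && magic[3] == '\x04';
}
```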

The data can also be extracted using the Mascot Distiller SDK (MDRO); see Distiller projects and the toolkit.

Project file streams

The project file contains a number of streams:

| Name | Format | Contents |
|---|---|---|
| mdro_search_status | XML | A search status list detailing all of the result sets in the project. |
| mdro_search_status+N | MIME | The Mascot results data used to construct an ms_mascotresfile. N is the id attribute from the search status. |
| rover_data | XML | The Distiller project parameters used to create an ms_distiller_data. |
| rover_data+(cac0+N) | MIME | The peptide summary cache used to construct an ms_peptidesummary. N is the id attribute from the search status. |
| rover_data+(1f40+M) | binary | The quantitation results file used to populate an ms_ms1quantitation. M is the projectStream attribute from the Distiller project. |
| rover_data+(bb8+M) | binary | The quantitation cache file used to populate an ms_ms1quantitation. M is the projectStream attribute from the Distiller project. |

Distiller project parameters

The Distiller project parameters are held in the stream rover_data.

The stream name can be retrieved directly from matrix_science::ms_distiller_data.

std::string projectStreamName = matrix_science::ms_distiller_data::getDistillerProjectStreamName();

This data should be loaded into an ms_distiller_data.

matrix_science::ms_distiller_data distillerData;
// zipExtractPath is the folder the project streams were extracted to.
std::string roverXmlFilename = zipExtractPath + projectStreamName;
bool optionsLoadedFromXmlOk = distillerData.loadXmlFile(mascotXmlSchemaFolder.c_str(), roverXmlFilename);

It contains the parameters required to create the ms_peptidesummary, as well as a list of searches and a list of quantitation results. The schema for this is in distiller_data_1.xsd.

Search list

The search list is held in the stream mdro_search_status.

This list is in XML format. The schema for this XML is in the file mdro_search_status_1.xsd.

The search list is used to determine the id of the results file stream to be loaded (at XPath /mdroMascotSearchStatusStream/searchStatusList/searchStatus/@id ).

Other attributes may also be of interest, such as the search task id and title.

ms_distiller_search_status_list can be used to parse the file.

std::string searchFilename = zipExtractPath + matrix_science::ms_distiller_search_status_list::getSearchStatusListStreamName();
matrix_science::ms_distiller_search_status_list searchList;
bool ok = searchList.loadXml(searchFilename);

Mascot results

The Mascot results are held in the stream mdro_search_status+N, where N is the id of the search status; for example, "mdro_search_status+1". If the id is greater than 9, it is expressed in lower-case hexadecimal; for example, an id of 13 gives the stream name "mdro_search_status+d".
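The naming rule can be illustrated with a short C++ sketch. The helper resfileStreamNameForId is hypothetical and only demonstrates the hexadecimal convention; in practice the name should be taken from getResfileStreamName() on the search status, as shown below.

```cpp
#include <sstream>
#include <string>

// Builds "mdro_search_status+" followed by the search status id in
// lower-case hexadecimal. Ids 1 to 9 therefore look the same as in
// decimal, while 13 becomes "d" and 26 becomes "1a".
std::string resfileStreamNameForId(unsigned int searchId)
{
    std::ostringstream name;
    name << "mdro_search_status+" << std::hex << std::nouppercase << searchId;
    return name.str();
}
```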

The results file stream name can be retrieved directly from the search status.

std::string resfileStreamName = searchList.getSearch(1).getResfileStreamName();

This file should be loaded into an ms_mascotresfile.

The cache files for the results and the peptide summary are placed in a sub-folder of the temporary file folder passed to the ms_mascotresfile constructor. The name of this sub-folder depends on the last modification time of the results file and is determined when the ms_mascotresfile is constructed. Parser does not delete these cache files after creating them, so if the results file has a different timestamp each time it is extracted, a new sub-folder is created each time and disk usage keeps growing. To avoid this, set the last modification time of the results file to a fixed value before constructing the ms_mascotresfile. Distiller sets it to 00:00:00 1/1/2010.

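// Use Distiller's fixed timestamp so the cache sub-folder name stays the same.
// pathnameResfile is the extracted mdro_search_status+N stream.
time_t resfileTime = matrix_science::ms_distiller_data::distillerResfileTimestamp;
matrix_science::ms_fileutilities::setLastModificationTime(pathnameResfile.c_str(), resfileTime);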

When creating the ms_mascotresfile, enable the cache (to speed up access) and ignore the timestamp stored within the cache file (to avoid time-zone issues; this does not affect how the name of the cache folder is generated).

int resFileFlags = matrix_science::ms_mascotresfile::RESFILE_USE_CACHE
        | matrix_science::ms_mascotresfile::RESFILE_CACHE_IGNORE_DATE_CHANGE;
matrix_science::ms_mascotresfile * resfile = new matrix_science::ms_mascotresfile(
        pathnameResfile.c_str(),
        0, // keepAliveInterval
        "<!-- %d seconds -->\n", // keepAliveText
        resFileFlags,
        temporaryFileFolder.c_str(),
        mascotXmlSchemaFolder.c_str());

Peptide summary cache

The peptide summary loads a cache file in order to improve access times and ensure consistent results across different versions of Mascot Parser.

The peptide summary cache filename is generated from the same parameters (available from the Distiller project parameters) as are used to create the peptide summary itself, so pass identical values to both calls.

The cache is stored in the project file in the stream rover_data+(cac0+N), where N is the id of the search from the search status list and the sum is written in hexadecimal; for example, "rover_data+cac1". The stream name can be retrieved from the search.

std::string summaryStreamName = searchList.getSearch(1).getPeptideSummaryCacheStreamName();

The peptide summary cache data should be copied to the file location returned by ms_peptidesummary::getCacheFilename().

std::string pathnameCachefile = matrix_science::ms_peptidesummary::getCacheFilename(
        *resfile,
        distillerData,
        1); // first search in the list

Once the peptide summary cache file has been copied into position, the peptide summary should be created using exactly the same parameters as were used to generate the filename.

matrix_science::ms_peptidesummary * peptideSummary = new matrix_science::ms_peptidesummary(
        *resfile,
        distillerData,
        1);

Quantitation results

The quantitation data is stored in two streams of the Distiller project file.

The stream number offset (the projectStream attribute) is obtained from the first ms_distiller_data_quant in the list held by the ms_distiller_data. This offset is added to 3000 to give the quantitation data stream number and to 8000 to give the cache stream number. These numbers are then converted to hexadecimal to give the actual stream names; for example, a streamNumber of 2 gives stream names of "rover_data+bba" and "rover_data+1f42".
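The arithmetic can be checked with a short C++ sketch. It follows the mapping stated in the paragraph above (3000 + offset for the data stream, 8000 + offset for the cache stream) and reproduces the example for an offset of 2. The helper names roverStreamName and quantStreamNames are hypothetical; in real code, getResultsStreamName() and getCacheStreamName() return the names directly and should be preferred.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Returns "rover_data+" followed by (base + offset) in lower-case hex.
std::string roverStreamName(unsigned int base, unsigned int offset)
{
    std::ostringstream name;
    name << "rover_data+" << std::hex << (base + offset);
    return name.str();
}

// Data stream first, cache stream second, using the decimal bases
// 3000 and 8000 described in the paragraph above.
std::pair<std::string, std::string> quantStreamNames(unsigned int offset)
{
    return {roverStreamName(3000, offset), roverStreamName(8000, offset)};
}
```

The same helper also covers the peptide summary cache stream: `roverStreamName(0xcac0, 1)` gives "rover_data+cac1".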

The quantitation results stream names can be retrieved directly.

std::string cdbStreamName = distillerData.getQuant(1).getResultsStreamName();
std::string cacheStreamName = distillerData.getQuant(1).getCacheStreamName();

These files can then be used to create an ms_ms1quantitation. The quantitation method required by the constructor is available from the Distiller project parameters.

matrix_science::ms_quant_method quantMethod = distillerData.quantMethod;
matrix_science::ms_ms1quantitation quant(peptideSummary, quantMethod);
quant.loadCdbFile(zipExtractPath + cdbStreamName, zipExtractPath + cacheStreamName);

Copyright © 2022 Matrix Science Ltd.  All Rights Reserved. Generated on Thu Mar 31 2022 01:12:30