Blog
Articles tagged: FDR
Does your search engine show the evidence?
You’ve submitted a protein sequence database search and started looking at the results. Why did the search engine identify that protein? What is the peptide evidence? Which alternatives did the software consider? Is the software’s decision correct? These are basic yet important questions for any software-driven approach, which accounts for the bulk of today’s MS/MS data analysis. A lot of [...]
Identify more HLA peptides
Endogenous peptides are challenging to identify by database searching. A Mascot no-enzyme search matches every subsequence of a protein against the observed spectrum, which creates a very large search space even when the precursor tolerance is tight. As a result, Mascot score thresholds tend to be conservative and sensitivity is reduced. Mascot ships with Percolator, which often improves discrimination between true [...]
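To see why a no-enzyme search space is so much larger, this short Python sketch counts candidate sequences for a tryptic digest and for a no-enzyme digest of the same protein. The function names, the simplified trypsin rule (cleave after K or R, but not before P) and the 8 to 12 residue length window, typical of HLA class I peptides, are assumptions for illustration; Mascot's own digestion rules and length limits may differ.

```python
def tryptic_peptides(protein, missed_cleavages=0):
    """Peptides from a simplified trypsin rule: cleave after K or R unless followed by P."""
    sites = [i + 1 for i, aa in enumerate(protein[:-1])
             if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavages in protein[bounds[i]:bounds[j]]
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides

def no_enzyme_peptides(protein, min_len=8, max_len=12):
    """Every subsequence with a length in [min_len, max_len], as in a no-enzyme search."""
    return [protein[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(protein) - n + 1)]

protein = "GGGKLLLRPAAAK"  # hypothetical sequence
print(tryptic_peptides(protein))         # ['GGGK', 'LLLRPAAAK']
print(len(no_enzyme_peptides(protein)))  # 20
```

For a protein of length L, the no-enzyme count is the sum of (L - n + 1) over every allowed length n, whereas trypsin gives roughly one peptide per K or R. Every extra candidate is another opportunity for a random match, which is why the score thresholds have to be more conservative.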
Error tolerant searches now show statistical significance
The latest release of Mascot Server introduces some important changes to error tolerant searches. Matches from the second pass search now have expect values attached, indicating confidence levels. These are either estimates based on counting trials or empirical values derived from searching a decoy database. If you are not familiar with the error tolerant search, now is the time to [...]
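The "counting trials" idea can be illustrated with a generic, Bonferroni-style calculation. This sketch assumes a probability-based score S = -10 log10(P), where P is the chance that a single candidate matches at random, and takes the expect value as the number of trials N multiplied by P. The function names are hypothetical, and this is not presented as Mascot's exact calculation for the error tolerant second pass.

```python
import math

def score_from_probability(p):
    """Probability-based score S = -10 * log10(P)."""
    return -10.0 * math.log10(p)

def expect_value(score, n_trials):
    """Expected number of random matches at or above score: E = N * 10**(-S/10)."""
    return n_trials * 10.0 ** (-score / 10.0)

def significance_threshold(n_trials, alpha=0.05):
    """Score at which the expect value equals alpha: S = -10 * log10(alpha / N)."""
    return score_from_probability(alpha / n_trials)

n = 50_000  # hypothetical number of trials (candidate sequences tested)
print(significance_threshold(n))  # about 60
print(expect_value(70.0, n))      # about 0.005
```

The more trials a search makes, the higher the score needed for the same expect value; each additional 10 score points lowers the expect value tenfold.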
Validating intact crosslinked peptide matches
Intact crosslinked search results are more complex than those of conventional (non-crosslinked) searches because there are many more degrees of freedom. The precursor mass could be within tolerance of a looplinked sequence, a linear sequence with a monolink, or several different alpha-beta candidates. Each possibility is multiplied if you also consider variable modifications such as oxidation of methionine. Mascot 2.7 uses the same scoring [...]
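A rough sketch of how the alternatives multiply: given a table of peptide masses, this function lists every linear-plus-monolink, looplink and alpha-beta interpretation within tolerance of one neutral precursor mass. The linker masses are assumed values for a DSS/BS3-type reagent (bridge +138.06808 Da, hydrolysed monolink +156.07864 Da), the peptide labels and masses are made up, and the function is hypothetical rather than Mascot's actual crosslink enumeration.

```python
from itertools import combinations_with_replacement

# Assumed DSS/BS3-type linker masses (Da); substitute the values for your reagent
XLINK_MASS = 138.06808     # added by a crosslink (alpha-beta) or a looplink
MONOLINK_MASS = 156.07864  # added by a hydrolysed dead-end link (monolink)

def interpretations(precursor_mass, peptides, tol_da=0.01):
    """Every interpretation within tol_da of a neutral precursor mass.

    peptides maps a label to a neutral monoisotopic peptide mass.
    """
    hits = []
    for name, mass in peptides.items():
        if abs(mass + MONOLINK_MASS - precursor_mass) <= tol_da:
            hits.append(("linear+monolink", name))
        if abs(mass + XLINK_MASS - precursor_mass) <= tol_da:
            hits.append(("looplink", name))
    # Pairs include a peptide linked to a second copy of itself (homodimer)
    for a, b in combinations_with_replacement(sorted(peptides), 2):
        if abs(peptides[a] + peptides[b] + XLINK_MASS - precursor_mass) <= tol_da:
            hits.append(("crosslink", a, b))
    return hits

# Hypothetical peptide masses chosen so that all three kinds of match occur
peptides = {"A": 1000.0, "B": 1200.0, "C": 2200.0, "D": 2181.98944}
print(interpretations(2338.06808, peptides))
# [('looplink', 'C'), ('linear+monolink', 'D'), ('crosslink', 'A', 'B')]
```

Adding a variable modification such as methionine oxidation (+15.99491 Da) adds extra mass entries per peptide, and the number of candidate pairs grows roughly with the square of the number of entries.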
Human Proteome Project data interpretation guidelines
The Human Proteome Project (HPP) data interpretation guidelines were recently updated. Many of the guidelines are good practice and common sense in any proteomics study where reliable protein identification is critical, not just when studying the human proteome. The guidelines are easy to meet using Mascot Server 2.7. Core guidelines: the full list consists of 9 guidelines. The first one [...]
Protein FDR in Mascot Server 2.7
One of the new features in Mascot Server 2.7, now running on this web site, is an estimate of protein FDR. This is displayed in the Protein Family Summary for FASTA searches whenever the automatic decoy search is selected. The estimate is based on the number of proteins inferred from the target database compared with the number inferred from the decoy database. Conceptually, this is [...]
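In target-decoy terms, the protein FDR at a given score threshold is the number of decoy proteins passing divided by the number of target proteins passing. This Python sketch applies that ratio to plain lists of protein scores and looks for the lowest threshold meeting a chosen FDR. The function names and scores are hypothetical, and the sketch ignores how Mascot groups proteins into families, which affects what gets counted.

```python
def protein_fdr(target_scores, decoy_scores, threshold):
    """Decoy proteins passing / target proteins passing at a score threshold."""
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    # max(..., 1) avoids dividing by zero when no target protein passes
    return n_decoy / max(n_target, 1)

def lowest_threshold(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest observed score whose estimated FDR is at or below max_fdr (None if none)."""
    for t in sorted(set(target_scores) | set(decoy_scores)):
        if protein_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None

targets = [50, 45, 40, 35, 30, 25, 20, 15, 10, 5]  # hypothetical protein scores
decoys = [22, 8]
print(protein_fdr(targets, decoys, 20))   # 1 decoy / 7 targets
print(lowest_threshold(targets, decoys))  # 25
```

Lowering the threshold admits more target proteins but also more decoys, so the reported FDR only has meaning together with the threshold it was computed at.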
Common myths about protein scores
Mascot Server is used in many different application areas by both mass spectrometry experts and non-experts. Over the years, we’ve spotted a few recurring misconceptions about how protein scores are interpreted and used. All the examples come from recent peer-reviewed papers. Protein scores in PMF searches: the very first thing to check is what type of experiment is being reported. [...]
What are you inferring?
Benchmarking protein inference is notoriously difficult. Artificial samples of known content tend to be too simple, while real samples lack ground truth. An interesting approach was adopted for the ABRF iPRG 2016 study and has been the subject of a publication from The et al. A collection of human Protein Epitope Signature Tags (PrESTs) was expressed in E. coli and [...]
Back to basics 5: Peptide-spectrum match statistics
Mascot can identify peptides in uninterpreted MS/MS data. Observed spectra are submitted to Mascot as search queries. A query specifies the precursor ion m/z and charge state as well as the MS/MS peak list. Mascot digests protein sequences from the chosen database and selects peptide sequences whose mass is within the specified tolerance of the query’s precursor mass. The software [...]
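The precursor mass filter can be sketched in a few lines. This assumes the charge z comes from z added protons, so the neutral precursor mass is (m/z - 1.007276) × z, and that a peptide's neutral mass is the sum of its monoisotopic residue masses plus water. The function names and the 10 ppm default tolerance are assumptions; modifications, isotope errors and the other options Mascot supports are left out.

```python
PROTON = 1.007276  # proton mass (Da)
WATER = 18.010565  # monoisotopic mass of H2O (Da)

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER

def precursor_mass(mz, charge):
    """Neutral mass from the observed m/z, assuming z added protons."""
    return (mz - PROTON) * charge

def candidates(peptides, mz, charge, tol_ppm=10.0):
    """Peptides whose mass is within tol_ppm of the query's neutral precursor mass."""
    observed = precursor_mass(mz, charge)
    return [p for p in peptides
            if abs(peptide_mass(p) - observed) / observed * 1e6 <= tol_ppm]

print(candidates(["PEPTIDE", "PEPTLDE", "PEPTIDES"], 400.68725, 2))
# ['PEPTIDE', 'PEPTLDE']
```

Because leucine and isoleucine have identical residue masses, PEPTIDE and PEPTLDE both pass the filter, so any choice between candidates like these has to come from scoring the MS/MS peak list.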
High FDRs for methylated peptides III
The MCP paper "Large Scale Mass Spectrometry-based Identifications of Enzyme-mediated Protein Methylation Are Subject to High False Discovery Rates" raises some important questions concerning the accuracy and interpretation of database search results. In this third article, we look at the difference between using counts of matches (PSMs) and counts of distinct sequences to calculate the false discovery rate (FDR). [...]
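A toy example shows how the two counting methods can disagree. In this made-up data set, one target peptide is matched by eight spectra while the single decoy match occurs once, so collapsing PSMs to distinct sequences shrinks the target count much more than the decoy count. The function names and the PSM list are hypothetical and are not taken from the paper.

```python
def passing(psms, threshold, decoy):
    """Sequences of PSMs at or above threshold, from either the target or the decoy side."""
    return [seq for seq, score, is_decoy in psms
            if score >= threshold and is_decoy == decoy]

def fdr(psms, threshold, distinct_sequences=False):
    """Decoy/target ratio, counting either PSMs or distinct peptide sequences."""
    targets = passing(psms, threshold, decoy=False)
    decoys = passing(psms, threshold, decoy=True)
    if distinct_sequences:
        targets, decoys = set(targets), set(decoys)
    # Guard against division by zero when nothing on the target side passes
    return len(decoys) / max(len(targets), 1)

# Hypothetical PSMs: (peptide sequence, score, is_decoy)
psms = [("LVNELTEFAK", 50.0, False)] * 8 + [
    ("YLYEIAR", 40.0, False),
    ("KAFTLEVNL", 45.0, True),
]
print(fdr(psms, 30.0))                           # 1 decoy / 9 target PSMs
print(fdr(psms, 30.0, distinct_sequences=True))  # 1 decoy / 2 target sequences
```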