Mascot: The trusted reference standard for protein identification by mass spectrometry for 25 years

Posted by Patrick Emery (October 25, 2021)

Choosing hardware for Mascot Distiller

One question that comes up frequently with regard to Mascot Distiller is what makes a good specification for the workstation it’s going to be installed on. Specifically, we’re interested in what sort of CPU you should get, how much RAM the workstation should have, and what sort of disk drives (SSD or HDD) you should choose.

As ever in life, unless money is no object, the exact hardware you get for Mascot Distiller will be a trade-off between specification and price. In this post, we’re going to investigate some aspects of the hardware you should consider when ordering a PC to run Mascot Distiller.

CPU

The main factors affecting Mascot Distiller performance are processor clock speed and the number of cores. As in Mascot Server, many processing steps in Mascot Distiller are multithreaded, allowing complex tasks to be sped up by parallelising the work. Unlike Mascot Server, however, Mascot Distiller is not licensed by the core and can use all the CPU threads available to it to parallelise these tasks.

The Geekbench CPU benchmark is a fairly good guide to the performance you can expect from Mascot Distiller. Because Mascot Distiller can use all the cores made available to it, it’s important to look at the multi-threaded performance of the CPU, not just the single-thread performance.

To examine the effect of increasing the number of threads available to Mascot Distiller on peak picking, quantitation and de novo sequencing, we ran a number of tests on a 12-core Intel® Xeon® X5650 system, running the tests with 1, 3, 6 and 12 threads available. The results are presented in Table 1.

| Number of threads | Peak picking (mm:ss) | Quantitation, 12 files (hh:mm:ss) | De novo (mm:ss) |
|---|---|---|---|
| 1 | 08:08 | 02:05:40 | 31:58 |
| 3 | 02:44 | 00:44:12 | 09:00 |
| 6 | 01:29 | 00:25:40 | 05:09 |
| 12 | 01:10 | 00:20:41 | 03:51 |

Table 1: Timings for completing various multithreaded tasks on a 12-core Xeon X5650 system.

As you can see, every task gets faster as we increase the number of threads, but the improvements diminish the more threads we add, particularly when going from 6 to 12 threads, a trend that is most pronounced for peak picking. This is because the number of CPU cores is not the only factor that determines overall processing time: as more threads are added, processing starts to hit other limits such as code synchronisation points and file and memory access. Exactly when the gain from adding cores tails off significantly will depend on your CPU type, workload and type of data (for example, if your dataset contains more raw sample files, having more CPU cores available will improve performance), but for many people a modern 8-core CPU will give good overall performance relative to the hardware cost.
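To put numbers on these diminishing returns, the Table 1 timings can be converted into speedup (single-thread time divided by n-thread time) and parallel efficiency (speedup divided by n). The Python sketch below does this and also computes the Karp-Flatt metric, an Amdahl's-law estimate of the fraction of the work that behaves serially. That metric is our addition for illustration rather than something used in the original benchmarks, and the helper names are hypothetical.

```python
# Timings copied from Table 1 (12-core Xeon X5650 system).
TABLE_1 = {
    "Peak picking": {1: "08:08", 3: "02:44", 6: "01:29", 12: "01:10"},
    "Quantitation": {1: "02:05:40", 3: "00:44:12", 6: "00:25:40", 12: "00:20:41"},
    "De novo": {1: "31:58", 3: "09:00", 6: "05:09", 12: "03:51"},
}


def to_seconds(t: str) -> int:
    """Convert an 'mm:ss' or 'hh:mm:ss' string to a number of seconds."""
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def scaling(times: dict) -> dict:
    """Return {threads: (speedup, efficiency, serial_fraction)} relative to 1 thread.

    serial_fraction is the Karp-Flatt metric e = (1/S - 1/n) / (1 - 1/n);
    it is undefined for n = 1, so None is stored there.
    """
    t1 = to_seconds(times[1])
    result = {}
    for n in sorted(times):
        speedup = t1 / to_seconds(times[n])
        efficiency = speedup / n
        serial = (1 / speedup - 1 / n) / (1 - 1 / n) if n > 1 else None
        result[n] = (speedup, efficiency, serial)
    return result


for task, times in TABLE_1.items():
    for n, (s, e, f) in scaling(times).items():
        serial = "n/a" if f is None else f"{f:.3f}"
        print(f"{task:<13} {n:>2} threads: speedup {s:4.1f}x, "
              f"efficiency {e:4.0%}, serial fraction {serial}")
```

On these figures, 12 threads give speedups of roughly 7.0x for peak picking, 6.1x for quantitation and 8.3x for de novo, so each extra thread buys noticeably less than the last.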

If I run the benchmarking tests on my AMD Ryzen 7 PRO 3700U laptop, which has better single-threaded performance than the Xeon X5650 but lower multicore performance, the results reflect this. Using a single thread, the Ryzen system was between 30% and 80% faster than the Xeon, depending on the test. However, with all available threads in use on both processors, this is reversed: the Xeon, with its better multicore performance, is now between 60% and 300% faster, depending on the test.

We will continue to look at improving multi-threaded performance in future versions of Distiller.

Hyper-threading® and Clustered Multithreading

Hyper-threading on Intel CPUs and Clustered Multithreading on AMD CPUs are methods that allow each processor core equipped with them to present itself as two “logical” processors to the host operating system. So the 12-core X5650 system used in the thread benchmarking tests above appears to have 24 logical cores.

Hyper-threading and clustered multithreading are not equivalent to additional physical cores; they typically give a 10-20% performance increase. Table 2 summarises benchmark results for Mascot Distiller with hyper-threading enabled and disabled.

| Number of threads | Peak picking (mm:ss) | Quantitation, 12 files (hh:mm:ss) | De novo (mm:ss) |
|---|---|---|---|
| 12 | 01:10 | 00:20:41 | 03:51 |
| 24 (hyper-threading enabled) | 00:49 | 00:19:58 | 03:18 |
| % improvement | 30 | 3.5 | 14 |

Table 2: Comparison of our benchmarking times with (24 threads) and without (12 threads) hyper-threading enabled.

As you can see, hyper-threading gives a noticeable improvement in performance for both peak picking (30%) and de novo searching (14%). It also improves quantitation performance, but the increase is much more modest (3.5%). So, while it’s not essential, we would generally suggest keeping it enabled.
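The “% improvement” row in Table 2 is the time saved relative to the 12-thread run, i.e. (t12 - t24) / t12 × 100. This short Python check reproduces it from the timings; the function names are hypothetical.

```python
def to_seconds(t: str) -> int:
    """Convert an 'mm:ss' or 'hh:mm:ss' string to a number of seconds."""
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def percent_improvement(without_ht: str, with_ht: str) -> float:
    """Percentage reduction in run time when hyper-threading is enabled."""
    before, after = to_seconds(without_ht), to_seconds(with_ht)
    return (before - after) / before * 100


# (12 threads, 24 threads with hyper-threading) timings from Table 2
TABLE_2 = {
    "Peak picking": ("01:10", "00:49"),
    "Quantitation": ("00:20:41", "00:19:58"),
    "De novo": ("03:51", "03:18"),
}

for task, (off, on) in TABLE_2.items():
    print(f"{task}: {percent_improvement(off, on):.1f}% improvement")
```

This prints 30.0%, 3.5% and 14.3%, matching the rounded figures in the table.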

Processor Groups

64-bit Windows was originally limited to a maximum of 64 logical processors (threads). A few years ago that would have been sufficient to cover nearly all hardware use-cases, but CPU core counts have been increasing as manufacturers have hit limits on single-threaded performance. The highest core count for a ‘consumer’ CPU is now 64 cores, in the AMD Threadripper 3990X. With ‘Clustered Multithreading’ enabled, that processor has 128 hardware threads available, twice the 64-thread limit.

To work around this limit, Microsoft introduced Processor Groups. If you have more than 64 logical processors (threads), Windows will automatically split them into two or more evenly balanced processor groups. Software needs to be written specifically to use multiple processor groups, so most Windows software, including the current release of Mascot Distiller, runs on only one processor group.

This has practical implications if you have access to a system with a very high core count. For example, a 32-core processor with hyperthreading gives you a single processor group of 64 logical processors. If you then upgraded your PC to a 48-core processor with hyperthreading, you would have 96 logical processors, and Windows would automatically define two processor groups, each with 48 logical processors: a mix of 24 ‘real’ and 24 ‘hyperthreading’ cores. Because Mascot Distiller runs in only one processor group, it would then have just 24 physical cores to work with instead of 32, and would actually run more slowly after the processor upgrade, which is clearly not a desirable outcome! In those circumstances, we would recommend disabling hyperthreading in the BIOS. That way you would have a single processor group with 48 ‘real’ cores, which should give you improved performance.
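The arithmetic behind this example can be sketched in Python. This is a simplified model, not Windows’ actual algorithm: it assumes a single-socket CPU, splits the logical processors into the fewest evenly sized groups of at most 64, and assumes both hardware threads of a core stay in the same group. In practice, group assignment also depends on NUMA topology and firmware settings. The names `Cpu`, `group_sizes` and `one_group_view` are hypothetical.

```python
import math
from dataclasses import dataclass

MAX_GROUP_SIZE = 64  # maximum number of logical processors in one processor group


@dataclass
class Cpu:
    """Hypothetical description of a single-socket CPU."""
    cores: int
    smt: bool  # hyperthreading (or AMD equivalent) enabled in the BIOS

    @property
    def logical(self) -> int:
        return self.cores * 2 if self.smt else self.cores


def group_sizes(logical: int) -> list:
    """Simplified model: split logical processors into the fewest, evenly
    balanced groups of at most MAX_GROUP_SIZE (ignores NUMA topology)."""
    groups = math.ceil(logical / MAX_GROUP_SIZE)
    base, extra = divmod(logical, groups)
    return [base + 1 if i < extra else base for i in range(groups)]


def one_group_view(cpu: Cpu) -> tuple:
    """(logical processors, physical cores) seen by software limited to one
    group, assuming both hardware threads of a core land in the same group."""
    threads = group_sizes(cpu.logical)[0]
    physical = threads // 2 if cpu.smt else threads
    return threads, physical


for label, cpu in [("32-core, HT on", Cpu(32, True)),
                   ("48-core, HT on", Cpu(48, True)),
                   ("48-core, HT off", Cpu(48, False))]:
    threads, physical = one_group_view(cpu)
    print(f"{label}: groups {group_sizes(cpu.logical)}, "
          f"one group = {threads} threads on {physical} physical cores")
```

Under these assumptions, software confined to one group sees 32 physical cores on the 32-core processor, only 24 on the 48-core processor with hyperthreading enabled, and all 48 once hyperthreading is disabled, which is the scenario described above.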

Multiple Processor Group support is a feature we will add in a future release of Mascot Distiller, but in the meantime it is something you should consider if you have access to a system with more than 64 logical processors.

Mascot Server already supports multiple processor groups on Windows.

RAM

As a rule of thumb, we’d recommend having at least 16 GB of RAM. Aim for 32 GB or more if you’re processing very large datasets, particularly large label-free datasets using the ‘Replicate’ method.

During the benchmarking tests above, the maximum amount of memory Distiller used was approximately 9 GB, during the quantitation tests. You always want memory ‘in reserve’, both to support the operating system (plus any other software running) and to avoid the system having to use swap space (where chunks of memory are swapped between physical RAM and the hard drive), which would significantly slow down processing.
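The reasoning above amounts to adding a reserve to Distiller’s peak usage and rounding up to a standard memory configuration. A minimal Python sketch follows; the 4 GB reserve for the operating system and other software, and the list of standard sizes, are illustrative assumptions rather than figures from our tests.

```python
# Standard memory configurations to choose from (illustrative assumption).
STANDARD_SIZES_GB = [8, 16, 32, 64, 128]


def recommended_ram_gb(peak_app_gb: float, reserve_gb: float = 4.0) -> int:
    """Smallest standard RAM size that holds the application's peak usage plus
    a reserve for the OS and other software, so the system avoids swapping.

    reserve_gb defaults to a hypothetical 4 GB; adjust it for your own setup.
    """
    needed = peak_app_gb + reserve_gb
    for size in STANDARD_SIZES_GB:
        if size >= needed:
            return size
    raise ValueError(f"needs more than {STANDARD_SIZES_GB[-1]} GB")


# About 9 GB was the peak seen in the quantitation benchmark above.
print(recommended_ram_gb(9))
```

With a 9 GB peak this gives 16 GB, consistent with the rule of thumb; any peak above 12 GB under the same assumed reserve pushes the answer to 32 GB.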

Disk Drives

In Mascot Distiller, we’ve found that accessing the raw data files on a local drive where possible (including a fast USB-connected local drive), rather than on network storage, is the more reliable option, so you’ll need a reasonable amount of local storage to accommodate any raw data you’re currently working on. Once you’ve completed a project, moving both the saved project file and the raw data files to network storage should be fine, so long as you have a fast network connection.

SSDs are still slightly more expensive than traditional HDDs, but they are becoming increasingly common and prices have come down a lot, even for large capacities. SSDs offer faster reads and writes, which improves performance with Mascot Distiller: in the benchmarked peak picking, quantitation and de novo searching tests, we typically see a 5% improvement using SSDs over HDDs.
