Posted by Richard Jacob (December 14, 2014)

Running Mascot Server on a supercomputer

A supercomputer comprises a large number of dedicated processors, typically hundreds to thousands, that are situated close together in racks and cabinets and connected by a fast network. An individual computational node might look like a normal server, or it may be a more specialized blade server or a bare-bones system of the sort Google and Facebook sometimes use. Supercomputer performance is measured in hundreds of trillions of floating-point operations per second (teraFLOPS or TFLOPS), compared with around 100 gigaFLOPS for an individual desktop PC. Supercomputers are used for computationally intensive tasks such as modelling, prediction and physical simulations. Potentially, they can be used to run any multi-threaded calculation, including Mascot searches.

Many universities make supercomputers available to their researchers, who can use them for large calculation jobs and pay according to the length of the job and the number of processors used. The jobs are controlled by a software queuing system that balances the workload across all the nodes. Using a supercomputer can reduce the time taken to run a calculation from many days to hours or minutes. Supercomputers are often promoted within a university as a cheaper solution than purchasing dedicated hardware. After all, the university already owns the supercomputer and wants to maximize its use and improve the cost/benefit ratio.

Although you can run Mascot Server on a supercomputer, is it a good idea? There are three main points to consider:

1. Performance

Supercomputers have super performance, right? Well, yes, but mainly due to the number of nodes. An individual node will most likely be older and slower than a new PC. Mascot Server is licensed by the number of processors, where each processor is good for 4 cores. Given the cost of the license compared to the hardware, it is not cost-effective to use low-performance CPUs. The single most important performance consideration for Mascot Server is the CPU performance per core, which closely tracks the PassMark CPU benchmark. If the nodes of a supercomputer are two or three years old, the performance of a new PC with fast processors may be 50% to 100% better.
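To make the licensing arithmetic concrete, here is a minimal Python sketch. The only figures taken from this post are the 4 cores per licensed processor and the 50% per-core advantage of a new PC. The function names, the 16-core node size and the relative per-core scores are hypothetical and purely illustrative; they are not Mascot Server settings or measured benchmarks.

```python
import math

CORES_PER_LICENSE = 4  # from the post: each licensed processor covers 4 cores


def licenses_needed(total_cores: int) -> int:
    """Processor licenses required to cover total_cores (rounded up)."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return math.ceil(total_cores / CORES_PER_LICENSE)


def throughput_per_license(per_core_score: float, total_cores: int) -> float:
    """Aggregate relative per-core score delivered per license purchased."""
    return per_core_score * total_cores / licenses_needed(total_cores)


# Hypothetical comparison: same core count, different per-core speed.
old_node = throughput_per_license(per_core_score=1.0, total_cores=16)
new_pc = throughput_per_license(per_core_score=1.5, total_cores=16)
print(licenses_needed(16), old_node, new_pc)
```

Both machines need the same number of licenses, so under these assumptions the faster cores deliver proportionally more search throughput for the same license cost.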

2. Job control and queuing

When a job is started, the job management system will assign the nodes to be used and activate any required software or libraries. Mascot Server does not use or support any of the common job management systems like Sun Grid Engine (SGE), SLURM, or Moab. This means that the supercomputer’s job control software has to work around the nodes used by Mascot Server. Additionally, Mascot Server does not have any software to queue jobs on the server side because it would be unlikely to improve performance or usability of the server. For a discussion of this, see the earlier post, Don’t get stuck in a queue.

3. Dynamic assignment of search nodes

Before a sequence database can be searched, it has to be read from disk into memory on each search node. If search nodes are allocated statically, and they have adequate memory, this is a one-time operation. If search nodes are assigned dynamically, it happens every time a new node joins the cluster, which can be a significant overhead. More importantly, once a search is running, you cannot change the cluster configuration until the search is complete. Since searches run in parallel on the same set of search nodes, nodes can only be dropped or added if the server is allowed to become completely idle.
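A rough back-of-the-envelope model shows how the database load cost grows under dynamic assignment. This is only a sketch: the function name, the 50 GB database size, the 0.5 GB/s read rate and the count of 20 node joins are all hypothetical numbers chosen for illustration, and the model ignores caching, network file systems and anything specific to how Mascot Server actually loads its databases.

```python
def total_load_seconds(db_size_gb: float, read_rate_gb_per_s: float,
                       node_joins: int) -> float:
    """Time spent reading the database from disk, summed over node joins.

    Each join is assumed to trigger one full read of the database into
    that node's memory (a simplifying assumption, not a measured figure).
    """
    if read_rate_gb_per_s <= 0:
        raise ValueError("read_rate_gb_per_s must be positive")
    if node_joins < 0:
        raise ValueError("node_joins must not be negative")
    return db_size_gb / read_rate_gb_per_s * node_joins


# Static allocation: a node loads the database once and keeps it.
static_cost = total_load_seconds(50.0, 0.5, node_joins=1)

# Dynamic allocation: the same slot is re-assigned 20 times.
dynamic_cost = total_load_seconds(50.0, 0.5, node_joins=20)

print(static_cost, dynamic_cost)
```

Under these assumed figures the load cost scales linearly with the number of joins, which is why static allocation turns it into a one-off cost.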

Given these points, it is likely to be more cost-effective and a lot less work to run modest clusters on dedicated, commodity PCs. Using shared supercomputer hardware only becomes attractive for very large clusters, where the cost per processor of the license drops below that of the hardware. It is also important that the supercomputer administrator understands the need for nodes to be allocated statically, possibly exclusively.
