Multithreading (advanced reading)

Using Parser in multithreaded applications

If your application is singlethreaded, you can safely skip this section.

Restrictions on threads

Parser classes and methods are not guaranteed to be reentrant or thread-safe. However, with some care, it is possible to use Parser in a multithreaded application.

The easiest and safest solution is to use Parser from a single thread only. For example, if your application has four threads, use Parser classes and methods only in thread 3 and never in threads 1, 2 or 4. If you can guarantee that no thread other than thread 3 uses Parser or has access to thread 3's variables or memory space, then Parser methods will work just fine. Note that if thread 3 needs to transfer data to or from the other threads, the data cannot be encapsulated in Parser objects; that is, don't pass ms_protein, ms_peptide or any other Parser objects between threads. Copy the values you need into plain data types first.
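
As an illustration of the single-thread approach, here is a minimal sketch using only the C++ standard library. StandInResfile is a hypothetical placeholder for a Parser object such as ms_mascotresfile, used so that the sketch compiles without Parser; it does not reproduce the Parser API. ProteinSummary and loadSummaries are likewise illustrative names. The point is that the stand-in object is created, used and destroyed on one thread, and only plain data crosses the thread boundary.

```cpp
#include <future>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Parser object such as ms_mascotresfile.
// It is NOT the Parser API; it only lets the sketch compile on its own.
class StandInResfile {
public:
    explicit StandInResfile(std::string path) : path_(std::move(path)) {}
    int numHits() const { return 3; }  // placeholder value
    std::string accession(int hit) const { return "ACC" + std::to_string(hit); }
    double score(int hit) const { return 100.0 - 10.0 * hit; }
private:
    std::string path_;
};

// Plain data that is safe to hand to other threads: no Parser objects inside.
struct ProteinSummary {
    std::string accession;
    double score;
};

// Runs all "Parser" work on one dedicated thread and returns copies of the
// extracted values. The stand-in object never leaves that thread.
std::vector<ProteinSummary> loadSummaries(const std::string& path) {
    std::promise<std::vector<ProteinSummary>> promise;
    std::future<std::vector<ProteinSummary>> result = promise.get_future();

    std::thread parserThread([&promise, path] {
        StandInResfile resfile(path);  // created and destroyed on this thread
        std::vector<ProteinSummary> out;
        for (int hit = 1; hit <= resfile.numHits(); ++hit) {
            out.push_back({resfile.accession(hit), resfile.score(hit)});
        }
        promise.set_value(std::move(out));  // only plain data crosses over
    });

    std::vector<ProteinSummary> summaries = result.get();  // wait for the copy
    parserThread.join();
    return summaries;
}
```

Any thread can call loadSummaries("F001234.dat") (the file name is made up) and work with the returned vector freely, because it contains only strings and doubles.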

If you need to use Parser from multiple threads, you have two choices: either create a synchronised data abstraction class (shared global data) or use thread-local storage for completely independent instances of Parser objects in each thread.

In the synchronised class scenario, you need to write a data abstraction class that uses Parser internally (storing copies of the ms_mascotresfile etc. objects), whose methods are guarded by a single mutual exclusion lock (mutex). The mutex must guard access to the entire data abstraction object, and each method must start by acquiring the mutex. If thread 3, for example, makes a method call and acquires the mutex, then any other thread making the same or different method call on the object will block until thread 3's method call has finished and released the mutex. This strict mutual exclusion prevents race conditions and ensures the Parser data shared between threads is only ever accessed by one thread at a time.

However, there are two caveats: 1) the data abstraction class must not return Parser objects or otherwise expose them, as this would nullify the protection of the mutex; and 2) the data abstraction class must be instantiated in a global memory space shared between all threads, so that it is not specific to a thread. The data abstraction class need not implement the Singleton pattern; you could have multiple instances of the class, each with its own internal, independent copies of Parser classes. If you do make it a Singleton, ensure you follow a thread-safe Singleton pattern.
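
The synchronised class scenario might be sketched as follows. This is an assumption-laden illustration, not Parser code: StandInResfile is a hypothetical placeholder for ms_mascotresfile, and SharedResults and sumQueryCounts are invented names. Every public method acquires the same mutex before touching the wrapped object, and only plain values are returned.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Parser object; not the Parser API.
class StandInResfile {
public:
    explicit StandInResfile(std::string path) : path_(std::move(path)) {}
    int numQueries() const { return 42; }  // placeholder value
    std::string fileName() const { return path_; }
private:
    std::string path_;
};

// Data abstraction class: each public method starts by acquiring the single
// mutex, and callers only ever receive plain values (int, std::string).
class SharedResults {
public:
    explicit SharedResults(const std::string& path) : resfile_(path) {}

    // Non-copyable, so the guarded object cannot be duplicated by accident.
    SharedResults(const SharedResults&) = delete;
    SharedResults& operator=(const SharedResults&) = delete;

    int queryCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return resfile_.numQueries();
    }

    std::string fileName() {
        std::lock_guard<std::mutex> lock(mutex_);
        return resfile_.fileName();  // returned by value, i.e. a copy
    }

private:
    std::mutex mutex_;        // guards every access to resfile_
    StandInResfile resfile_;  // never exposed to callers
};

// Several threads call into one shared instance; the calls are serialised
// by the mutex inside SharedResults.
int sumQueryCounts(SharedResults& shared, int threadCount) {
    std::atomic<int> total{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&shared, &total] { total += shared.queryCount(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return total.load();
}
```

In this sketch the constructor is not guarded, so the instance should be fully constructed (for example at namespace scope, or in the main thread before the workers start) before any other thread is given a reference to it.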

In the other scenario, each thread could have its own copies of Parser classes that are independent of each other. For example, if thread 1 creates an ms_mascotresfile object resfile1, and thread 2 creates resfile2, then thread 1 can call methods of resfile1 concurrently with thread 2 calling methods of resfile2. This is in effect the same situation as the singlethreaded mode, with the same restrictions: thread 2 must not call methods of resfile1, or vice versa, and the threads must not pass Parser objects between themselves. If you do, the application is almost certain to crash sooner or later.
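
A minimal sketch of the independent-instances scenario, again with a hypothetical StandInResfile in place of ms_mascotresfile and an invented function name, countQueriesPerFile. Each thread constructs its own object, calls only that object's methods, and writes a plain int to its own slot in the result vector.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Parser object; not the Parser API.
class StandInResfile {
public:
    explicit StandInResfile(std::string path) : path_(std::move(path)) {}
    // Placeholder: derives a number from the path so results differ per file.
    int numQueries() const { return static_cast<int>(path_.size()); }
private:
    std::string path_;
};

// Starts one thread per file. No stand-in object is shared or passed between
// threads; only int results leave each thread.
std::vector<int> countQueriesPerFile(const std::vector<std::string>& paths) {
    std::vector<int> counts(paths.size(), 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < paths.size(); ++i) {
        threads.emplace_back([&paths, &counts, i] {
            StandInResfile resfile(paths[i]);  // private to this thread
            counts[i] = resfile.numQueries();  // each thread writes its own slot
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return counts;
}
```

Declaring the per-thread object with the C++11 thread_local storage specifier is another way to get one independent instance per thread; the same rule applies, in that the object must never be handed to a different thread.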

Further restrictions in C++

The above design restrictions apply to all programming languages, including C++. However, the C++ version of Parser has an additional restriction related to Apache Xerces; please refer to the section "Using Apache Xerces and Parser in the same application".

Multithreading does not change these issues in the dynamically linked case on either platform, but you must take extra care if you are linking statically against Parser and Xerces. The section "Xerces and statically linked Parser" shows an example sequence of Xerces and Parser calls that is safe in a singlethreaded application. The same sequence will only be safe in a multithreaded application if you can guarantee that no two threads call Parser and Xerces functions concurrently. Otherwise you are almost certain to corrupt the internal Xerces state.

The only easy fix is to do all Parser and Xerces processing in a single thread, with one strictly following the other (not interleaved in any way). It may be best to use a different XML library in statically linked multithreaded applications to avoid the problem entirely.


Copyright © 2022 Matrix Science Ltd.  All Rights Reserved. Generated on Thu Mar 31 2022 01:12:30