Avoiding Multiprocessing Errors in Bash Shell

Suppose you have two Linux processes trying to modify a file at the same time and you don’t want them stepping on each other’s work and making a mess.  A common solution is to use a “lock” mechanism (a.k.a. “mutex”). One process “locks the lock” and by this action has sole ownership of a resource in order to make updates, until it unlocks the lock to allow other processes access.

Writing a custom lock in Linux bash shell is tricky. Here’s an example that DOESN’T work right:

#!/bin/bash
is_locked=1  # helper variable to denote the locked state
mylockvariable=$(cat mylockfile 2>/dev/null)  # read the lock
while [ "$mylockvariable" = "$is_locked" ]  # loop while locked
do
    sleep 5  # wait 5 seconds to try again
    mylockvariable=$(cat mylockfile 2>/dev/null)  # read again
done
echo "$is_locked" > mylockfile  # lock the lock
# >>> do critical work safely here <<<
# >>> ERROR: NOT SAFE <<<
rm mylockfile  # unlock the lock

Here the lock value is stored in a shared resource, the file “mylockfile”. If the file exists and contains the character “1”, the lock is considered locked; otherwise, it is considered unlocked.  The code will loop until the lock is unlocked, then acquire the lock, do the required single-process work, and then release the lock.

However, this code can fail without warning. Suppose two processes, A and B, execute this code concurrently, and that initially the lock is unlocked. Process A reads the lockfile and sees the lock as free. Immediately after this, suppose Process A is temporarily suspended, perhaps to give CPU cycles to Process B. Process B then reads the lock, sees it as free, locks it, and starts its critical work. Now suppose Process B is put into a wait state and Process A resumes. Process A, based on its earlier read, wrongly believes the lock is still unlocked, so it also locks the lock and begins the critical work, resulting in a mess.

This is an example of a classic race condition, in which the order of execution of threads or processes can affect the final outcome of execution.
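
To make the hazard concrete, here is a minimal sketch (the filename and loop counts are made up for illustration) that reproduces the same read-modify-write race with a shared counter file. Run it a few times and the final count will usually fall short of 200, because unsynchronized increments overwrite each other:

#!/bin/bash
echo 0 > mycounterfile  # shared counter, initially zero
for p in A B; do  # launch two competing background processes
    (
        for i in $(seq 1 100); do
            n=$(cat mycounterfile)           # read
            echo $((n + 1)) > mycounterfile  # write back (not atomic with the read)
        done
    ) &
done
wait  # let both background processes finish
cat mycounterfile  # usually less than 200: some increments were lost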

A solution to this conundrum is found in the excellent book Unix Power Tools [1,2]. It is a hefty tome but very accessibly written, and well worth a read-through to pick up a slew of time-saving tips.

The problem with the example code is that it does not read and set the lock in a single, indivisible (atomic) operation. Here’s a trick that does:

#!/bin/bash
until (umask 222; echo > mylockfile) 2>/dev/null  # check and lock in one step
do  # keep trying if the attempt failed
    sleep 5  # wait 5 seconds to try again
done
# >>> do critical work safely here <<<
rm -f mylockfile  # unlock the lock

Here, the existence of the lockfile itself is the indicator that the lock is set. The umask of 222 causes the file to be created without write permission, so if the file already exists, the attempt to open it for writing fails and the loop keeps trying. This works because a file either exists or it does not, and the check-and-create takes place within a single operation whose atomicity is guaranteed by the OS and the filesystem. Thus, assuming the system is working correctly, this code is guaranteed to produce the desired behavior.
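
A closely related variant, a sketch of my own rather than from the book, uses bash’s noclobber option instead of the umask trick. With noclobber set, the > redirection refuses to overwrite an existing file, so the test-and-create again happens as one step; writing the process ID into the lockfile is a small bonus for debugging:

#!/bin/bash
# Sketch of a noclobber-based lock (same idea, different mechanism):
# with noclobber set, ">" fails if the file already exists.
until (set -o noclobber; echo "$$" > mylockfile) 2>/dev/null  # check and lock
do
    sleep 5  # wait 5 seconds to try again
done
# >>> do critical work safely here <<<
rm -f mylockfile  # unlock the lock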

Race conditions can be a nuisance to track down, since they occur nondeterministically and can be rare but devastating. Writing correct code for multiple threads of execution can be confusing to those who haven’t done it before, but with experience it becomes easier to reason about correctness and spot such errors.

References:

[1] Peek, Jerry D., Shelley Powers, Tim O’Reilly, and Mike Loukides. Unix Power Tools, Third Edition. O’Reilly Media, 2002, https://learning.oreilly.com/library/view/unix-power-tools/0596003307/

[2] “Shell Lockfile,” Unix Power Tools, Third Edition, Chapter 36, https://learning.oreilly.com/library/view/unix-power-tools/0596003307/ch36.html#:-:text=Shell%20Lockfile

7 thoughts on “Avoiding Multiprocessing Errors in Bash Shell”

  1. Fascinating, important, and clearly explained. Thank you.

    Would implementing a mutex in Rust (with whatever tools it may offer) compile down to kernel calls or user-land calls, and would it be faster?

  2. The bash code shown is not particularly fast. I think writing in C (or another compiled language like C++ or Rust) would be faster. Whatever code is used will be slowed somewhat by filesystem access, though this can to some extent be mitigated by the OS caching files in RAM. The process-launch overhead in the bash code is rather heavy. In compiled code, using threads instead (or, where possible, persistent threads that are reused rather than re-created every time) would be faster, if that can be made to support the desired use case. If one just wants to atomically update a memory location and the processor supports it, a hardware atomic can be very fast (on NVIDIA GPUs, Pascal and later, atomics became extremely fast using CUDA intrinsics). I don’t know the exact answer to your question on user vs. kernel space, but for user threads atomically updating memory, thread libraries are “supposed to be” very fast. They aren’t always, though, depending on how good the implementation is; empirically testing the speed of operations across different thread libraries can sometimes be surprising. As for speeding up the bash code here, I don’t know the relevant user- versus kernel-space overheads.

  3. Many years ago, when NFS filesystems were the norm in my world, the most portable and completely reliable way was to use mkdir: it will succeed for only one invocation, and all others will fail. (A sketch of this approach appears after these comments.)

  4. Thanks for the references to flock. The bash code in the post is useful for its portability (I see macOS doesn’t have flock, though maybe someone has done a port), and the ideas in the bash version might be useful for building a custom lock if needed. Otherwise flock looks like a great solution; see the sketch after these comments.
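
Following up on the mkdir suggestion in comment 3, here is a minimal sketch (the directory name is made up). mkdir both tests for and creates the directory in one atomic step, so only one process can win:

#!/bin/bash
# Sketch of the mkdir-based lock described in the comments above:
# mkdir fails if the directory already exists, and directory creation
# is atomic, so only one process can acquire the lock at a time.
until mkdir mylockdir 2>/dev/null  # check and lock in one step
do
    sleep 5  # wait 5 seconds to try again
done
# >>> do critical work safely here <<<
rmdir mylockdir  # unlock the lock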
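
And for completeness, a minimal sketch of the flock(1) utility discussed in comment 4, for Linux systems with util-linux installed (the lockfile name is made up). The kernel manages the lock and releases it automatically when the file descriptor is closed, even if the script crashes:

#!/bin/bash
# Sketch using the flock(1) utility from util-linux.
exec 9>mylockfile  # open (or create) the lockfile on file descriptor 9
flock 9            # block until the kernel grants an exclusive lock
# >>> do critical work safely here <<<
flock -u 9         # unlock explicitly (also released when fd 9 closes at exit)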
