kuniga.me > NP-Incompleteness > Haskell Concurrent Programming

14 May 2015

### Introduction

In this post we’ll talk about concurrency and parallelism in Haskell. This post is a set of notes from Chapter 24 of Real World Haskell and Chapter 2 and Chapter 3 from Parallel and Concurrent Programming in Haskell.

Concurrency and parallelism are similar concepts, and it’s hard to draw a line between their definitions. In general, concurrency represents the ability of an application handling multiple (concurrent) tasks at the same time. It doesn’t require the underlying hardware to have multiple cores, so even applications running on single core can implement concurrency via threads.

Parallelism, in other hand, usually means breaking a slow task in smaller pieces that can be computed in parallel in order to speed the elapsed time. To be worth it, it requires multiple-cores or multiple machines.

In an operating system context, a process represents an instance of a program. It has a self-contained environment (e.g. memory space) and contains one or more threads. Threads can be thought as lightweight processes that share the environment with other threads in the same process [4].

Forking. The module Control.Concurrent implements a thread system. The program starts out with a single thread (referred as main thread) and we create new threads from the current thread using a fork.

We can use the forkIO function, which takes an IO action and runs that in a separate thread. The remaining code continues running on the main thread. Because of the concurrent nature of threads, the system doesn’t guarantee order of execution between those. In the example below, we fork the main thread to print “hello” and the the main thread prints “world”.

If we run this code many times we’ll get varying results from "hello""world" to "wor"lhde"llo" (printing the whole sentence is not atomic because Haskell uses lazy evaluation - we’ll see next how to avoid this behavior).

Mutable variable or MVar is a structure provided by the Control.Concurrent module. It can either have a value or be empty. If we try to write (putMVar) to a MVar that already has a value, the writing thread blocks until some other thread extracts (takeMVar) the current value from that MVar. On the other hand, if a thread tries to read from an empty MVar, it blocks until another thread puts a value into it.

We can use MVars to synchronize threads. In the fork example, there were no guarantees in the order of execution, but in the following example, we can guarantee the child thread will only execute putStrLn after putMVar is executed.

The MVar provides a mutex (mutual exclusion) for a piece of data (the "wake up" string), since only one thread can read the value from the MVar. They also provide a communication mechanism between two threads.

Channel or Chan, can be viewed as a generalization of a MVar. In a channel, a thread doesn’t block when inserting a value. It rather enqueues that value. When a thread reads from the channel, it pops an element from the front of the queue, or blocks if the queue is empty.

We can think of Chan as containing a queue of infinite size, whereas MVar has a queue of size 1.

In the following example, we have two children threads writing to a channel and the main thread reading from them. We guarantee the program will only finish after but “hello world” and “now i quit” are printed, but the order is not guaranteed.

MVar as a lock. If we run the code above, chances are the output will come out mingled, like

We can make the print function atomic by using an MVar and require the threads to “acquire” it in order to invoke the print function. The following example creates an atomic version of putStrLn using the MVar lock:

The function withMVar() takes an MVar and a function. It extracts the value of the MVar (lock in the example above), executes the function with that value and puts the value back into the MVar. In this case, since we’re using it just for locking, we don’t care about the value inside the MVar.

We can replace putStrLn with atomicPutStrLn in our previous code:

Note how we create the MVar lock with a dummy value and use it for the atomicPutStrLn calls [5].

### Parallel Programming

Setup. By default Haskell only uses a single core, even if more are available. To turn on multi-threaded support, we must use the -threaded flag during compilation.

Then, in runtime, we can provide the -Nx flag to the RTS (Run Time System) when running the program (where x is a natural number representing the number of cores).

One challenge in parallelizing Haskell code is due to lazy evaluation. We have less control on when part of the code will be actually evaluated so we need to impose some strictness to guarantee it will be executed in parallel.

Weak head normal form vs. normal form

This answer on StackOverflow gives a very nice explanation between Weak Head Normal Form (WHNF) and Normal Form (NF). Copying parts of it here.

An expression in normal form is fully evaluated, and no sub-expression could be evaluated any further (i.e. it contains no un-evaluated thunks).

These expressions are all in normal form:

 42 (2, “hello”) \x -> (x + 1)

These expressions are not in normal form:

 1 + 2 – we could evaluate this to 3 (\x -> x + 1) 2 – we could apply the function “he” ++ “llo” – we could apply the (++) (1 + 1, 2 + 2) – we could evaluate 1 + 1 and 2 + 2

An expression in weak head normal form has been evaluated to the outermost data constructor or lambda abstraction (the head). Sub-expressions may or may not have been evaluated.

 (1 + 1, 2 + 2) – the outermost part is the data constructor (,) \x -> 2 + 2 – the outermost part is a lambda abstraction ‘h’ : (“e” ++ “llo”) – the outermost part is the data constructor (:)

These expressions are not in weak head normal form:

 1 + 2 – the outermost part here is an application – of (+) (\x -> x + 1) 2 – the outermost part is an application of – (\x -> x + 1) “he” ++ “llo” – the outermost part is an application of (++)

We can analyze whether a given expression has been evaluated in ghci using the :sprint command. It prints the contents of a variable if it was already evaluated of “_” otherwise. For example:

This is useful to understand how things get evaluated. Another interesting example is working with lists.

Your results may vary, since Haskell might decide to perform more than the bare minimum evaluation.

The seq function takes two arguments (seq x y) and before y is evaluated to the WHFN, x is also evaluated to WHFN. The par function (from the Control.Parallel module) is similar to seq but it also tries to evaluate the first argument in parallel.

The compiler might decide to evaluate the second argument of seq before the first if it thinks it would improve performance. pseq is a stricter version in which the first argument is always evaluated first.

WHFN might not be enough for parallelizing. To make sure we’re splitting an expensive task among cores, we need to force full evaluation (normal form). To see why, consider the following example, in which we want to parallelize the map function:

If b has a nested structure, it’s not guaranteed that calling r par will fully evaluate r, since it only guarantees to evaluate the outermost constructor. To overcome that, one option is to use evaluation strategies.

Evaluation Strategies

Before talking about strategies, let’s introduce the Eval monad, provided by Control.Parallel.Strategies:

The rpar and rseq are the counterpart of par and pseq respectively. rpar indicates its argument can be evaluated in parallel (non-blocking), while rseq forces the evaluation of its arguments before continuing (blocking).

Using this monad, our parallelMap function would become:

Strategy is a design pattern in which we make the code independent of algorithms, so different algorithms can be used interchangeably. The Control.Parallel module uses it to separate the evaluation strategy from the application code.

In this context, a strategy is basically a function that takes in a type and defines a way to evaluate that type. More specifically,

A simple example is defining a strategy for a pair. We can evaluate each element in parallel:

Note that a strategy “wraps” expression in the Eval monad. We can use runEval to “extract” that and evaluate the code. For the example with pairs, we could do:

The using function expects a value and a strategy as parameter, apply the strategy over the value and then evaluates it using runEval.

This syntax is easier to read and is more explicit about the separation of the evaluation strategy and the actual code. The pair example with using would be:

The problem with rpar and rseq is that they only force evaluation to Weak Head Normal Form. Thus, if the elements within the pair are nested in another constructor, they might not be fully evaluated. To solve that, we can use different evaluation strategies other than rpar.

In order to do that, we can step up one level of abstraction and generalize parPair by defining a function that defines which strategies to evaluate to each pair:

We can redefine now parPair in terms of evalPair

We can then use the rdeepseq which expect types implementing the NFData (NF is for normal form) interface. This strategy evaluates a structure to normal form, by traversing it recursively, by calling force.

### Conclusion

In this post we covered concurrency and parallelism in Haskell. We learned how to work with multi-threads using the Control.Concurrent module, which also provides mechanism for mutual exclusion and communication between threads.

In the second part of the post, we saw how to make use of multiple cores to speed up expensive parts of the code by dividing the task in smaller pieces, that can be then executed in parallel. We learned that one of the main difficulties in doing parallel work in Haskell is due to lazy evaluation. We’ve covered ways to address that problem using evaluation strategies, which are designed in such a way that they are decoupled from the actual code being parallelized.

I’ve heard about Parallel and Concurrent Programming in Haskell, written by Simon Marlow before, but didn’t have a chance to check it out. Having read the first 3 chapters so far, I think it’s really well written and easy to follow. I’m excited to read more.

### References

• [1] Real World Haskell, Chapter 24
• [2] Parallel and Concurrent Programming in Haskell, Chapter 2, 3
• [3] Design Patterns: Elements of Reusable Object-Oriented Software - Strategy
• [4] Oracle - Processes and Threads
• [5] StackOverflow - Can I ensure that Haskell performs atomic IO?