
Local Inter-Process Communication

07 Dec 2024


In this post we explore different ways to communicate between two processes when they’re running on the same host.

This is largely based on Beej’s Guide to Interprocess Communication [1]. The key differences: in this post the examples use C++ as opposed to C, and for message queues and shared memory we use the POSIX versions as opposed to System V.

Further, we omit the concurrency mechanisms from Beej’s guide, namely file locks and semaphores. And finally, this post includes a benchmark and a comparison table between the different mechanisms.

Index

- Pipe
- FIFO
- Message Queue
- Shared Memory Segments
- Memory Mapped Files
- Unix Domain Sockets
- Benchmarks
- Comparison Table
- Conclusion

Pipe

A pipe is a one-way communication channel between a source process and a destination process, using file descriptors.

A downside is that processes don’t share file descriptors by default. One exception is when a process creates a child process via fork(), in which case the child inherits the file descriptors from the parent, so this is one scenario where pipes are useful. In newer Linux systems it’s also possible to copy a file descriptor from one process into another via pidfd_getfd() (kernel 5.6+).
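As an aside, here’s a rough sketch of what that could look like (this helper is hypothetical and not from Beej’s guide; it assumes system headers recent enough to define SYS_pidfd_open and SYS_pidfd_getfd, and it requires ptrace permission over the target process):

#include <sys/syscall.h>
#include <unistd.h>

// Duplicates file descriptor `target_fd` of process `target_pid` into the
// calling process. Returns the new local descriptor, or -1 on failure.
int grab_fd(pid_t target_pid, int target_fd) {
    // Get a descriptor referring to the target process itself.
    int pidfd = syscall(SYS_pidfd_open, target_pid, 0);
    if (pidfd < 0) return -1;

    // Copy the target's descriptor into our own descriptor table.
    int local_fd = syscall(SYS_pidfd_getfd, pidfd, target_fd, 0);
    close(pidfd);
    return local_fd;
}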

Here’s an example inspired by Beej’s Guide [1]:

#include <iostream>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int fds[2];
    // Creates 2 file descriptors, one for reading,
    // one for writing, respectively. Assign them to
    // fds.
    pipe(fds);

    if (!fork()) { // child, inherits file descriptors
        std::string s = "test";
        // Writes the '\0' too.
        write(fds[1], s.data(), s.length() + 1);
    } else { // parent
        char buf[512];
        int size = 0;
        while (true) {
            // Only read into the remaining space of buf.
            size += read(fds[0], buf + size, sizeof(buf) - size);
            if (buf[size - 1] == '\0') {
                break;
            }
        }
        std::cout << "read: " << buf << std::endl;
        // blocks until the child exits
        wait(NULL);
    }

    return 0;
}

Despite the term “file descriptor”, the transport for pipes is actually in memory (kernel space). The function read() is not guaranteed to return everything the child process wrote in a single call (the kernel buffer is limited, so the data is streamed in chunks), so we have to call it in a loop. It’s thus a bit tricky to determine when to stop. One option is to look for a string terminator '\0', as we do in the example above.

FIFO

A FIFO is also known as a named pipe. The reason is that it can be identified by an alias (a name) between two unrelated processes.

As our example, we have two binaries, one for the writer side, another for the reader side. They both obtain a named pipe, which one process writes to, and the other reads from. The writer gets input from stdin and writes to the named pipe via the file descriptor:

// writer.cpp

#include <fcntl.h>
#include <iostream>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

#define FIFO_NAME "channel"

int main() {
    // Creates the FIFO (named pipe) if it doesn't exist yet.
    mknod(FIFO_NAME, S_IFIFO | 0666, 0);

    // Opens the pipe in write-only mode.
    // This API blocks until another process
    // opens the pipe in read mode
    int fd = open(FIFO_NAME, O_WRONLY);

    // Pipes strings from stdin until EOF (Ctrl-D)
    std::string buf;
    while (std::cin >> buf) {
        write(fd, buf.c_str(), buf.length());
    }

    return 0;
}

The reader consumes data from the pipe and prints to stdout:

// reader.cpp

#include <fcntl.h>
#include <iostream>
#include <sys/stat.h>
#include <unistd.h>

// has to match the one from the other process
#define FIFO_NAME "channel"

int main() {
    mknod(FIFO_NAME, S_IFIFO | 0666, 0);

    // Opens the pipe in read-only mode.
    // This API blocks until another process
    // opens the pipe in write mode.
    int fd = open(FIFO_NAME, O_RDONLY);

    int size;
    char buf[300];
    do {
        // Leaves room for the '\0' terminator.
        size = read(fd, buf, sizeof(buf) - 1);
        if (size < 0) break; // read error
        buf[size] = '\0';
        std::cout << "read: " << buf << std::endl;
    } while (size > 0);

    return 0;
}

The main advantage of named pipes over pipes is that the processes can be independent: they don’t need to share file descriptors.

Like pipes, a named pipe is uni-directional: if we want a 2-way communication channel, we can always use two named pipes. Also like pipes, FIFO transport is via byte streams, which means that if a sender writes a message, it might be read in chunks by read(), so we need to read it in a loop.
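Here’s a rough sketch of that two-FIFO idea (the FIFO names and the helper below are hypothetical, not from Beej’s guide). Both sides open the FIFOs in the same order, so neither blocks forever on open():

#include <fcntl.h>
#include <sys/stat.h>
#include <utility>

// Returns a {read_fd, write_fd} pair forming a 2-way channel out of two
// FIFOs. One process calls it with is_side_a = true, the other with false.
std::pair<int, int> open_duplex_channel(bool is_side_a) {
    mkfifo("a_to_b", 0666);
    mkfifo("b_to_a", 0666);
    if (is_side_a) {
        int write_fd = open("a_to_b", O_WRONLY); // blocks until B opens it for reading
        int read_fd = open("b_to_a", O_RDONLY);
        return {read_fd, write_fd};
    }
    int read_fd = open("a_to_b", O_RDONLY);      // blocks until A opens it for writing
    int write_fd = open("b_to_a", O_WRONLY);
    return {read_fd, write_fd};
}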

Message Queue

There are two main message queues available in Unix OSes: System V and POSIX message queues. The former is included in sys/msg.h and the latter in mqueue.h.

The POSIX queues are considered more modern than the System V ones and have more features. Besides the API, the major difference between them is that POSIX queues are thread-safe and have asynchronous notification mechanisms; neither kind survives a system reboot, since the messages live in kernel memory. For these reasons, we’ll focus on POSIX message queues. Note however that macOS does not support them, so our experiments were done on a Linux system.

We’ll use a writer-reader example like in the previous section. A simple writer code could look like:

#include <fcntl.h>
#include <mqueue.h>
#include <string>

#define QUEUE_NAME "/my_queue"

int main () {
    auto mq = mq_open(QUEUE_NAME, O_WRONLY | O_CREAT, 0644, NULL);
    std::string buffer = "hello world";
    mq_send(mq, buffer.c_str(), buffer.length() + 1, /*priority=*/ 0);

    mq_close(mq);
    mq_unlink(QUEUE_NAME);
}

The function mq_open() creates a message queue if one doesn’t exist, with an associated id QUEUE_NAME. Note that mq_unlink() is needed when the program terminates because otherwise the queue stays active (since it’s decoupled from the process lifetime).

On the reader side:

#include <fcntl.h>
#include <iostream>
#include <mqueue.h>

#define QUEUE_NAME "/my_queue"
// Must be at least the queue's mq_msgsize attribute
// (8192 by default on Linux), otherwise mq_receive() fails.
#define MAX_SIZE 8192

int main () {
    auto mq = mq_open(QUEUE_NAME, O_RDONLY | O_CREAT, 0644, NULL);

    char buffer[MAX_SIZE];
    int bytesRead = mq_receive(mq, buffer, MAX_SIZE, NULL);
    buffer[bytesRead] = '\0';

    std::cout << buffer << std::endl;

    mq_close(mq);
    mq_unlink(QUEUE_NAME);
}

This uses the counterpart APIs of the writer side. Note that in mq_open() we open the queue in O_RDONLY mode. Also, in mq_receive() we have to provide a buffer large enough for a message of unknown length, like we did in the FIFO case (in fact, the buffer must be at least as large as the queue’s mq_msgsize attribute).

By default mq_receive() is a blocking operation: it waits until a message is available. Non-blocking behavior can be enabled by opening the queue with O_NONBLOCK (or setting it later via mq_setattr()). Other queue attributes, such as its capacity and maximum message size, can be configured when the queue is created:

struct mq_attr attr;
attr.mq_flags = 0;          // Blocking mode (the default)
attr.mq_maxmsg = 10;        // Max number of messages in the queue
attr.mq_msgsize = MAX_SIZE; // Max size of a single message
attr.mq_curmsgs = 0;        // Messages currently in the queue

auto mq = mq_open(QUEUE_NAME, O_RDONLY | O_CREAT, 0644, &attr);

It’s worth noting that attr is only applied when a queue is actually created. If the queue already exists, those attributes are ignored. That’s why it’s better to call mq_unlink() once the program terminates.
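If we want to know which attributes the queue actually ended up with (for example because it already existed), we can query them back with mq_getattr(). A small sketch, reusing the mq descriptor from above:

struct mq_attr actual;
mq_getattr(mq, &actual);
std::cout << "max messages: " << actual.mq_maxmsg
          << ", max message size: " << actual.mq_msgsize
          << ", currently queued: " << actual.mq_curmsgs << std::endl;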

Differently from pipes (named or not), message queues are message-oriented: messages are read in their entirety via mq_receive(), so we don’t need to keep reading in a loop like we did in prior examples.

Like pipes, message queue data is also kept in kernel space memory.

Priority queues. It’s possible to use POSIX message queues as priority queues by specifying the priority argument in mq_send(). Messages with a higher priority value are placed ahead of those with a lower priority, irrespective of insertion order.

On the reader side, mq_receive() always returns the oldest of the highest-priority messages. Its last argument is an optional output parameter: if we pass a non-NULL pointer, the priority of the received message is stored in it.
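Here’s a minimal self-contained sketch illustrating priorities (the queue name is made up and error handling is omitted, as in the other examples):

#include <fcntl.h>
#include <iostream>
#include <mqueue.h>

int main() {
    // Open for both sending and receiving, with default attributes.
    mqd_t mq = mq_open("/priority_demo", O_RDWR | O_CREAT, 0644, NULL);

    mq_send(mq, "low", 4, /*priority=*/ 1);
    mq_send(mq, "urgent", 7, /*priority=*/ 10);

    // The highest-priority message comes out first, regardless of
    // insertion order. Its priority is written to prio.
    unsigned int prio;
    char buffer[8192]; // at least the queue's mq_msgsize (8192 by default on Linux)
    mq_receive(mq, buffer, sizeof(buffer), &prio);
    std::cout << buffer << " (priority " << prio << ")" << std::endl; // urgent (priority 10)

    mq_close(mq);
    mq_unlink("/priority_demo");
}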

Shared Memory Segments

Similar to message queues, there are two main options available in Unix OSes: System V and POSIX shared memory segments. The former is included in sys/shm.h and the latter in sys/mman.h, with the open flags coming from fcntl.h (standing for file control).

As we did with message queues, we’ll focus on the POSIX shared memory segments. As usual, our example will consist of a sender and a receiver, but this time they communicate via a shared memory segment.

Shared memory segments do not provide any synchronization mechanism on their own. It’s up to the application to guarantee that multiple processes don’t run into race conditions. We can use either POSIX mutexes or semaphores for this. For simple access control, mutexes are conceptually simpler and cheaper to use than semaphores, but the POSIX semaphore API is simpler than the mutex one, so we’ll go with semaphores.

#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <iostream>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SHM_NAME "/my_shared_mem"
#define SEM_NAME "/my_semaphore"
#define SHM_SIZE 1024

int main() {
    // Create or open the shared memory object
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);

    // Set size of shared memory
    ftruncate(shm_fd, SHM_SIZE);

    // Map shared memory
    void *shm_ptr = mmap(
        0,
        SHM_SIZE,
        PROT_READ | PROT_WRITE,
        MAP_SHARED,
        shm_fd,
        0
    );

    // Create or open a semaphore with max 1 concurrent access
    sem_t *sem = sem_open(SEM_NAME, O_CREAT, 0666, 1);

    // Synchronize access
    sem_wait(sem);  // Lock
    snprintf((char *)shm_ptr, SHM_SIZE, "Hello world!");
    sem_post(sem);  // Unlock

    // Block before cleaning up
    int x;
    std::cin >> x;

    // Clean up
    munmap(shm_ptr, SHM_SIZE);
    close(shm_fd);
    shm_unlink(SHM_NAME);
    sem_close(sem);
    sem_unlink(SEM_NAME);
}

The function shm_open() returns a file descriptor and is analogous to the open() API used to open or create a file.

When a memory segment is created, it has size 0 bytes. We can use file operations like ftruncate() on it to give it a size, but I was wondering why we can’t pass the size of the memory segment directly to shm_open(). One reason is that if the segment already exists, we might not want to overwrite the existing size. In a way this is consistent with open(); on the other hand, we can append to files without explicitly resizing them.
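For instance, if we’re not sure whether the segment already exists, a common pattern is to check its current size with fstat() (declared in sys/stat.h) and only resize it when it’s still empty. A sketch, reusing shm_fd from above:

struct stat st;
fstat(shm_fd, &st);
if (st.st_size == 0) {
    // Freshly created segment: give it a size.
    ftruncate(shm_fd, SHM_SIZE);
}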

Each process has its own virtual address space, so by default the same address in two processes maps to different physical memory. In this case we want both processes to map to the same physical memory! We do this via the mmap() function.

In the call:

void *shm_ptr = mmap(
    /*addr=*/ 0,
    SHM_SIZE,
    PROT_READ | PROT_WRITE,
    MAP_SHARED,
    shm_fd,
    /*offset=*/ 0
);

Passing 0 as the first argument tells the kernel to find an appropriate address in the process’s address space for us, which it returns and we store in shm_ptr. The kernel then maps the first SHM_SIZE bytes of the object referred to by shm_fd at that address.

The 3rd parameter specifies that the memory can be read from and written to. The 4th parameter indicates the memory is to be shared by multiple processes. If we used MAP_PRIVATE, even when passing a shared memory segment as shm_fd, the kernel would copy the physical page when one of the processes tries to write to it.

Finally, the last parameter is the offset from shm_fd from which to start the mapping.

The code for the semaphore is clear enough and it’s analogous to other POSIX APIs. The only part worth mentioning is that here we pass value=1:

sem_t *sem = sem_open(SEM_NAME, O_CREAT, 0666, /*value=*/ 1);

Which means only one process can call sem_wait(sem) without blocking, so it behaves like a lock.

The last observation to make is regarding:

snprintf((char *)shm_ptr, SHM_SIZE, "Hello world!");

Note how we pass the size of the shared memory segment as opposed to the length of the string. This is to protect against buffer overflows in case the string happens to be larger than the segment. Also note this writes to the beginning of the segment. If we wish to write more data, we need to use a proper offset (and an updated size as the second parameter).
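For instance, a hypothetical follow-up write could place a second string right after the first one (and its '\0'):

// Compute where the first string ends (its '\0' included).
size_t offset = strlen((char *)shm_ptr) + 1;
// Write the second string there, still bounded by the segment size.
snprintf((char *)shm_ptr + offset, SHM_SIZE - offset, "Second message");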

For this example, we assume the sender will call snprintf() before the receiver reads. The receiver code is as follows:

#include <cstdio>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SHM_NAME "/my_shared_mem"
#define SEM_NAME "/my_semaphore"
#define SHM_SIZE 1024

int main() {
    // Create or open the shared memory object
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);

    // Set size of shared memory
    ftruncate(shm_fd, SHM_SIZE);

    // Map shared memory
    void *shm_ptr = mmap(
        0,
        SHM_SIZE,
        PROT_READ | PROT_WRITE,
        MAP_SHARED,
        shm_fd,
        0
    );

    // Create or open a semaphore with max 1 concurrent access
    sem_t *sem = sem_open(SEM_NAME, O_CREAT, 0666, 1);

    // Synchronize access
    sem_wait(sem);  // Lock
    printf("Read from shared memory: %s\n", (char *)shm_ptr);
    sem_post(sem);  // Unlock

    // Assume the other process will do the cleanup
}

One nice thing about shared memory segments is that there’s no copying involved: a process can write directly to the shared physical memory. This is in contrast to the other mechanisms we covered, which rely on kernel memory to communicate between processes, so copies are necessarily involved (a process doesn’t have access to the kernel’s memory space nor to that of other processes).

In C++ we could potentially abstract this by using custom memory allocators, so that when allocating data on the heap via new, we internally allocate it in the shared memory segment. On the reading side, though, we’d still need to rely on some pointer arithmetic.
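As a simplified sketch of that idea (hypothetical types, skipping the allocator machinery): we can construct a trivially copyable struct directly in the mapped segment via placement new, assuming shm_ptr obtained from mmap() as in the examples above, and the reader reinterprets the same bytes on its side.

#include <new>

struct Message {   // trivially copyable, no pointers: raw pointers would be
    int id;        // meaningless in the other process's address space
    char text[64];
};

// Sender, after mmap():
Message *msg = new (shm_ptr) Message{42, "hello"};

// Receiver, after mmap() of the same segment:
Message *received = reinterpret_cast<Message *>(shm_ptr);
std::cout << received->id << ": " << received->text << std::endl;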

One caveat, though, is that if we want to avoid copies, we must make sure the binaries are ABI compatible, in the sense that the layout of objects in memory must be the same, or at least compatible, between the processes.

For example, suppose the processes share data encapsulated in a class C, defined in a common library. Process 1 runs with a newer version of the library which added a new field to C. Suppose it writes an array of Cs to a shared memory segment for Process 2 to read. Process 2 is running the old version of the library and doesn’t know about the new field, so the layout it assumes for the array of Cs is incorrect, which will lead to all sorts of weirdness.
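To make the hazard concrete, here is a hypothetical pair of layouts (names made up for illustration):

#include <iostream>

// What the reader was compiled with (version 1):
struct C_v1 { int id; };
// What the writer was compiled with (version 2), with a new field added:
struct C_v2 { int id; int flags; };

int main() {
    // The element sizes (and hence array strides) no longer match, so the
    // reader interprets the writer's bytes at the wrong offsets.
    std::cout << sizeof(C_v1) << " vs " << sizeof(C_v2) << std::endl; // e.g. 4 vs 8
}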

Memory Mapped Files

A memory mapped file is a mechanism to use memory semantics for disk. With regular file APIs you have to explicitly call fseek() before writing data to a specific position; a memory mapped file gives you random access where you can just write through pointers.

This would be pretty inefficient as-is, especially with hard disk drives, because APIs like fseek() reflect physical constraints, in particular the cost of non-sequential access. So the kernel keeps pages in memory and flushes them to disk from time to time. This does change the durability semantics: for example, if the host crashes before data is flushed to disk, it is lost.

It turns out that “file” in “memory mapped file” is a bit misleading, because the underlying “storage” doesn’t need to be an actual file: it can be anything that has a corresponding file descriptor, including shared memory segments!

Since files are shareable by multiple processes, memory mapped files can be used for IPC. We won’t delve much into implementation details for memory mapped files here.
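To give a rough idea nonetheless, here’s a minimal sketch of writing to a regular file through a mapping (the file name is made up). A second process that mmap()s the same file with MAP_SHARED sees the same bytes:

#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    int fd = open("/tmp/mapped.dat", O_CREAT | O_RDWR, 0666);
    // The file must be large enough to back the mapping.
    ftruncate(fd, 4096);

    char *ptr = (char *)mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    // Write through the pointer: no write()/fseek() calls needed.
    strcpy(ptr, "hello from a memory mapped file");

    // Ask the kernel to flush dirty pages to disk now (it would eventually).
    msync(ptr, 4096, MS_SYNC);

    munmap(ptr, 4096);
    close(fd);
}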

Unix Domain Sockets

We discussed sockets in a previous post [2] (also based on Beej’s tutorials!). In that post we focused on network sockets, but sockets can also be used for IPC.

You might point out that network sockets can also be used for IPC, by specifying the localhost IP address and some port. The reason to use a dedicated domain is to avoid overhead, in particular that of the TCP/IP layers. The kernel might take shortcuts to avoid some of the overhead, but it cannot optimize it all away. In a benchmark I found on the internet [4], Unix domain sockets had 2x higher throughput than network sockets.

One way to characterize Unix domain sockets is as a 2-way FIFO. To highlight this, we’ll use the example from Beej [1] of a client-server setup. The server binds to a specific file path and waits in a loop for client connections. Once a client connects, the server waits for some data and sends it back as the response.

Here’s a sample code without error handling for simplicity:

#include <cstdio>
#include <cstring>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define SOCK_PATH "/tmp/echo.sock"

int main() {
    struct sockaddr_un local = {
      .sun_family = AF_UNIX,
    };
    strcpy(local.sun_path, SOCK_PATH);
    unlink(local.sun_path);

    int len = strlen(local.sun_path) + sizeof(local.sun_family) + 1;

    // Socket file descriptor
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    bind(fd, (struct sockaddr *)&local, len);

    listen(fd, /*backlog=*/ 5);

    char buffer[100];
    while (1) {
      printf("Waiting for a connection...\n");
      struct sockaddr_un remote;
      socklen_t slen = sizeof(remote);
      int fd_client = accept(fd, (struct sockaddr *)&remote, &slen);
      printf("Connected.\n");

      int bytes_recvd;
      while ((bytes_recvd = recv(fd_client, buffer, sizeof(buffer), 0)) > 0) {
        send(fd_client, buffer, bytes_recvd, 0);
      }

      close(fd_client);
    }
}

The first difference with our network socket example [2] is that we don’t need to resolve the address via getaddrinfo(). The first common API is the creation of a socket:

int fd = socket(AF_UNIX, SOCK_STREAM, /*protocol=*/ 0);

The first parameter, AF_UNIX, represents the Unix domain (as opposed to AF_INET and AF_INET6 we used before, for IPv4 and IPv6, respectively). The second parameter indicates a connection-based byte stream, with semantics similar to TCP. The protocol parameter does not apply.

The unlink() call removes any files corresponding to SOCK_PATH, effectively de-associating it from pre-existing sockets.

The bind() call will create a file named SOCK_PATH and associate it with the file descriptor. If a file already exists with that name, it will error out.

There’s not much else to comment on in the rest of the code. The example client can connect to the server by using the same file name, send a string and get it back:

#include <cstdio>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// has to match the server's path
#define SOCK_PATH "/tmp/echo.sock"

int main() {
    struct sockaddr_un remote = {
        .sun_family = AF_UNIX,
    };

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    strncpy(remote.sun_path, SOCK_PATH, sizeof(remote.sun_path) - 1);
    int len = strlen(remote.sun_path) + sizeof(remote.sun_family) + 1;
    connect(fd, (struct sockaddr *)&remote, len);

    std::string message = "hello world";
    send(fd, message.c_str(), message.length() + 1, 0);

    char buffer[100];
    int n = recv(fd, buffer, sizeof(buffer) - 1, 0);
    buffer[n] = '\0';
    printf("%s\n", buffer);

    close(fd);
}

Again, there’s not much I can add, except to note that the client side doesn’t need to call bind().

Benchmarks

I was looking for some benchmarks to get a sense of the relative performance of these mechanisms. Peter Goldsborough’s IPC benchmark [4] measures throughput (messages/second) for messages of size 1 KB, with shared memory and memory mapped files ahead of the other mechanisms.

For smaller messages (100 bytes) the gap is even bigger: shared memory and memory mapped files achieve up to 40x higher throughput than Unix domain sockets, indicating there’s significant per-message overhead.

Comparison Table

We now provide a summary of all the different mechanisms covered in this post, comparing them on a few dimensions.

Type                | Transport       | Granularity  | Process Relationship | Data Copy | Bi-directional
Pipe                | Kernel Memory   | Byte streams | Parent and Child     | Yes       | No
FIFO                | Kernel Memory   | Byte streams | Independent          | Yes       | No
Message Queue       | Kernel Memory   | Messages     | Independent          | Yes       | No
Shared Memory       | Physical Memory | N/A          | Independent          | No        | Yes
Memory Mapped Files | Depends         | N/A          | Independent          | Yes       | Yes
Unix Domain Sockets | Kernel Memory   | Messages     | Independent          | Yes       | Yes

Conclusion

In this post we studied several ways to communicate between processes on the same host. This was motivated by a project at work where we’re trying to decouple customer code from our main binary to reduce bloat and the operational overhead of being responsible for the reliability of customer code.

We’ve been debating primarily between sockets + Thrift serialization, shared memory and shared libraries. Shared memory is very performant, but maintaining ABI compatibility is a significant maintenance burden and fully avoiding copies requires a major overhaul.

Shared libraries seem like a middle ground, but most of our infrastructure relies on statically linked binaries, so using them in production could also pose a significant maintenance cost.

While benchmarks show obvious performance benefits of more complex approaches, in practice these calls are usually not the throughput bottleneck. Further, as suggested by the benchmarks there’s a lot of overhead per message, so some of this cost can be amortized by using larger messages.

Related Posts

Sockets. As mentioned earlier, in that post we covered network sockets. The high-level concepts are pretty much the same between those and Unix domain sockets.

Namespace Jailing. In that post we aimed to create a jailed child process from a parent one. We used pipes to synchronize them and also used mmap() to allocate memory for the new process.

Linux Filesystems Overview. The relevant bit from that post is the notion of file descriptors being used for many purposes, as we witnessed here: pipes, FIFOs, shared memory segments and sockets.

In that post we also covered a few standard Linux system directories. The directory /dev/shm in particular is very relevant, since it’s where POSIX shared memory objects show up as files. While the sender binary from Shared Memory Segments is running, we can verify:

$ ls /dev/shm/
...
example_shm

We can also inspect /proc/<PID>/fd to see all the file descriptors owned by that process:

$ ls -l /proc/1629550/fd
...
lrwx------ 1 kunigami users 64 Dec  1 12:11 3 -> /dev/shm/example_shm

Or simply do lsof -p <PID>.

References