
ELF: Executable and Linkable Format

12 Apr 2025

Image of an Elf, AI-generated using fantasy style
Recently we learned a bit about how LLVM works, in particular that it produces an artifact called object code.

In this post I'd like to delve into object code, in particular the Executable and Linkable Format (ELF), the file format used by most Linux systems. We'll cover what ELF is used for, how its contents are organized and finally how the operating system "executes" an ELF file.

Object Code

ELF is a file format for storing the object code produced by the compiler. An ELF file contains metadata the operating system needs to load it into memory, plus the actual code and data the CPU works with at runtime.

Different families of operating systems use different formats. ELF is used by Linux, while macOS uses the Mach-O format and Windows uses the Portable Executable (PE) format.

PE is derived from an older format called the Common Object File Format (COFF), and ELF was designed as COFF's successor on Unix System V. COFF in turn replaced a format called a.out. We can still see references to it when we compile code without specifying an output name (the default filename is a.out).

We’ll be using this simple C++ program to follow along with an example:

#include <stdio.h>

int main() {
  printf("hello\n");
  return 0;
}

To compile it, I did:

clang example.cpp -o example

Shared libraries (.so) are also represented using ELF, and static libraries (.a) are archives of ELF object files.

ELF File Layout

The ELF layout consists of basically five parts: ELF Header, Program Header, Other, Section Data and Section Header, typically in this order:

ELF Header
Program Header
Other
Section Data
Section Header

The ELF header contains metadata about the ELF file itself and the architecture it’s meant to run on. The Program Header is a table of contents describing what to load from the file into memory. Each row in this table describes a segment.

The Section Header is a table of contents for the more granular chunks of the file called sections (a segment typically includes one or more sections). This header is mainly used by the linker and debugger, but not at runtime, so it’s not loaded into memory.

All the headers discussed so far are tables of contents: they contain indexes. The actual data is located under the Section Data. Note that not all sections are loaded into memory (e.g. debug information is left out).

The Other part contains information that is not part of any header but is not section data either. One example is the dynamic linker path. It’s indexed by the program header (via INTERP as we’ll see later) but not loaded into memory.

Why does the section header appear after the section data? That’s because the section header doesn’t need to be loaded into memory by the loader, so putting it between the program header and the section data, which are loaded, could lead to unnecessary disk seeking.

ELF Header

The ELF header describes some metadata of the ELF file itself, such as whether it’s meant for 64-bit architectures (e.g. Class = ELF64), whether it’s little-endian or big-endian (e.g. Data = 2's complement, little endian), the instruction set (e.g. Machine = x86-64) and whether it’s an executable or a shared library (e.g. Type = EXEC). This header also tells the kernel where to find the other two headers: the program header and the section header.

To inspect the ELF header we can do:

readelf -h example
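To make the fields above more concrete, here is a minimal sketch of decoding the first few bytes of an ELF header by hand. The struct and function names (ElfSummary, parse_elf_summary) are made up for illustration; the byte offsets and numeric values follow the ELF specification (magic bytes, class at byte 4, data encoding at byte 5, then the 16-bit type and machine fields).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical summary of a few ELF header fields (the names are ours).
struct ElfSummary {
  bool is_64bit;          // Class: ELF64 vs ELF32
  bool is_little_endian;  // Data: little vs big endian
  std::uint16_t type;     // Type: 2 = EXEC, 3 = DYN
  std::uint16_t machine;  // Machine: 62 = x86-64
};

// Read a 16-bit field at `off`, honoring the file's byte order.
static std::uint16_t read_u16(const unsigned char* buf, std::size_t off,
                              bool little_endian) {
  unsigned lo = little_endian ? buf[off] : buf[off + 1];
  unsigned hi = little_endian ? buf[off + 1] : buf[off];
  return static_cast<std::uint16_t>(lo | (hi << 8));
}

// Decode the first 20 bytes of an ELF file into `out`.
// Returns false if the buffer is too short or does not look like ELF.
bool parse_elf_summary(const unsigned char* buf, std::size_t len,
                       ElfSummary& out) {
  assert(buf != nullptr);
  if (len < 20) return false;
  // Magic number: 0x7f 'E' 'L' 'F'.
  if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
    return false;
  // Byte 4 is the class (1 = 32-bit, 2 = 64-bit); byte 5 is the data
  // encoding (1 = little endian, 2 = big endian).
  if ((buf[4] != 1 && buf[4] != 2) || (buf[5] != 1 && buf[5] != 2))
    return false;
  out.is_64bit = (buf[4] == 2);
  out.is_little_endian = (buf[5] == 1);
  out.type = read_u16(buf, 16, out.is_little_endian);     // e_type
  out.machine = read_u16(buf, 18, out.is_little_endian);  // e_machine
  return true;
}
```

Feeding it the first 20 bytes of the compiled example should produce values matching the Class, Data, Type and Machine lines printed by readelf.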


Program Header

The program header describes what portions of the ELF file (segments) to load into memory. It’s an index represented by a table with the columns:

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align

We’ll cover types shortly. Offset indicates where in the ELF file the segment is located. FileSiz is the length of the segment in the file.

VirtAddr is the location in the virtual memory address space where this segment will be loaded. Recall that the OS allocates a dedicated virtual memory address space for each program, so programs can be very explicit about where each segment is loaded into that space. Of course the OS will map these addresses into arbitrary physical memory addresses. The attribute PhysAddr is ignored unless this binary is for firmware or kernel programs.

MemSiz indicates how much space the segment will take when loaded into memory. It can be larger than FileSiz because the file representation might be more compact than the one at runtime. For example:

// uninitialized.cpp

char v[100];

This array requires allocating 100 bytes in memory, but since it has no initial values the file does not need to store those bytes: the loader simply zero-fills them. In this case FileSiz < MemSiz.
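The relationship between FileSiz and MemSiz can be sketched as follows. This is a simplified model, not how the kernel does it (a real loader maps pages rather than copying bytes), and segment_image is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build the in-memory image of one segment: the first `filesz` bytes come
// from the file at `offset`, the remaining `memsz - filesz` bytes are zero.
std::vector<std::uint8_t> segment_image(const std::vector<std::uint8_t>& file,
                                        std::size_t offset,
                                        std::size_t filesz,
                                        std::size_t memsz) {
  assert(filesz <= memsz);  // the file part never exceeds the memory part
  assert(offset <= file.size() && filesz <= file.size() - offset);

  std::vector<std::uint8_t> image(memsz, 0);  // starts fully zero-filled
  std::copy(file.begin() + static_cast<std::ptrdiff_t>(offset),
            file.begin() + static_cast<std::ptrdiff_t>(offset + filesz),
            image.begin());
  return image;
}
```

With filesz = 3 and memsz = 8, the result holds 3 bytes taken from the file followed by 5 zero bytes, which is what happens to an uninitialized array like v.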

Finally, Flags controls access to the segment and Align is the byte alignment in memory. In general Align matters little for a fixed-address executable because the virtual memory address is already specified. For a Position-Independent Executable (PIE), however, the OS has the freedom to choose a different address, in which case it must honor the alignment.
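As a small illustration of these two columns, the sketch below decodes the flag bits and checks the alignment rule from the ELF specification, which requires a loadable segment's virtual address and file offset to be congruent modulo Align. The function names and the "rwx" output format are my own choices, not readelf's.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Permission bits of the Flags column, as defined by the ELF specification.
const std::uint32_t kFlagExec = 0x1;
const std::uint32_t kFlagWrite = 0x2;
const std::uint32_t kFlagRead = 0x4;

// Render the flags as a string such as "r-x" (a format chosen for this sketch).
std::string describe_flags(std::uint32_t flags) {
  std::string s = "---";
  if (flags & kFlagRead) s[0] = 'r';
  if (flags & kFlagWrite) s[1] = 'w';
  if (flags & kFlagExec) s[2] = 'x';
  return s;
}

// Check that the virtual address and the file offset agree modulo `align`.
// An alignment of 0 or 1 means no alignment is required.
bool alignment_consistent(std::uint64_t offset, std::uint64_t vaddr,
                          std::uint64_t align) {
  if (align <= 1) return true;
  assert((align & (align - 1)) == 0);  // expected to be a power of two
  return offset % align == vaddr % align;
}
```

A code segment would typically come out as "r-x" and a data segment as "rw-".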

Now on to types. The main types are:

PHDR INTERP DYNAMIC LOAD

PHDR, short for Program HeaDeR, is a segment describing the program header table itself. The table needs to be loaded into memory too, so we need to specify its memory address and size. This table will be used by the dynamic linker, which loads the dynamic libraries into memory. It needs to know where the information about dynamic libraries is loaded in memory, and it doesn’t have access to the ELF file.

The INTERP, short for INTERPreter, points to the path to the dynamic linker (also known as interpreter) executable, which will be invoked to link the libraries with the program. For my Linux system the path is /lib64/ld-linux-x86-64.so.2.

DYNAMIC segments contain information about the dynamic libraries to be linked with the program. When code is compiled, the code of the libraries is not included in the ELF file, so they must be loaded at runtime.

LOAD are segments that represent data or code to be loaded in memory. This includes the machine instructions that will be read by the CPU and also static variables.

To inspect the program header we can do:

readelf -l example

Section Header

As we discussed in ELF File Layout, the section header is not used at runtime. It’s only used by debuggers (running the binary with gdb or lldb) and linkers (as part of the overall compilation process). Recall that the job of the linker is to take multiple independent ELF files and stitch them together into one.

The columns of the section headers are:

Nr Name Type Address Offset

Nr is a sequence number to uniquely identify a section. Name can be used to indicate what a section does. For example, some common names are:

To inspect the section header we can do:

readelf -S example


Execution of an ELF file

We now cover the steps that take place when we run an ELF file. We start by executing our binary in a shell:

./example

The shell forks a child process, and the child runs the system call execve("./example", argv, envp). This instructs the kernel to replace the program running in that child process with our binary.
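For reference, here is a rough sketch of how a shell-like program could launch a binary on a POSIX system: it calls fork() to create a child, the child calls execve(), and the parent waits for it. The helper name run_program is hypothetical, and real shells do considerably more (job control, redirections, PATH lookup).

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <string>
#include <vector>

extern "C" char** environ;  // environment of the current process

// Run `path` with `args` (args[0] is conventionally the program name).
// Returns the child's exit code, 127 if execve failed, or -1 on other errors.
int run_program(const std::string& path, const std::vector<std::string>& args) {
  assert(!args.empty());

  // execve expects a null-terminated array of C strings.
  std::vector<char*> argv;
  for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    // Child: ask the kernel to replace this process image with the binary.
    execve(path.c_str(), argv.data(), environ);
    _exit(127);  // only reached if execve itself failed
  }

  // Parent: wait for the child to finish and report its exit code.
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_program("./example", {"example"}) would print hello from the child and return 0 to the caller.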

The kernel starts by reading the ELF header to extract metadata about the program. It also finds the offset of the program header in the file.

The kernel goes over each row of the program header and for each LOAD segment it maps the file contents into memory (recall that some segments take more space in memory, such as those holding uninitialized variables).
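The loop over the program header can be modeled roughly as below. This is a toy planner, not kernel code: the struct names are invented, only the LOAD type value (1) comes from the ELF specification, and it only computes the page-aligned address range each LOAD row would occupy, since memory is mapped in whole pages.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified program header row (hypothetical: only the fields needed here).
struct ProgramHeaderRow {
  std::uint32_t type;   // 1 = LOAD in the ELF specification
  std::uint64_t vaddr;  // VirtAddr column
  std::uint64_t memsz;  // MemSiz column
};

// Half-open address range [start, end) covered by a mapping.
struct Mapping {
  std::uint64_t start;
  std::uint64_t end;
};

const std::uint32_t kTypeLoad = 1;

// For every LOAD row, compute the page-aligned range that has to be mapped.
std::vector<Mapping> plan_mappings(const std::vector<ProgramHeaderRow>& rows,
                                   std::uint64_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  std::vector<Mapping> out;
  for (const ProgramHeaderRow& row : rows) {
    if (row.type != kTypeLoad) continue;  // INTERP, DYNAMIC, ... are skipped
    Mapping m;
    m.start = row.vaddr & ~(page_size - 1);  // round down to a page boundary
    m.end = (row.vaddr + row.memsz + page_size - 1) & ~(page_size - 1);
    out.push_back(m);
  }
  return out;
}
```

Rows of other types are skipped here because they describe data that already lives inside some LOAD segment or that is consumed in a different way.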

If the binary uses shared libraries, then the kernel reads the INTERP row to find the path of the dynamic linker, e.g. /lib64/ld-linux-x86-64.so.2. This is a shared library that is also represented as an ELF file. The kernel loads it in memory and passes control to it.

The dynamic linker is passed the location of the DYNAMIC segments so it can determine which shared libraries to load. We won’t go over the details here since I want to write a post specifically for shared libraries, but in broad terms: each shared library is also an ELF file, so part of the work of the dynamic linker is to map it to memory, the same way the kernel did initially.

The main difference is that the virtual address space belongs to the main program, not to the shared library, so the dynamic linker can't take the VirtAddr values in the library's program header as absolute addresses. Instead it picks a base address for the library and treats each VirtAddr as an offset from that base. Shared libraries might also depend on other libraries, so this whole process is recursive. The dynamic linker also doesn’t need to pass control to another dynamic linker, since it’s already running.
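One way to observe where each object ended up is dl_iterate_phdr, a function provided by glibc (and musl) on Linux that walks the program headers of every loaded object. The sketch below assumes such a system; LoadedSegment and loaded_segments are hypothetical names. For each LOAD segment it records the base address the object was loaded at and the resulting runtime address, computed as base plus VirtAddr.

```cpp
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info (Linux)

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One LOAD segment of a loaded object (the struct is ours, for illustration).
struct LoadedSegment {
  std::string object;           // file name; usually empty for the main program
  std::uintptr_t base;          // address the object was loaded at
  std::uintptr_t runtime_addr;  // base + VirtAddr of the segment
};

// Callback invoked once per loaded object (main program, shared libraries).
static int collect_segments(struct dl_phdr_info* info, std::size_t /*size*/,
                            void* data) {
  assert(data != nullptr);
  std::vector<LoadedSegment>* out =
      static_cast<std::vector<LoadedSegment>*>(data);
  for (int i = 0; i < info->dlpi_phnum; ++i) {
    const ElfW(Phdr)& ph = info->dlpi_phdr[i];
    if (ph.p_type != PT_LOAD) continue;
    LoadedSegment seg;
    seg.object = info->dlpi_name ? info->dlpi_name : "";
    seg.base = static_cast<std::uintptr_t>(info->dlpi_addr);
    seg.runtime_addr = static_cast<std::uintptr_t>(info->dlpi_addr + ph.p_vaddr);
    out->push_back(seg);
  }
  return 0;  // returning 0 tells dl_iterate_phdr to keep iterating
}

// Collect the LOAD segments of every object in the current process.
std::vector<LoadedSegment> loaded_segments() {
  std::vector<LoadedSegment> out;
  dl_iterate_phdr(collect_segments, &out);
  return out;
}
```

Printing the result from a dynamically linked program typically shows a different base for every shared library, while the VirtAddr values stored in their files stay the same from run to run.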

The dynamic linker also needs to resolve symbols. If you have a function call such as:

f(x);

that is not defined in your binary, the dynamic linker must find which shared library implements it. After that it needs to patch the address used by that function call so that it points to the implementation (this process is called relocation).
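A toy model may help picture these two steps. Everything below is hypothetical and greatly simplified: libraries are plain maps from symbol names to addresses, and "relocation" just means writing the resolved address into a slot that the call would go through. Real dynamic linkers work on symbol tables and relocation entries stored in the ELF files.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy shared library: a name plus the symbols it defines.
struct Library {
  std::string name;
  std::map<std::string, std::uintptr_t> exports;  // symbol -> address
};

// Search the libraries in load order; the first definition found wins.
std::uintptr_t resolve_symbol(const std::vector<Library>& libs,
                              const std::string& symbol) {
  assert(!symbol.empty());
  for (const Library& lib : libs) {
    std::map<std::string, std::uintptr_t>::const_iterator it =
        lib.exports.find(symbol);
    if (it != lib.exports.end()) return it->second;
  }
  throw std::runtime_error("undefined symbol: " + symbol);
}

// "Relocation" in this model: fill every pending slot with the address of
// the symbol it refers to.
void relocate(std::map<std::string, std::uintptr_t>& slots,
              const std::vector<Library>& libs) {
  for (std::map<std::string, std::uintptr_t>::iterator it = slots.begin();
       it != slots.end(); ++it) {
    it->second = resolve_symbol(libs, it->first);
  }
}
```

If two libraries both define f, this model picks the one loaded first, and a symbol that no library defines results in an error instead of a silent null address.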

Once the dynamic linker is done, it passes control to the entry point of your binary (recorded in the ELF header), whose startup code eventually calls main().

Conclusion

The motivation for this post is that I wanted to know more about the ELF format. I had some very vague understanding of it after reading [1] a while back but lacked a good mental model.

I used mostly ChatGPT to write this post and I felt like a journalist interviewing an expert, asking questions and follow-ups where I needed more detail. It was a fun process and I find the word journalist appropriate with respect to considering my blog a diary or journal.

I learned about the overall structure of ELF files but I was also able to understand a lot more clearly how a program is executed! I had learned in college how a CPU might execute a program once it is initialized but I don’t recall learning about the initialization process itself.

Things I learned that I find particularly interesting:

References