Recently at work we were trying to solve a performance bottleneck. A part of the system was making requests to a remote key-value store but we saw that the majority of the requests resulted in the key not existing in the store.

An idea was to implement a negative cache: locally store the keys for which no value exists, to prevent unnecessary future requests. It should not use much memory; false negatives are fine (they would only result in redundant requests), but false positives are not allowed (they would result in incorrect results).

This seems like a problem we can solve with a variant of the classic Bloom filter, but with different guarantees on false negatives and positives, so I did some investigation and summarized my findings in this post.

A Bloom filter [1] is a data structure that implements a set. In the most basic form, it allows efficient insertion and membership query. It’s also very space-efficient. The drawback however is that the membership query is not 100% accurate: If it says an element **is not** in the set, then it’s definitely not there. However, if it says an element **is** in the set, there’s some probability the element is actually not there.

We studied Bloom filters in detail in a previous post. Summarizing, the data structure itself is a simple bit array of size $m$. Whenever we want to insert an element in the set, we hash it using $k$ hash functions, obtaining $k$ indices in the bit array, which we set.

To check if an element is in the set, we again hash it using the same $k$ hash functions, obtaining $k$ indices in the bit array, and return true only if all of those bits are set. The intuition is that different elements might collide on one hash, but the chances of them colliding on all $k$ hashes is much smaller. However, as the bit array fills up, the chance of colliding with a set of different elements goes up.
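As a refresher, the scheme above can be sketched in Python (the salted-`sha256` construction of the $k$ hash functions is my assumption, not the original post's code):

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _indices(self, element):
        # derive k indices from k independent (salted) hashes of the element
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{element}".encode()).digest(), "big"
            ) % self.m
            for i in range(self.k)
        ]

    def insert(self, element):
        for i in self._indices(element):
            self.bits[i] = True

    def __contains__(self, element):
        # true only if all k bits are set; may be a false positive
        return all(self.bits[i] for i in self._indices(element))

bf = BloomFilter(m=1024, k=4)
bf.insert("apple")
assert "apple" in bf  # inserted elements are always found: no false negatives
```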

In that post, we concluded that the probability of **false negatives** is 0, and the probability of **false positives** is $p = 0.62^{m/n}$, where $n$ is the number of elements inserted, with the optimal choice of $k = \ln(2)m/n$.

The complexity of insertion and membership query is $O(k)$, but $k$ is usually a small constant. Space complexity is $O(m)$ bits.

Now suppose we want a data structure like a Bloom filter but it has the opposite guarantees: If it says an element **is** in the set, then it’s definitely there. However, if it says an element **is not** in the set, there’s some probability the element is actually there.

All references I found suggest it’s not possible, but I haven’t been able to find a formal proof. One difficulty is that the problem is not well defined: what are the constraints of this data structure? Does it also have to use $O(m)$ bits and have a false-negative probability of $p = 0.62^{m/n}$?

If we don’t have any guarantees on the probability, a simple way to satisfy the constraint is to *always* say the element is not in the set. We don’t have to store any elements, and it never reports false positives this way!

How about false negatives? Let $N$ be the number of elements in the domain (possibly infinite) and assume the element we’re querying is selected uniformly at random from this domain. If there are $n$ elements currently in the set, then the probability of a random element being in the set is $n/N$, which is also the probability of a false negative. If $N$ is much larger than $n$, that number is actually quite low.

In practice, however, the elements queried are not uniformly distributed. In fact, it’s very likely that an element shows up repeatedly, so the false negative rate would be pretty high.

Let’s now add the constraint that for at least one element that exists in the set, it must correctly report it is in there.

The counterpart property in the regular Bloom filter is that for at least one element that doesn’t exist in the set, it must report so. If the filter is not nearly or completely full and the hash function is decent, this is satisfiable.

For the inverse case, let’s be lenient and say that it only has to report the presence in the set of exactly one element $s$.

To know whether $s$ is in the set, we need to store something that identifies it in our data structure. We can store $s$ itself, but it might be larger than $m$ bits in size! So perhaps we can hash it and bring its size down to $m$ bits. But as long as $N$ (the size of the domain) is larger than $2^m$, by the pigeonhole principle there are other strings that will hash to the same value, i.e. collisions are unavoidable.

Let $s'$ be an element which hashes to the same value as $s$ and suppose $s'$ is not in the set. When we query for $s'$, the structure won’t be able to tell it apart from $s$ and will falsely report that it is in the set, violating the zero probability of false positives.

So even a minimally useful negative Bloom filter is not possible. What can we do? A middle ground is presented in [2]. The idea is that we use a regular hash table, except that:

- When we get a collision we replace the entry instead of keeping multiple entries
- We store part of the element’s hash as the value.

Let’s cover it in more detail. Let $l$ be an integer such that $m = 2^{l}$, and construct an array of size $m$ with each element being of size $b$ (bits). We then choose a hash function that hashes a given element to at least $b + l$ bits.

To insert an element, we hash it and use the first $l$ bits as the index in the array and the remaining $b$ bits as the value stored at that index. To check for membership, we repeat the process to find the index, but only claim the item is in the set if the remaining $b$ bits match what’s stored.

It’s a very simple process and yet it yields good practical guarantees:

**Proposition 1.** The probability of a false positive is less than $1/2^{b}$.

Suppose we query for an element that is not in the set, and that it hashes to index $p$ and value $v$. Assuming a completely random hash function, position $p$ can be filled with any of the $2^b$ values with equal probability, so the chance of it being exactly $v$ is $1/2^b$.

If the number of elements inserted so far is small, there's also a chance that position $p$ hasn't even been set yet, but here we consider the worst case scenario.

Note that this probability doesn’t depend on how many elements have been inserted so far.

**Proposition 2.** The probability of a false negative is less than $n/2^{l}$, where $n$ is the number of elements inserted.

The algorithm computed an index $p$ and value $v$ for that element when inserting it. For another element to replace it, it must match on $p$ and mismatch on $v$. The probability of mismatching on $v$ is very high, so we can simplify and assume the probability of replacement is the probability of matching on $p$, which is $1/2^{l}$.

Let $n$ be the number of elements inserted after our element in question. For its value to not be replaced, every single one of these insertions must map to an index other than $p$, which has probability $(1 - 1/2^{l})^n$. Thus the probability of it being replaced is $1 - (1 - 1/2^{l})^n$.

For large values of $l$, $1 - 2^{-l} \approx \exp(-2^{-l})$, so the probability is $1 - \exp(-n2^{-l})$. Using that $1 - e^{-x} \lt x$, we conclude that $$Pr \lt n2^{-l}$$
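As a quick numeric sanity check of this bound (parameters chosen arbitrarily):

```python
# probability that an entry inserted n insertions ago was overwritten,
# for a table with 2^l slots: exact formula vs. the n/2^l upper bound
l, n = 16, 1000
exact = 1 - (1 - 2**-l) ** n
bound = n * 2**-l
assert exact < bound  # the bound holds, and is close for small n / 2^l
```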

In [2], the probability of a false positive is claimed to be $1/2^{l + b}$. This would be true if we stored the entire hash as the value, but we only store $b$ bits of it; the other $l$ bits are used deterministically as the index in the array. However, this doesn’t change the conclusion.

Inserting and querying for an element are $O(1)$ operations. The space used by this structure is $O(2^l b)$ bits. In [2] Ilmari Karonen suggests $l = 16$ and $b = 128$, which corresponds to `1MB` of memory, with a false positive rate of $1/2^{128}$, which is for most practical purposes indistinguishable from 0. The false negative rate will depend on how many elements are inserted.

This is a type of LRU cache, except that we don’t store the value and the key size is bounded by $O(b)$ bits.

Here’s a simple implementation in Python. For lack of better ideas, I’ll call it `ApproxSet`:
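The original listing isn’t preserved here; the following sketch is my reconstruction (the choice of `hashlib.sha256` and the exact interface are assumptions):

```python
import hashlib

class ApproxSet:
    def __init__(self, l=16, b=128):
        # requires a hash of at least l + b bits; sha256 gives 256
        self.l, self.b = l, b
        self.table = [None] * (2 ** l)

    def _index_value(self, element):
        h = int.from_bytes(hashlib.sha256(str(element).encode()).digest(), "big")
        index = h & ((1 << self.l) - 1)               # first l bits: array index
        value = (h >> self.l) & ((1 << self.b) - 1)   # next b bits: stored value
        return index, value

    def insert(self, element):
        index, value = self._index_value(element)
        self.table[index] = value  # on collision, replace the entry

    def __contains__(self, element):
        index, value = self._index_value(element)
        return self.table[index] == value

s = ApproxSet(l=16, b=128)
s.insert("hello")
assert "hello" in s  # may later become a false negative if overwritten
```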

As we can see, the implementation is very simple, except for getting the index and value from the hash.

The setup is that we insert `N` random elements (sampled from a domain of size `M`) into this data structure, and whenever occupancy reaches a multiple of 10%, we query another `N` random elements to compute the probabilities.

For `l = 16`, `b = 128`, `N = 1,000,000` and `M = 100,000`, we obtain the following result:

The linear growth with occupancy (a loose proxy for the number of inserted elements $n$) matches the estimate of $n/2^{l}$ from *Proposition 2*.

We can try to force some false positives by making `b` very small, for example just one bit. For `l = 16`, `b = 1`, `N = 1,000,000` and `M = 100,000`, we obtain the following result:

This result suggests that the probability of false positives is not $1/2^{l + b}$: for this case we’d have a probability of $1/2^{17}$, whereas for higher occupancy we observed $36\%$ false positives. This is however consistent with the $1/2^b = 50\%$ bound from *Proposition 1*.

The main takeaway from the investigation is that negative Bloom filters don’t exist. In the process I did learn about an implementation of an LRU-like cache that is about 128 times bigger than a Bloom filter but in practice has the guarantees we needed for our problem.

In the end, for our application the keys are 64-bit ids, so storing 128 bits for them wasn’t worth it and using an off-the-shelf LRU cache turned out to be simpler.

This cache solution also has built-in TTL support, which is very important for our negative-cache use case and is notably complicated to do with a Bloom filter. The approximate set discussed above, however, handles this more gracefully by replacing keys on collision.

This is our fifth post in the series with my notes on complex integration, corresponding to *Chapter 4* in Ahlfors’ Complex Analysis.

We’ll focus on Cauchy’s Integral Formula, which is not only a tool in itself for solving some types of integrals but also a stepping stone for several other results, including *Morera’s theorem*, *Cauchy’s Estimate* and *Liouville’s Theorem*.

The previous posts from the series:

In the previous post, The Winding Number [3], we explored the winding number as:

\[n(\gamma, a) = \frac{1}{2\pi i} \int_{\gamma} \frac{dz}{z - a}\]

which can be interpreted as the number of revolutions a closed curve $\gamma$ performs around a point $a$ not on it. Of particular interest is the case where $n(\gamma, a) = 1$.

In the post before that, Cauchy Integral Theorem [2], we learned that if $f(z)$ is holomorphic, then, under some conditions

\[\int_{\gamma} f(z)dz = 0\]

These results will be crucial to the main topic of this post: *Cauchy’s Integral Formula*.

What happens if we try to apply the Cauchy integral theorem to this function:

\[(A) \quad F(z) = \frac{f(z) - f(a)}{z - a}\]

for some $a$ not on the curve $\gamma$? We can interpret this function as a rate of change (how much $f(z)$ changes when $z$ does). In fact, notice that $\lim_{z \rightarrow a} F(z)$ is essentially $f'(a)$.

We’ll see that $F(z)$ is holomorphic except at $z = a$, but this “singularity” is acceptable under the Cauchy integral theorem and we can still conclude that

\[\int_{\gamma} F(z)dz = 0\]

From this we’ll be able to derive *Cauchy’s Integral Formula*.

**Lemma 1 (Cauchy’s Integral Formula).** Let $f(z)$ be holomorphic in an open disk $\Delta$, let $\gamma$ be a closed curve in $\Delta$ and let $a \in \Delta$ be a point not on $\gamma$ such that $n(\gamma, a) = 1$. Then:

\[(1) \quad f(a) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(z)}{z - a} dz\]

This formula is useful because it enables us to compute the value of $f$ at any point $a$ inside a simple curve $\gamma$ if we know how to compute $f(z)/(z - a)$ at its boundaries!

We can also compute the integral of $f(z)/(z - a)$ over a simple closed curve from the value of $f$ at a point inside it. Let’s look at an example. Suppose we want to compute:

\[\int_{\abs{z} = 1} \frac{e^z}{z} dz\]

Let $f(z) = e^z$ and $a = 0$. Then $F(z)$ as in $(A)$ is holomorphic in the disk $\abs{z} \lt 2$ except at $a$. This allows us to use $(1)$ where $\gamma$ is $\abs{z} = 1$ (since $n(\gamma, a) = 1$ for any point inside a circle):

\[e^{0} = 1 = \frac{1}{2\pi i}\int_{\abs{z} = 1} \frac{e^z}{z} dz\]

Thus:

\[\int_{\abs{z} = 1} \frac{e^z}{z} dz = 2\pi i\]

Recall from [6] that a complex derivative is given by:
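We can sanity-check this value numerically by discretizing the contour (a quick sketch, not part of the original derivation):

```python
import cmath

# midpoint-rule approximation of the integral of e^z / z over |z| = 1
N = 100_000
total = 0
for k in range(N):
    t0 = 2 * cmath.pi * k / N
    t1 = 2 * cmath.pi * (k + 1) / N
    z_mid = cmath.exp(1j * (t0 + t1) / 2)          # midpoint of the arc
    dz = cmath.exp(1j * t1) - cmath.exp(1j * t0)   # chord approximating dz
    total += cmath.exp(z_mid) / z_mid * dz

assert abs(total - 2j * cmath.pi) < 1e-3  # matches 2*pi*i
```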

\[f'(z) = \lim_{h \rightarrow 0} \frac{f(z + h) - f(z)}{h}\]

where $h$ is a complex number. We want to prove that if $f'(z)$ exists, then $f''(z)$ exists too, i.e. that $f$ is infinitely differentiable. We’ll go further and provide an explicit formula for the $n$-th derivative (equation $(3)$). The proof can be derived from Cauchy’s Integral Formula.

Before that however, we’ll need an auxiliary lemma:

**Lemma 2.** Let $f(z)$ be a continuous function on the closed curve $\gamma$. Then

\[(2) \quad F_n(a) = \int_{\gamma} \frac{f(z)}{(z - a)^{n}} dz\]

is holomorphic in each of the regions determined by $\gamma$ and the derivative is $F'_n(a) = n F_{n + 1}(a)$.

We'll first show that $F_1$ is continuous at any point $a_0$. One way to show this is that for every $\epsilon > 0$, we can find a neighborhood around $a_0$, $\abs{a - a_0} \lt \delta$, for which $\abs{F_1(a) - F_1(a_0)} \lt \epsilon$. Let's compute $F_1(a) - F_1(a_0)$ by first substituting $(2)$: $$F_1(a) - F_1(a_0) = \int_{\gamma} \frac{f(z)}{z - a} dz - \int_{\gamma} \frac{f(z)}{z - a_0} dz$$ Grouping under one integral and normalizing by a common denominator: $$ = \int_{\gamma} \frac{f(z) ((z - a_0) - (z - a))}{(z - a)(z - a_0)} dz $$ Cancelling terms and moving constant factors out: $$(2.1) \quad F_1(a) - F_1(a_0) = (a - a_0) \int_{\gamma} \frac{f(z)}{(z - a)(z - a_0)} dz $$ We'll find a relationship between $\epsilon$ and $\delta$, so that for any $\epsilon$ we're given, we'll know how to pick $\delta$. We start by choosing $\delta \gt 0$ such that the open disk around $a_0$, $\abs{a - a_0} \lt \delta$, doesn't intersect $\gamma$. Now consider the inner disk $\abs{a - a_0} \lt \delta / 2$. If we restrict $a$ to be in there, we have: $$(2.2) \quad \abs{z - a} \gt \delta / 2$$ for $z \in \gamma$. To see why, note that $\abs{z - a_0} \ge \delta$ for $z \in \gamma$, so by the triangle inequality $\abs{z - a} \ge \abs{z - a_0} - \abs{a - a_0} \gt \delta - \delta/2 = \delta/2$.

This was the inductive basis. The inductive hypothesis is that we'll assume $F_{n-1}$ is holomorphic and $F'_{n-1} = (n-1) F_{n}$ (this hypothesis is applicable to $G_{n-1}$ as well). We need to prove that $F_{n}$ is holomorphic and $F'_{n} = n F_{n + 1}$. Let's compute $F_n(a) - F_n(a_0)$ by substituting their definition $(2)$: $$(2.4) \quad F_n(a) - F_n(a_0) = \int_\gamma \frac{f(z)}{(z - a)^n}dz - \int_\gamma \frac{f(z)}{(z - a_0)^n}dz$$ We'll use the following identity: $$\frac{1}{(z - a)^n} = \frac{1}{(z - a)^{n-1} (z - a_0)} + (a - a_0) \frac{1}{(z - a)^n (z - a_0)}$$ Replacing this in $(2.4)$: $$ = \paren{\int_\gamma \frac{f(z)}{(z - a)^{n-1} (z - a_0)}dz - \int_\gamma \frac{f(z)}{(z - a_0)^n}dz} + (a - a_0) \int_\gamma \frac{f(z)}{(z - a)^n (z - a_0)} dz$$ Defining $G_n(a) = \int_\gamma \frac{f(z)}{(z - a)^{n} (z - a_0)} dz$, the first integral is $G_{n-1}(a)$ and the second is $G_{n-1}(a_0)$: $$(2.5) \quad F_n(a) - F_n(a_0) = G_{n-1}(a) - G_{n-1}(a_0) + (a - a_0) \int_\gamma \frac{f(z)}{(z - a)^n (z - a_0)} dz$$ We wish to show $F_n$ is continuous at $a_0$. In other words, that there exists $\delta \gt 0$ such that $\abs{a - a_0} \lt \delta$ implies $\abs{F_n(a) - F_n(a_0)} \lt \epsilon$ for any $\epsilon \gt 0$. As before, we'll work backwards and find a $\delta$ that makes $$\abs{F_n(a) - F_n(a_0)} \le \abs{G_{n-1}(a) - G_{n-1}(a_0)} + \abs{a - a_0} \int_\gamma \frac{\abs{f(z)}}{\abs{z - a}^n \abs{z - a_0}} \abs{dz} \lt \epsilon$$ Since $G_{n-1}$ is differentiable at $a_0$ (by hypothesis), it's continuous and thus there is $\delta_1 \gt 0$ such that $\abs{a - a_0} \lt \delta_1$ implies $$\abs{G_{n-1}(a) - G_{n-1}(a_0)} \lt \epsilon_1$$ Using $(2.2)$ and $(2.3)$ as before, we can conclude that: $$\int_\gamma \frac{\abs{f(z)}}{\abs{z - a}^n \abs{z - a_0}} \abs{dz} \lt \frac{2^{n}}{\delta^{n+1}} \int_\gamma \abs{f(z)}\abs{dz} = \frac{2^{n}}{\delta^{n+1}} k$$ So the second term is bounded by $\abs{a - a_0} (2^{n} k)/\delta^{n+1}$, which we can make smaller than any $\epsilon_2$ by restricting $\abs{a - a_0}$.
Choosing $\epsilon_1 = \epsilon_2 = \epsilon / 2$ and restricting $\abs{a - a_0}$ to be small enough for both bounds gives us $\abs{F_n(a) - F_n(a_0)} \lt \epsilon_1 + \epsilon_2 = \epsilon$.

So $F_n$ is continuous and so is $G_n$. The remaining integral in $(2.5)$ is $G_n(a)$, so we have: $$F_n(a) - F_n(a_0) = G_{n-1}(a) - G_{n-1}(a_0) + (a - a_0) G_n(a)$$ Dividing by $(a - a_0)$ and taking the limit $a \rightarrow a_0$ gives us $F'_{n}$: $$F'_n(a_0) = \lim_{a \rightarrow a_0} \paren{\frac{G_{n-1}(a) - G_{n-1}(a_0)}{a - a_0} + G_n(a)}$$ or, since the limit distributes over sums: $$= \lim_{a \rightarrow a_0} \paren{\frac{G_{n-1}(a) - G_{n-1}(a_0)}{a - a_0}} + \lim_{a \rightarrow a_0} G_n(a)$$ The first limit is $G'_{n-1}(a_0)$, which by hypothesis equals $(n-1) G_n(a_0)$. Since $G_n(a)$ is continuous, it equals $G_n(a_0)$ as $a \rightarrow a_0$: $$= (n - 1) G_n(a_0) + G_n(a_0) = n G_n(a_0)$$ Finally, we have that $G_{n}(a_0) = F_{n+1}(a_0)$, so: $$F'_n(a_0) = n F_{n+1}(a_0)$$

We note that we can write $(1)$ as:

\[f(a) = \frac{1}{2\pi i} F_1(a)\]

We can now use *Lemma 2* to compute the $n$-th derivative of $f(a)$:

**Lemma 3.** Let $f(z)$ be holomorphic in an open disk $\Delta$, let $\gamma$ be a closed curve in $\Delta$, and let $a$ be a point not on $\gamma$ such that $n(\gamma, a) = 1$. Then:

\[(3) \quad f^{(n)}(a) = \frac{n!}{2\pi i} \int_{\gamma} \frac{f(z)}{(z - a)^{n + 1}} dz\]

Summarizing, *Lemma 2* proves that line integrals are infinitely differentiable and *Lemma 1* allows us to express a holomorphic function $f$ at any point $a$ as a function of a line integral. Combining both gives us that a function $f$ is also infinitely differentiable.

Suppose $f(z)$ is holomorphic in a region $\Omega$ and $a \in \Omega$. Since $\Omega$ is open, we can always find an open disk $\Delta$ as a neighborhood of $a$, $\abs{z - a} \lt \delta$, and inside it a circle $C$ containing $a$. Since a circle winds exactly once around points on its interior, $n(C, a) = 1$.

In these conditions we can apply *Lemma 3* and thus conclude that $f(a)$ is infinitely differentiable. For each $a$ on the domain of $f$, we can always choose a suitable $C$, so this leads us to the following high-level corollary:

**Corollary 4.** Holomorphic functions are infinitely differentiable.

Another consequence is that if $f(a)$ is the derivative of a holomorphic function, then $f(a)$ itself is holomorphic. Let’s revisit *Corollary 1* in *Cauchy Integral Theorem* [2]:

Let $f(z)$ be a function defined in $\Omega$. Then $\int_\gamma f(z)dz = 0$ if and only if $f$ is the derivative of some holomorphic function $F$ in $\Omega$.

So one direction says that if $\int_\gamma f(z)dz = 0$ then $f$ is the derivative of a holomorphic function. But now we know $f$ is also holomorphic. This leads to a famous result:

**Theorem 5 (Morera’s theorem).** If $f(z)$ is defined and continuous in a region $\Omega$, and $\int_\gamma f(z)dz = 0$ for any closed curve $\gamma$, then $f(z)$ is holomorphic.

Suppose $f(z)$ is holomorphic and bounded. We can then obtain an upper bound for $\abs{f^{(n)}(a)}$ via *Cauchy’s Estimate*.

**Lemma 6 (Cauchy’s Estimate).** Let $f(z)$ be holomorphic and bounded by a finite $M$ in a region $\Omega$ (i.e., $\abs{f(z)} \le M$ for all $z \in \Omega$). Let $C$ be a circle of radius $r$ centered in $a$ ($C$ is inside $\Omega$). Then:

\[\abs{f^{(n)}(a)} \le \frac{M n!}{r^{n}}\]

We can use this result to prove another famous result, *Liouville’s Theorem*:

**Theorem 7 (Liouville’s Theorem).** If $f(z)$ is holomorphic and bounded on the whole plane, then it’s a constant function.

Liouville’s Theorem can in turn be used to prove the *Fundamental Theorem of Algebra*.

**Theorem 8 (Fundamental Theorem of Algebra).** If $P(z)$ is a single-variable polynomial of complex coefficients and degree greater than 0, then it has at least one complex root.

Suppose, for the sake of contradiction, that $P(z)$ has no root. Then $P(z) \ne 0$ everywhere and $1/P(z)$ is holomorphic on the whole plane. It’s also bounded: since $\abs{P(z)} \rightarrow \infty$ as $z \rightarrow \infty$, $1/P(z)$ tends to 0. Thus, according to Liouville’s Theorem, $1/P(z)$ must be a constant function, and so is $P(z)$, implying it has degree 0, a contradiction.

We derived Cauchy’s integral formula from Cauchy’s integral theorem applied to the rate of change of $f(z)$, which we called $F(z)$.

We also simplified our lives by only considering cases where $n(\gamma, a) = 1$ to get rid of this factor. It turns out not to be a big problem: we can choose our closed curve to have that property and still get Morera’s and Liouville’s theorems.

To make sure we could build on top of results from previous posts, we had to make sure to look at a neighborhood of each point $a$ (by choosing the circle $C$), small enough to guarantee for example that $f(z)$ is holomorphic there. In this sense, these properties we have proved, such as that holomorphic functions are infinitely differentiable, are “local” properties.

Finally, none of the famous results (*Theorems 5, 7, 8*) made full use of *Lemma 3*. *Morera’s theorem* only used the fact that the derivative of a holomorphic function is holomorphic and *Liouville’s theorem* only used it for $n = 1$. Fuller use of *Lemma 3* will be left for future posts.

In this post we’ll discuss coroutines in C++, which is a feature introduced in C++20. We’ll first start by understanding what coroutines mean in C++, comparing briefly with coroutines in other languages.

Then we’ll provide a minimal example using coroutines and progressively add capabilities to it while introducing concepts and features from the coroutine toolkit.

Coroutines are common in languages like Python and JavaScript and they’ve more recently been added to C++ (starting from C++20).

If you’re familiar with coroutines in Python, coroutines in C++ might feel familiar, with the `async` functions and the `await` operator, but at the same time a lot harder to grok. Why?

In [2] Lewis says:

What the C++ Coroutines TS provides in the language can be thought of as a low-level assembly-language for coroutines.

and:

Coroutines TS does not actually define the semantics of a coroutine

In summary, coroutines in C++ are a lot more generic and low-level than their counterparts in other languages, which opens up more use cases but makes them hard to use directly by end users. The idea is thus to have libraries built on top of these primitives. Interestingly, the STL itself doesn’t provide any such high-level implementations.

So unless you’re a library developer, understanding C++ coroutines might not be necessary if the libraries you’ll end up using are well abstracted. In any case, it might be interesting to peek under the hood.

The only constructs the C++ language exposes to developers are the operators `co_await`, `co_yield` and `co_return`.

However, the compiler generates code behind the scenes that translates these operators into actual C++ code. I’ll be using **coroutine machinery** as a vague term to refer to the combination of the compiler and this implicit generated code.

The crucial feature of coroutines is what we call **suspension**: the capability of a function to return to its caller mid-flight and then later be resumed from where it stopped, with all its local variables preserved.

Here’s a minimal example where we can see suspension in action.

In this example, `f` is a coroutine. In `main()` we invoke `f()`, which executes until the `co_await` and then returns a handle, `Task`, back to `main()`. We can then resume the coroutine by invoking the handle `h()` again.

In the example above, `Task` is not a structure defined in the STL. We have to define it ourselves. Here’s an example:

So `Task` is a specialization of `std::coroutine_handle` for some `Promise` class, which we leave unspecified for now, plus the field `promise_type`, which associates this class with a specific `Promise`.

We don’t actually have to use inheritance to create our own coroutine handle. We can also use composition, making sure to implement the necessary methods, for example:

The one needed by the coroutine framework is `from_promise()`. The `operator()` is just syntax sugar. We could also define a method called `resume()` and update `main()` accordingly, which is actually a bit clearer:

Let’s now cover the promise class, which goes hand-in-hand with a coroutine handler.

We shall not confuse the concept of promises in coroutines with the STL’s `std::promise`. According to Lewis Baker [4]:

I want you to try and rid yourself of any preconceived notions of what a “promise” is. While, in some use-cases, the coroutine promise object does indeed act in a similar role to the `std::promise` part of a `std::future` pair, for other use-cases the analogy is somewhat stretched. It may be easier to think about the coroutine’s promise object as being a “coroutine state controller” object that controls the behaviour of the coroutine and can be used to track its state.

In the context of coroutines, a promise is any class `T` that implements the methods:

- `T get_return_object()`
- `Awaitable initial_suspend()`
- `Awaitable final_suspend()`
- `void unhandled_exception()`

Here’s a possible implementation of `Promise`, which we associated with a `Task` above:

The method `get_return_object()` constructs a coroutine handle, in our example a `Task`, from a promise. As we’ll see soon, the promise is an implementation detail of the coroutine machinery that is not exposed directly to user code, so we need to wrap it in the coroutine handle before we return control to the caller.

The coroutine machinery knows the class of the promise to create based on the return type of the function. In our example, `f()` has return type `Task`, and it expects this type to have the field `promise_type`. So in this case it knows to use `Promise`.

The other methods customize the behavior of a coroutine. In our example, we declared:

Note this is equivalent to

The class `std::suspend_never` is roughly:

The interesting method here is `await_ready()`. The coroutine machinery will call `promise.initial_suspend().await_ready()` to determine whether to start executing the function right away or wait until `resume()` is first called.

Since our `Promise` returns `std::suspend_never`, the function `f()` does not suspend until it hits a `co_await`. If we replace it with `std::suspend_always` and run `main()`, we observe a change in behavior: the body of `f()` doesn’t start executing until the handle is first resumed.

The `final_suspend()` method is analogous, but for when the function ends or throws an exception. Summarizing, the coroutine machinery is essentially wrapping the body of the function as follows:
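In pseudocode (simplified; the actual expansion has more branches):

```
// pseudocode: the compiler-generated skeleton around a coroutine body
co_await promise.initial_suspend();   // may suspend before the body runs
try {
    /* ...original function body... */
} catch (...) {
    promise.unhandled_exception();
}
co_await promise.final_suspend();     // may suspend after the body finishes
```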

Notice that the coroutine machinery uses `co_await` to call `promise.initial_suspend()` and `promise.final_suspend()` above.

You’ll notice that the class `std::suspend_always` is also used in the expression `co_await std::suspend_always{}` inside `f()`. Further, the coroutine machinery uses `co_await` to call `.initial_suspend()` and `.final_suspend()`. Let’s understand these better.

An awaitable is any class that implements the methods:

- `bool await_ready()`
- `void/bool await_suspend(std::coroutine_handle<>)`
- `T await_resume()`

An awaitable can be used with the operator `co_await`, which is implemented roughly as follows (I’m omitting a lot of the different branches for the sake of simplicity; see [3] for a complete picture):

So an awaitable has different methods that are called at different points in the lifecycle of the suspension. Note that the awaitable has the opportunity to act after the coroutine is suspended but before control returns to the caller.

We never return the awaitable object directly to the caller. If we want to send information back, we can piggyback on the promise object, which can be accessed through the coroutine handle, itself passed to the awaitable’s `await_suspend()`.

First we implement a new class that satisfies the awaitable constraints, `Awaitable`, which can hold a string:

The key method is `await_suspend()`: when we get a handle, we access the promise and set the string there. Of course, we need to make sure the `Promise` object can store a string too:

And we add syntax sugar for accessing the promise’s `value` via the handle:

We can then change the `main()` function:

`co_yield` and `co_return`

Instead of defining the `Awaitable` class just to forward a value to the promise, we can add the methods `return_value()` and `get_value()` to the promise object:

And then use the special operators `co_yield` and `co_return`:

One interpretation is that `co_yield` and `co_return` are ways to “wrap” a regular value into a coroutine handle. We’ve also seen how to “unwrap” the value from the handle, either via the overloaded `operator()` or `.resume()`, but we only did this outside of a coroutine (the `main()` function in our case).

We can unwrap the value of a handle inside a coroutine too as we’ll see next.

In Python, if we have an async function, we can await its result and use it afterwards, as long as we’re in an async function too. For example:
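A minimal sketch using `asyncio`:

```python
import asyncio

async def g():
    return 10

async def f():
    x = await g()   # suspends f() until g() completes, then resumes with its value
    return x + 1

print(asyncio.run(f()))  # → 11
```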

In C++, we can also “extract” the value from a coroutine handle using the `co_await` operator. The trick is to make the coroutine handle an awaitable too, by implementing the methods discussed in the *Awaitable* section.

The key difference is that `await_resume()` now returns a value. Recall that in the pseudo-implementation of `co_await` in the *Awaitable* section, the return value is `awaitable.await_resume()`, so that’s the value the `co_await` expression evaluates to.

To complete the example, we define a function `g()` that returns a coroutine handle with a value, and then in `f()` we extract that value and combine it with another.

For this to work like the Python example, we’ll need to change `Promise::initial_suspend` to not suspend; otherwise we’d have to explicitly resume the handle returned by `g()` before awaiting its value.

In this post we covered the basics of C++ coroutines. I found it pretty hard to understand them, and I couldn’t build on top of my understanding of coroutines in JavaScript or Python. The observation from Marières [1], on it being lower-level with no batteries included, made me understand why.

I read Lewis’s blog posts [2, 3, 4], which are very detailed and technical, so I found they aren’t a very good source for ramping up on coroutines. Marières [1] provides a more digestible introduction, since the author provides a lot of insights from a first-learner perspective.

In my post, I tried to do a more example-based exposition, and am happy with the result of starting with a digestible example and progressively building on top of it, introducing other concepts little by little.

Both Marières and Lewis spend time discussing implementation details such as the fact that coroutine frames are stored on the heap instead of the stack, which I found not necessary for an introduction to coroutines.

In this post, I tried to focus more on the syntax and semantics of coroutines, and less how they’re implemented in practice. The examples from this post did nothing useful, but for a future post I’d like to implement async operations using coroutines.

Python Coroutines and Async Functions in JavaScript cover the same concept of suspendable functions, but in Python and JavaScript this is implemented with event loops and is single-threaded.

This is our fourth post in the series with my notes on complex integration, corresponding to *Chapter 4* in Ahlfors’ Complex Analysis.

In this one, we’ll explore the concept of the winding number of a curve with respect to a point, which can be interpreted as how many times the curve winds around that point. It has an interesting relationship with the Cauchy integral theorem that we learned about in a prior post.

The previous posts from the series:

In the previous post, Cauchy Integral Theorem [4], we concluded with *Theorem 4*, saying that

\[\int_{\gamma} f(z)dz = 0\]

if $\gamma$ is contained in a disk $\Delta$ and $f(z)$ is holomorphic in $\Delta$. This holds even if $f(z)$ is not holomorphic in a finite set of points in $\Delta$, as long as each of these points $\xi$ satisfies:

\[(1) \quad \lim_{z \rightarrow \xi} (z - \xi) f(z) = 0\]

Now let’s look at a specific function, $f(z) = 1/(z - a)$. Assuming the point $a$ is in the disk $\Delta$, then $f(z)$ is not holomorphic at $a$, but maybe it satisfies $(1)$? We can find out whether that’s the case by setting $\xi = a$ in $(1)$:

\[\lim_{z \rightarrow a} (z - a) \frac{1}{z - a} = 1\]So no, we can’t use *Theorem 4* from [4] to conclude that

\[\int_{\gamma} \frac{dz}{z - a}\]equals 0. Still, it’s worth exploring this integral further. What does its value represent? In which cases is it 0? This is what we’ll focus on in this post.

Let $\gamma$ be a piecewise differentiable closed curve and $a$ a point not on the curve. We define **the winding number** of $a$ with respect to $\gamma$, denoted by $n(\gamma, a)$, as:

\[(2) \quad n(\gamma, a) = \frac{1}{2 \pi i} \int_{\gamma} \frac{dz}{z - a}\]

The geometric interpretation of this value is how many times the curve $\gamma$ winds (i.e. completes a revolution) counter-clockwise around $a$. Wikipedia provides several examples:

Note that when the curve winds clockwise the value is negative.

Let’s build a geometric intuition on why the formula $(2)$ corresponds to the number of revolutions. Recall that the set of points in the circumference of a circle of radius $\rho$ and center $a$ can be written in polar form as:

\[z(\theta) = a + \rho e^{i\theta}\]For $0 \le \theta \lt 2 \pi$. We can generalize this idea for any closed curve $\gamma$ by picking any point $a$ not in it. The major difference is that the “radius” (i.e. the distance between a point in $\gamma$ and $a$) $\rho$ would not be fixed, but a function of $\theta$. We can parameterize both by some $0 \le t \le 1$ as:

\[z(t) = a + \rho(t) e^{i\theta(t)}\]Noting that $a$ doesn’t have to be inside the curve. We won’t prove it, but it’s possible to show that if $\gamma$ is differentiable so is $\rho(t)$ and $\theta(t)$. Now if we imagine $t$ is time and we place an observer at point $a$ rotating to follow the point $z(t)$ as we travel from $t = 0$ to $t = 1$, the amount of “angle displacement” this observer will perform can be found by adding up the delta angle between adjacent timestamps, $\Delta \theta = \theta(t_i) - \theta(t_{i-1})$.

Because it’s a closed curve, the observer must finish facing in the same direction they started, meaning they completed an integer number of revolutions, and thus the total angle displacement should be a multiple of $2 \pi$.

If we allow $\theta(t)$ to go beyond $2\pi$ (when multiple revolutions occur), then the total angle displacement is: $\theta(1) - \theta(0)$ and the number of revolutions (and hence the winding number) is

\[(3) \quad n(\gamma, a) = \frac{\theta(1) - \theta(0)}{2\pi}\]With this intuition in mind, *Lemma 1* formalizes the correspondence between $(2)$ and the number of revolutions:

**Lemma 1.** Let $\gamma$ be a piecewise differentiable closed curve and $a$ a point not in $\gamma$. The curve can be parametrized with respect to $a$ as:

\[z(t) = a + \rho(t) e^{i\theta(t)}\]

for $0 \le t \le 1$ and differentiable functions $\rho(t)$ and $\theta(t)$. Then:

\[(4) \quad \int_{\gamma} \frac{dz}{z - a} = i(\theta(1) - \theta(0))\]We can replace $(3)$ in $(4)$:

\[\int_{\gamma} \frac{dz}{z - a} = (2 \pi i) n(\gamma, a)\]Which gives us $(2)$.
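Equation $(2)$ is easy to check numerically. Below is a small sketch (my own illustration, not from the post; the function name `winding_number` is made up) that approximates the integral with the midpoint rule over a circle traversed $k$ times counter-clockwise:

```cpp
#include <cmath>
#include <complex>

// Approximate n(gamma, a) from equation (2), where gamma is the circle of
// radius rho centered at `center`, traversed k times counter-clockwise.
double winding_number(std::complex<double> a, std::complex<double> center,
                      double rho, int k) {
  const double pi = std::acos(-1.0);
  const int steps = 200000;
  std::complex<double> integral = 0.0;
  for (int i = 0; i < steps; i++) {
    double t0 = 2.0 * pi * k * i / steps;
    double t1 = 2.0 * pi * k * (i + 1) / steps;
    // Endpoints and midpoint of this small piece of the curve.
    std::complex<double> z0 = center + rho * std::polar(1.0, t0);
    std::complex<double> z1 = center + rho * std::polar(1.0, t1);
    std::complex<double> zm = center + rho * std::polar(1.0, (t0 + t1) / 2.0);
    integral += (z1 - z0) / (zm - a);  // midpoint rule for dz / (z - a)
  }
  // n(gamma, a) = integral / (2 pi i); the result is (close to) a real integer.
  return (integral / std::complex<double>(0.0, 2.0 * pi)).real();
}
```

For $a$ at the center of the circle the result is $k$; for $a$ outside the circle it is 0, consistent with the results discussed below.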

The first property is that if we flip the direction of a curve $\gamma$, we negate the winding number, i.e.:

\[n(\gamma, a) = -n(-\gamma, a)\]This follows from the application of $(2)$ and using that

\[\int_\gamma f(z)dz = -\int_{-\gamma} f(z)dz\]Intuitively, if you’re an observer away from the curve, you don’t need to “turn around yourself” to follow a point along it, so the winding number of an external point is 0, as in the third example of *Figure 1*. How about the example from *Figure 2* (left): is it inside or outside?

We need a more precise way to define “outsideness”. There’s a topological formal definition, but we won’t go over it here. Instead, we can get an intuition by imagining the curve is a rubberband on the surface of a table. You then put your finger at point $a$. If you can remove the rubberband without lifting your finger, then point $a$ is “outside”.

In the case of *Figure 2*, the observer ends up going around themselves, but they reverse back before finishing, so in the end the winding number is still 0 (left image). It’s possible to show that the winding number is 0 if and only if a point is outside the curve, so we could use this alternative definition for “outsideness”.

We won’t prove this equivalence here, but rather a weaker result, in which the observer is “away” from the curve, i.e. not surrounded by it:

**Lemma 2.** Let $\gamma$ be a closed curve, $C$ be a circle enclosing it and $a$ a point outside the circle. Then $n(\gamma, a) = 0$.

We can always enclose any bounded curve with a circle if we choose a sufficiently large radius and still have $\infty$ outside it, which leads us to the corollary:

**Corollary 3.** Let $\gamma$ be a closed curve. Then $n(\gamma, \infty) = 0$

Generalizing the idea of inside/outside, we can consider the regions determined by a curve $\gamma$. The curve $\gamma$ is a bounded and closed set, so its complement is unbounded and open. The complement can be partitioned into connected components, exactly one of which is unbounded (the one containing infinity).

*Lemma 4* shows that any two points in the same region have the same winding number.

**Lemma 4.** Let $\gamma$ be a closed curve and let $\curly{R_i}$ be the set of regions determined by it. Let $a, b$ be points in the same region $R_i$. Then $n(\gamma, a) = n(\gamma, b)$.

Let's define $\Omega$ as the complement of $\overline{uv}$. We claim that for $z \in \Omega$ the function: $$(4.1) \quad f(z) = \frac{z - u}{z - v}$$ returns either a positive real value or an imaginary number, that is, never a negative real value. The key reason is that $z$ doesn't belong to $\overline{uv}$. With this in mind, let us write $(z - u)$ and $(z - v)$ in polar form, say $r_u e^{i\theta_u}$ and $r_v e^{i\theta_v}$, so $$f(z) = \frac{r_u}{r_v} e^{i(\theta_u - \theta_v)}$$ We can interpret $(z - u)$ and $(z - v)$ geometrically as the directed vectors $\overrightarrow{uz}$ and $\overrightarrow{vz}$, respectively. We have two cases: either $z$ lies on the line defined by $u$ and $v$ (but not between them!) or not. In the first case, $\theta_u = \theta_v$, since the vectors point in the same direction, so $f(z) = r_u/r_v$, that is, a positive real.

In case 2, $z$ is not in the line defined by $u, v$, so it forms a triangle as depicted in

This property is important because we wish to compute $\log(f(z))$. The function $\log(z)$ is not holomorphic on the entire complex plane, but it is on $\mathbb{C} - \mathbb{R}_{\le 0}$ (i.e. complex numbers excluding the non-positive reals). So basically we're saying that for any $z \in \Omega$, $\log(f(z))$ has a derivative, which can be shown to be: $$(4.2) \quad g(z) = \frac{d\log(f(z))}{dz} = \frac{1}{z - u} - \frac{1}{z - v}$$ Since $g(z)$ is the derivative of a function, $\log(f(z))$, holomorphic in $\Omega$, we can use

A corollary from *Lemma 4*, *Corollary 3* and that the unbounded region determined by $\gamma$ contains $\infty$:

**Corollary 5.** Let $\gamma$ be a closed curve and $R_0$ be the unbounded region determined by $\gamma$. Then if $a \in R_0$, $n(\gamma, a) = 0$.

This provides a stronger result than *Lemma 2*. We don’t need $a$ to be outside a circle enclosing $\gamma$. As long as there’s a path from $\infty$ to $a$ not crossing $\gamma$, the winding number of $a$ is 0, for example the one in *Figure 3*.

It’s still not a necessary condition for the winding number to be 0 though: in *Figure 2* there’s still no way out, but the winding number is still 0.

So far we’ve been considering conditions that lead to $n(\gamma, a) = 0$. We now consider conditions that lead to $n(\gamma, a) = 1$.

To simplify calculations, we’ll assume $a$ is at the origin. If we take the geometric interpretation of winding number, we can see it’s invariant with translation. In fact, in the equation $(2)$, the expression $z - a$ is essentially doing this normalization.

We also assume $a$ is surrounded by the curve $\gamma$, otherwise we already know its winding number is 0. Then, visualizing this on the complex plane, $\gamma$ has to exist in all four quadrants since it surrounds the origin, such as the curve in *Figure 4*.

We can now state a sufficient condition for $n(\gamma, a) = 1$:

**Lemma 6.** Let $\gamma$ be a curve around the origin. We can pick points $z_1$ and $z_2$ such that $z_1$ has a positive imaginary component and $z_2$ a negative one. Denote the part of the curve from $z_1$ to $z_2$ as $\gamma_1$ and from $z_2$ to $z_1$ as $\gamma_2$.

If $\gamma_1$ doesn’t cross the positive real axis and $\gamma_2$ doesn’t cross the negative real axis, then $n(\gamma, a) = 1$, where $a$ is the origin.

We now build two closed curves: $\sigma_1 = \gamma_1 + \delta_2 - C_1 - \delta_1$ and $\sigma_2 = \gamma_2 + \delta_1 - C_2 - \delta_2$. $\sigma_2$ is shown in

In the example of *Figure 4*, *Lemma 6* says that because the blue path doesn’t cross the positive $x$-axis and the red path doesn’t cross the negative one, then $n(\gamma, a) = 1$.

To recap, we started by analyzing an example function for which we can’t use *Theorem 4* in [4] and showed that it has a nice geometric interpretation, the number of revolutions around a point.

We considered some properties such as those sufficient for $n(\gamma, a) = 0$ and $n(\gamma, a) = 1$. The winding number is not just a geometric curiosity though, it will be needed as we progress in our study of complex integration.

This is our third post in the series with my notes on complex integration, corresponding to *Chapter 4* in Ahlfors’ Complex Analysis.

The Cauchy integral theorem provides conditions under which the integral over a closed curve is zero.

The previous posts from the series:

In the previous post [3] we ended with the following *Corollary 2* stating that:

The complex line integral $\int_\gamma f(z)dz$, defined in $\Omega$, depends only on the endpoints of $\gamma$ if and only if $f$ is the derivative of some holomorphic function in $\Omega$.

Another corollary is the following:

**Corollary 1.** Let $f(z)$ be a function defined in $\Omega$. Then

\[\int_\gamma f(z) dz = 0\]

for all closed curves $\gamma$ in $\Omega$, if and only if $f$ is the derivative of some holomorphic function $F$ in $\Omega$.

The general idea of Cauchy’s theorem that we’ll cover in this post is that we only need $f$ itself to be holomorphic in $\Omega$, for special types of the region $\Omega$.

A result we haven’t proved yet says that the derivative of a holomorphic function is itself holomorphic, but not all holomorphic functions are derivatives of a holomorphic function. So Cauchy’s theorem is a stronger result.

We first consider the case where $\Omega$ is a rectangle $R$ defined by

\[a \le x \le b, c \le y \le d\]The curve we’ll use is the border of $R$, denoted by $\partial R$, with a counter-clockwise orientation as depicted in *Figure 1*. In other words, it’s the segments $(a, c) \rightarrow (b, c)$, $(b, c) \rightarrow (b, d)$, $(b, d) \rightarrow (a, d)$ and $(a, d) \rightarrow (a, c)$.

**Theorem 1.** If the function $f(z)$ is holomorphic in $R$, then

\[\int_{\partial R} f(z) dz = 0\]

So far we have $\abs{z - z^*} \lt \delta$, but we can find a tighter upper bound. The maximum distance between two points in a rectangle is its diagonal. Let $\Delta$ be the diagonal of $R$. Every time we pick a subrectangle the diagonal is halved, so the diagonal of $R_n$ is $2^{-n} \Delta$. Thus if $z$ and $z^*$ are in $R_n$, $\abs{z - z^*} \le 2^{-n} \Delta$. We have then: $$(1.5) \quad \abs{f(z) - f(z^*) - f'(z^*) (z - z^*)} \lt \epsilon \abs{z - z^*} \le \epsilon 2^{-n} \Delta$$ In [2] we saw in the last example that for any closed curve $\gamma$, $$\int_\gamma (z - a)^{n}dz = 0$$ for $n \ge 0$. This lets us conclude the following: $$(1.6) \quad \int_{\partial R_n} (z - z^*) dz = 0$$ by replacing $a$ with $z^*$, $\gamma$ with $\partial R_n$, and setting $n = 1$. Similarly, if we do it for $n = 0$: $$(1.7) \quad \int_{\partial R_n} dz = 0$$ Since $f(z^*)$ is a constant, we can multiply $(1.7)$ by $-f(z^*)$ and still get a 0: $$(1.8) \quad -f(z^*) \int_{\partial R_n} dz = 0$$ Similarly, $f'(z^*)$ is a constant and we can multiply $(1.6)$ by $-f'(z^*)$ and still get a 0: $$(1.9) \quad -f'(z^*) \int_{\partial R_n} (z - z^*) dz = 0$$ Since $(1.8)$ and $(1.9)$ are zero, we can add them to $\eta(R_n)$ and obtain: $$\eta(R_n) = \int_{\partial R_n} f(z)dz = \int_{\partial R_n} f(z)dz - f(z^*) \int_{\partial R_n} dz - f'(z^*) \int_{\partial R_n} (z - z^*) dz$$ Moving them under one integral (we can move $f(z^*)$ and $f'(z^*)$ inside since they're constants with respect to $z \in R_n$): $$\eta(R_n) = \int_{\partial R_n} \paren{f(z) - f(z^*) - f'(z^*) (z - z^*)} dz$$ All this trickery so that we get to the form of the inequality $(1.5)$. However, that inequality is with respect to the modulus, so we can use

The proof is very clever but I don’t have a good intuition on why it works. Anyway, we can generalize the theorem a bit by allowing points in $R$ for which $f(z)$ isn’t holomorphic:

**Theorem 2.** Let $f(z)$ be holomorphic in $R’$, obtained from the rectangle $R$ by removing a finite set of points $\xi_j$. Then if

\[\lim_{z \rightarrow \xi_j} (z - \xi_j) f(z) = 0\]

for all $j$, then

\[(1) \quad \int_{\partial R} f(z) dz = 0\]This lets us reduce the theorem to the case where we remove exactly one point $\xi$ from $R$ (the case with zero points is just *Theorem 1*).

Using the same argument, we can subdivide a rectangle $R'$ containing $\xi$ such that the subrectangle containing $\xi$, denoted by $R_0$:

- Is a square of side $L$
- Has $\xi$ at its center
- Has $L$ infinitesimally small
- Satisfies: $$(2.1) \quad \int_{\partial R'} f(z)dz = \int_{\partial R_0} f(z)dz$$

So we’re saying that if $f(z)$ is not holomorphic at specific points in $R$ but $(z - \xi_j) f(z)$ tends to 0 there, integrating over its boundary still yields 0.

We now consider the case where $\Omega$ is the open circle $\abs{z - z_0} \lt \rho$, which we’ll denote by $\Delta$. We have the following result:

**Theorem 3.** If $f(z)$ is holomorphic in $\Delta$, then

\[\int_{\gamma} f(z) dz = 0\]

for any closed curve $\gamma$ in $\Delta$.

We'll keep $(x_0, y_0)$ fixed and assume $(x', y')$ is variable. We can then define a function of $x'$ and $y'$: $$F(x', y') = \int_{\sigma} f(z) dz$$ Note that $\sigma$ is implicitly a function of $x'$ and $y'$ since its endpoint is $(x', y')$. And as discussed in [3] (section

One question that came to mind when trying to deal with these corner cases: can't we simply choose a different starting point for these cases? I believe the answer to be no. Then we wouldn't be able to treat $x_0$ and $y_0$ as constants and they would be a function of $x$ and $y$.

Note that it's fine for the curve $\sigma$ to be a function of $x$ and $y$ (as it is in fact) since we don't make any assumption about its constancy in $(3.1)$.

In the same way we generalized *Theorem 1* to allow for points in the rectangle where $f$ is non-holomorphic, we can generalize *Theorem 3*:

**Theorem 4.** Let $f(z)$ be holomorphic in the region $\Delta’$ obtained by omitting a finite number of points $\xi_j$ from the open disk $\Delta$. If $f(z)$ is such that

\[\lim_{z \rightarrow \xi_j} (z - \xi_j) f(z) = 0\]

for all $j$, then

\[\int_{\gamma} f(z) dz = 0\]for any closed curve $\gamma$ in $\Delta’$.

The first difference is that equation $(3.1)$ will now look like: $$F(x', y') = \int_{x_0}^{x'} f(x, y_1) dx + i \paren{\int_{y_0}^{y_1} f(x_0, y) dy + \int_{y_1}^{y'} f(x', y) dy}$$ To determine $\frac{\partial F}{\partial y}(x', y')$, we'll need to compute $F(x', y' + h)$, with $h \rightarrow 0$. We can use the exact same curve as we did for $F(x', y')$ except that the last segment will now go to $y' + h$, that is: $$F(x', y' + h) = \int_{x_0}^{x'} f(x, y_1) dx + i \paren{\int_{y_0}^{y_1} f(x_0, y) dy + \int_{y_1}^{y' + h} f(x', y) dy}$$ By considering the limit $(3.2)$: $$\frac{\partial F}{\partial y}(x', y') = \lim_{h \rightarrow 0} \frac{F(x', y' + h) - F(x', y')}{h}$$ We'll arrive at the same conclusion that $$\frac{\partial F}{\partial y}(x', y') = i f(x', y')$$ If we call the blue segments $\sigma$ and the red ones $\overline{\sigma}$, we'll still conclude that $$(4.1) \quad F(x', y') = \int_{\sigma} f(z) dz = \int_{-\overline{\sigma}} f(z) dz$$ because the two rectangles cancel out. Computing $\partial F/\partial x (x', y')$ is trickier however. That's because when we consider the point $(x' \pm h, y')$, the curve utilized is the 2-segment one because the path to $(x' \pm h, y')$ doesn't contain a $\xi_j$. This is illustrated in

So when we consider the difference $F(x' + h, y') - F(x', y')$ we need to be careful. The trick is to use rectangles to reduce the differences. Taking the example for

It took me a long time to figure out the proof of *Theorem 4*. In [1], Ahlfors provides almost no details besides *Figure 4.1* included in the proof of *Theorem 4*.

On my first read of the book I thought I had understood the proof but once I tried to plug it into the definition of derivative as a limit, I realized I didn’t understand it properly.

This is our second post in the series with my notes on complex integration, corresponding to *Chapter 4* in Ahlfors’ Complex Analysis.

In this installment, we’ll cover line integrals that are path-independent, in other words, that depend only on the first and last point of the paths, but not the specific path or curve chosen. We’ll focus on the conditions these integrals have to satisfy for this to hold.

We’ll start with line integrals for the Euclidean plane and then follow with complex line integrals.

The previous posts from the series:

Let $\gamma$ be a path in $\mathbb{R}^2$ and $f: \mathbb{R}^2 \rightarrow \mathbb{R}^2$. We can “split” $f(x, y)$ into two functions, one for each of the dimensions on its domain, i.e., $f(x, y) = (p(x, y), q(x, y))$.

We can define the line integral of the second kind (or second type) as:

\[(1) \quad \int_{\gamma} pdx + qdy\]Where $dx$ and $dy$ are the differentials along the $x$-axis and $y$-axis dimensions, respectively. We can also say that the integrand $pdx + qdy$ is in *differential form*.

Note that the image of this integral is $\mathbb{R}$.

To get an intuition about this notation, we can look into physics. A function $\mathbb{R}^n \rightarrow \mathbb{R}^{n}$ can be interpreted as a vector field: it associates a vector with a point in space. One classic example is wind speed (Figure 1), where at each point in the plane, we can have a vector denoting the direction and intensity of the wind.

Another example is the force field (e.g. gravitational or magnetic), denoted by $\overrightarrow{F}$. Work, on the other hand, is scalar ($\mathbb{R}$) and is defined as the dot product between the force vector and displacement, so the work performed along a curve $\gamma$ is commonly expressed as:

\[\int_\gamma \overrightarrow{F} \cdot \overrightarrow{dr}\]Here, $\cdot$ denotes the dot product and $\overrightarrow{dr}$ an infinitesimal displacement along $\gamma$, so if we break down $\overrightarrow{F}$ and $\overrightarrow{dr}$ into their components, we have:

\[\int_\gamma (F_x, F_y) \cdot (dx, dy)\]Applying the definition of dot product:

\[\int_\gamma F_x dx + F_ydy\]Gives us the form $(1)$. We can interpret $F_x(x, y)$ as the length of the $x$ component of the vector $F$ at point $(x, y)$, and similarly for $F_y(x, y)$.

We can also write $(1)$ in terms of a single parameter $t$ if we consider the parametric form of $\gamma$ as a function of a scalar $t \in [a, b]$. We then have functions describing the $x$ and $y$ values, $x(t)$ and $y(t)$ respectively.

With a change of variable we obtain:

\[(2) \quad \int_{\gamma} pdx + qdy = \int_{a}^{b} (p x'(t) + q y'(t))dt\]Where in the second form, $p$ and $q$ are shorthands for $p(x(t), y(t))$ and $q(x(t), y(t))$ and, to be super clear, $x’(t) = dx(t)/dt, y’(t) = dy(t)/dt$.
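Equation $(2)$ translates directly into code. Here’s a small numeric sketch (my own example, not from the post): integrating $p dx + q dy$ with $p = -y$, $q = x$ over the unit circle $x(t) = \cos t$, $y(t) = \sin t$, whose exact value is $2\pi$:

```cpp
#include <cmath>

// Approximate equation (2) with the midpoint rule for p = -y, q = x over
// the unit circle parameterized by t in [0, 2*pi]. Exact value: 2*pi.
double line_integral_second_kind() {
  const double pi = std::acos(-1.0);
  const int steps = 100000;
  double sum = 0.0;
  for (int i = 0; i < steps; i++) {
    double t = 2.0 * pi * (i + 0.5) / steps;           // midpoint of piece i
    double x = std::cos(t), y = std::sin(t);
    double dx_dt = -std::sin(t), dy_dt = std::cos(t);  // x'(t), y'(t)
    double p = -y, q = x;
    sum += (p * dx_dt + q * dy_dt) * (2.0 * pi / steps);
  }
  return sum;
}
```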

As the name suggests a path-independent integral is an integral that only depends on its endpoints, not on the specific path over which it’s integrated.

*Theorem 1* provides a sufficient and necessary condition for an integral to be path-independent.

**Theorem 1.** The line integral $\int_\gamma pdx + qdy$, defined in a region $\Omega$, depends only on the endpoints of $\gamma$ if and only if there exists $U(x, y): \Omega \rightarrow \mathbb{R}$ such that $\partial U / \partial x = p, \partial U / \partial y = q$.
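As a concrete example (my own, not from the book): take $p = y$ and $q = x$. Then $U(x, y) = xy$ satisfies $\partial U / \partial x = y = p$ and $\partial U / \partial y = x = q$, so the integral is path-independent, and the standard gradient-theorem computation gives its value from the endpoints alone:

\[\int_\gamma y dx + x dy = U(x_1, y_1) - U(x_0, y_0) = x_1 y_1 - x_0 y_0\]

for any curve $\gamma$ in $\Omega$ from $(x_0, y_0)$ to $(x_1, y_1)$.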

Now we consider the other direction, assuming $\int_\gamma pdx + qdy$ only depends on the endpoints of a curve $\gamma$ in $\Omega$, which means that for any two endpoints we are free to choose whatever curve we fancy, as long as it lies in $\Omega$. We fix the starting point at $(x_0, y_0)$ and let the other endpoint be any point $(x, y) \in \Omega$.

We can define a curve $\gamma$ from $(x_0, y_0)$ to $(x, y)$ composed of segments parallel to either $x$-axis or $y$-axis (example in Figure 1.1) and define $U(x, y)$ as: $$(1.1) \quad U(x, y) = \int_{\gamma} pdx + qdy$$ Now, we pick a point $(x_1, y) \in \Omega$ with fixed $x_1$ and have $\gamma$ go through it, so the last segment is horizontal, i.e. connecting $(x_1, y)$ and $(x, y)$. We can split the curve $\gamma$ in two, one from $(x_0, y_0)$ to $(x_1, y)$, $\gamma_1$, and another from $(x_1, y)$ to $(x, y)$, $\gamma_2$.

Now $(1.1)$ can be written as: $$U(x, y) = \int_{\gamma_1} pdx + qdy + \int_{\gamma_2} pdx + qdy$$ The value of the first integral only depends on $y$, not on $x$, because we fixed $x_1$, so we can call it $c(y)$. In the second we have a horizontal segment, so $y$ is constant and thus $dy = 0$, and we have: $$U(x, y) = c(y) + \int_{x_1}^{x} pdx$$ If we define $P(x, y)$ such that $p = \partial P/\partial x$, then we have: $$U(x, y) = c(y) + P(x, y) - P(x_1, y)$$ Now if we take the partial derivative of $U(x, y)$ with respect to $x$, the terms $c(y)$ and $P(x_1, y)$ vanish since they're not functions of $x$, and $P(x, y)$ becomes $p$, so: $$\frac{\partial U}{\partial x} = p$$ We can use an analogous argument by choosing another curve such that the last segment is vertical and show that $$\frac{\partial U}{\partial y} = q$$ as well.

Let $U(x, y)$ be a function $U: \mathbb{R}^2 \rightarrow \mathbb{R}$. Then we can use the chain rule to obtain:

\[\frac{dU}{dt} = \frac{\partial U}{\partial x} \frac{dx}{dt} + \frac{\partial U}{\partial y} \frac{dy}{dt}\]In particular, we have the differential $dU$ as:

\[dU = \frac{\partial U}{\partial x} dx + \frac{\partial U}{\partial y} dy\]If $U$ satisfies *Theorem 1*, then we have:

\[dU = p dx + q dy\]

Recall that the integrand $p dx + q dy$ is said to be in differential form. If *Theorem 1* is satisfied, it’s exactly the same as the differential of $U$, so we can call it an **exact differential** form.

So far we’ve been working with $\mathbb{R}^2$. Let’s now return to the complex world. Recall the contour integral (see *Contour Integral* in [2]) over a curve $\gamma$:

\[\int_\gamma f(z) dz\]

Since $dz$ is a complex number, we can consider its real and imaginary part $dz = dx + i dy$. Replacing in the above gives us:

\[= \int_\gamma f(z)dx + if(z)dy\]As usual, we can think of $f(z)$ as a function of two real variables and define $p(x, y) = f(z)$ and $q(x, y) = if(z)$, and we get a differential form like $(1)$. The major difference is that the image of these functions is not $\mathbb{R}$ but $\mathbb{C}$.

We can still use *Theorem 1* since it doesn’t depend on the type of the image of the integral. We’ll denote $U(x, y)$ from the theorem as $F(z)$, so if

\[(4) \quad \frac{\partial F}{\partial x} = p = f(z)\]

and

\[(5) \quad \frac{\partial F}{\partial y} = q = if(z)\]we can multiply $(5)$ by $i$ and add it to $(4)$:

\[\frac{\partial F}{\partial x} = - i \frac{\partial F}{\partial y}\]which is another form of the Cauchy-Riemann equations [3]! From *Theorem 1* in [3], we conclude that $F$ is a holomorphic function. Further, we have from $(3)$:

or

\[\frac{dF}{dz} = f\]that is, $f$ is the derivative of $F$. We can thus re-state *Theorem 1* as a corollary:

**Corollary 2.** The complex line integral $\int_\gamma f(z)dz$, defined in $\Omega$, depends only on the endpoints of $\gamma$ if and only if $f$ is the derivative of some holomorphic function in $\Omega$.

A result that will be useful later is the integral

\[(6) \quad \int_\gamma (z - a)^{n} dz\]For a closed curve $\gamma$, $n \in \mathbb{Z}$ and a constant $a \in \mathbb{C}$. We have that $(z - a)^{n}$ is the derivative of

\[(7) \quad (z - a)^{n+1}/(n + 1)\]If $n \ge 0$, then $(7)$ is holomorphic everywhere (or entire [3]), thus we can use *Corollary 2* to claim $(6)$ only depends on its endpoints and since it’s a closed curve, they coincide and thus $(6)$ is 0.
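We can sanity-check this numerically (my own sketch, not from the post): integrating $(z - a)^n$ over a circle of radius 1 around $a$ gives approximately 0 for $n \ge 0$, and $2\pi i$ for $n = -1$, matching the example discussed next.

```cpp
#include <cmath>
#include <complex>

// Approximate the closed-curve integral (6) of (z - a)^n over the circle of
// radius 1 around a, using the parametrization z(t) = a + e^{it}.
std::complex<double> integral_z_minus_a_pow(std::complex<double> a, int n) {
  const double pi = std::acos(-1.0);
  const int steps = 200000;
  std::complex<double> sum = 0.0;
  for (int i = 0; i < steps; i++) {
    double t = 2.0 * pi * (i + 0.5) / steps;  // midpoint rule
    std::complex<double> z = a + std::polar(1.0, t);
    // z'(t) = i e^{it}
    std::complex<double> dz_dt = std::complex<double>(0.0, 1.0) * std::polar(1.0, t);
    sum += std::pow(z - a, n) * dz_dt * (2.0 * pi / steps);
  }
  return sum;
}
```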

If $n \lt 0$ (except $n = -1$), then $(7)$ is holomorphic only if $z \ne a$, so as long as $\gamma$ doesn’t go through $a$, $(6)$ is still 0. For $n = -1$, we can’t claim $(6)$ is always 0. An example provided in [1] is

\[(8) \quad \int_C (z - a)^{-1} dz\]Where $C$ is a circle of radius $\rho$ centered in $a$. We can thus write $z = a + \rho e^{it}$, $0 \le t \le 2\pi$, so $dz = i \rho e^{it} dt$. Replacing in $(8)$:

\[= \int_0^{2\pi} \frac{i \rho e^{it}}{a + \rho e^{it} - a} dt = \int_0^{2\pi} idt = 2\pi i\]

This post inaugurates a series with my notes on complex integration, corresponding to *Chapter 4* in Ahlfors’ Complex Analysis.

In this post we’ll cover different definitions of complex integrals - integrals of functions whose domain and image are complex numbers - and some basic properties.

To recap, in [2] we defined *complex derivatives* of a function $f: \Omega \rightarrow \mathbb{C}$, for $\Omega \subseteq \mathbb{C}$, denoted by $f’(z)$, as:

\[f'(z) = \lim_{h \rightarrow 0} \frac{f(z + h) - f(z)}{h}\]

It looks very similar to a real derivative except that we’re dealing with complex numbers, which makes the concept $h \rightarrow 0$ more nuanced.

How about the complex integral? Consider first the real-integral such as

\[(1) \qquad \int_a^b f(x) dx\]We can think of it as the sum of $f(x)$ over infinitesimal sub-intervals $dx$ of a segment of the real line. For complex numbers we can generalize it by letting the domain of integration be any curve on the complex plane, in particular *parametric curves* as defined in [3], as we see next.

Consider the function $f: [a, b] \rightarrow \mathbb{C}$, where $t$ is a real number with $a \le t \le b$. The line integral is defined as:

\[(2) \qquad \int_{a}^{b} f(t) dt\]This is very similar to the real integral $(1)$ with the exception that the image of $f(t)$ is complex (but not its domain). If we decompose it into its real and imaginary part, say, $f(t) = u(t) + iv(t)$ we can use the linearity of real integrals to obtain:

\[(3) \qquad \int_{a}^{b} f(t) dt = \int_{a}^{b} u(t) dt + i \int_{a}^{b} v(t) dt\]In other words we can define a line integral through the original real integral $(1)$. Using the linearity principle again, we can show that the complex line integral is also linear:

\[(4) \qquad \int_{a}^{b} cf(t) dt = c \int_{a}^{b} f(t) dt\]For a complex constant $c = \alpha + i \beta$.
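To see why $(4)$ holds (a quick check of my own): with $f(t) = u(t) + iv(t)$ and $c = \alpha + i\beta$, the real and imaginary parts of $cf$ are $\alpha u - \beta v$ and $\alpha v + \beta u$, so by $(3)$:

\[\int_{a}^{b} cf(t) dt = \int_{a}^{b} (\alpha u - \beta v) dt + i \int_{a}^{b} (\alpha v + \beta u) dt = (\alpha + i \beta) \paren{\int_{a}^{b} u dt + i \int_{a}^{b} v dt} = c \int_{a}^{b} f(t) dt\]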

Recall the triangle inequality, which states that $\abs{u + v} \le \abs{u} + \abs{v}$ for complex $u$ and $v$. Using $(4)$ we can generalize this property to the complex line integral, as stated in *Theorem 1*.

**Theorem 1.**

\[\abs{\int_{a}^{b} f(t) dt} \le \int_{a}^{b} \abs{f(t)} dt\]

Instead of defining over a real interval, we can define it over a curve or arc $\gamma$, as long as it is piecewise differentiable and that $f(z)$ is continuous over $z \in \gamma$:

\[(5) \quad \int_{\gamma} f(z) dz = \int_{a}^{b} f(g(t)) g'(t) dt\]Here we’re using the variable substitution $z = g(t)$ to define the line integral over a curve in terms of $(2)$.
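As a quick numeric sanity check of $(5)$ (my own example): take $f(z) = \overline{z}$ and $\gamma$ the unit circle $g(t) = e^{it}$. Then $f(g(t)) g'(t) = e^{-it} \cdot i e^{it} = i$, so the integral is $2\pi i$:

```cpp
#include <cmath>
#include <complex>

// Approximate equation (5) for f(z) = conj(z) over the unit circle
// g(t) = e^{it}, t in [0, 2*pi]. The exact value is 2*pi*i.
std::complex<double> contour_integral_conj() {
  const double pi = std::acos(-1.0);
  const int steps = 100000;
  std::complex<double> sum = 0.0;
  for (int i = 0; i < steps; i++) {
    double t = 2.0 * pi * (i + 0.5) / steps;                       // midpoint rule
    std::complex<double> g = std::polar(1.0, t);                   // g(t)
    std::complex<double> dg = std::complex<double>(0.0, 1.0) * g;  // g'(t)
    sum += std::conj(g) * dg * (2.0 * pi / steps);                 // f(g(t)) g'(t) dt
  }
  return sum;
}
```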

We can also subdivide $\gamma$ into sub-curves $\gamma = \gamma_1 + \gamma_2 + \cdots + \gamma_n$ in which case the integral over it can be expressed as the sum of the integrals of its parts:

\[\int_{\gamma} f(z) dz = \int_{\gamma_1} f(z) dz + \int_{\gamma_2} f(z) dz + \cdots + \int_{\gamma_n} f(z) dz\]We can introduce the following notation:

\[(6) \quad \int_{\gamma} f(z) \overline{dz} = \overline{\int_{\gamma} \overline{f(z)} dz}\]Recall that we can think of a function of $\mathbb{C}$ as one taking two real values, the real and imaginary parts of $z$, so $f(z) = f(x, y)$. We can integrate over only $x$ and obtain a function of $y$. That is,

\[(7) \quad g(y) = \int_{\gamma} f(x, y) dx\]Analogously, we have:

\[(8) \quad h(x) = \int_{\gamma} f(x, y) dy\]With this notation, we can express $(7)$ and $(8)$ via $(5)$ and $(6)$:

\[(9) \quad \int_{\gamma} f(x, y) dx = \frac{1}{2}\paren{\int_{\gamma} f(z) dz + \int_{\gamma} f(z) \overline{dz}}\]and

\[(10) \quad \int_{\gamma} f(x, y) dy = \frac{1}{2i}\paren{\int_{\gamma} f(z) dz - \int_{\gamma} f(z) \overline{dz}}\]which follows from the identities: $x = (z + \overline{z})/2$ and $y = (z - \overline{z})/(2i)$. If we multiply $(10)$ by $i$ and add with $(9)$, we can express $\int_{\gamma} f(z) dz$ as a function of its “partial integrals”:

\[\int_{\gamma} f(z) dz = \int_{\gamma} f(x, y) dx + i \int_{\gamma} f(x, y) dy\]If we split the real and imaginary parts of the result of $f$, i.e. $f(z) = u(z) + iv(z)$, we can then obtain:

\[\int_{\gamma} f(z) dz = \int_{\gamma} (u(x, y) + iv(x, y))dx + i \int_{\gamma} (u(x, y) + iv(x, y))dy\]Grouping terms into real and imaginary parts:

\[\int_{\gamma} f(z) dz = \int_{\gamma} (u(x, y)dx - v(x, y)dy) + i \int_{\gamma} (u(x, y)dy + v(x, y)dx)\]Or using a more succinct notation (omitting the parameters that can be inferred from context):

\[\int_{\gamma} f dz = \int_{\gamma} (udx - vdy) + i \int_{\gamma} (udy + vdx)\]We introduce yet another definition and notation, the integral with respect to arc length, where we use the length of the differential. It is defined as follows:

\[(11) \quad \int_{\gamma} f(z)\abs{dz} = \int_{a}^{b} f(g(t)) \abs{g'(t)} dt\]Which is similar to $(5)$ except that we multiply by the real $\abs{g’(t)}$ instead of the complex $g’(t)$. An analogous result to *Theorem 1* for the arc-length integral is *Theorem 2*:

**Theorem 2.**

\[\abs{\int_{\gamma} f(z) dz} \le \int_{\gamma} \abs{f(z)} \abs{dz}\]

Which is nice because we can work with the contour integral notation instead of the parametric curve one.

**Curve length.** If $f(z) = 1$, then $(11)$ reduces to

\[\int_{\gamma} \abs{dz} = \int_{a}^{b} \abs{g'(t)} dt\]

and it corresponds to the length of $\gamma$. As an example, if $\gamma$ is a circle of radius $\rho$ centered at the origin, we can define the parametric curve $g(t) = \rho e^{it}$ for $0 \le t \lt 2 \pi$.

We have $g'(t) = i \rho e^{it}$, so $\abs{g'(t)} = \abs{i \rho e^{it}} = \rho$, and $f(g(t)) = 1$. Plugging this into $(11)$ gives us:

\[\int_{\gamma} \abs{dz} = \int_{0}^{2\pi} \rho dt = 2\pi \rho\]

In this post, we’ll delve into the function `std::call_once()` in the C++ STL: why it’s useful, how efficient it is and how it can be implemented. We’ll provide a simple implementation based on locks and a more advanced one based on futexes.

Finally, we benchmark their performance against the implementations in GCC, Clang and Folly.

I recently needed to call a function that computes some result, memoizes it, so that in subsequent calls it gets the memoized result. It went something like this:

However, during code review it was pointed out this was not thread-safe. After some searching (a.k.a. asking ChatGPT4) I learned about the `std::call_once()` function. It takes a `std::once_flag` and a lambda that is guaranteed to be called once even in a multi-threaded environment.

The above example would look like:
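A minimal sketch of the idea (my reconstruction, not the post’s original snippet; `compute()` and `get_result()` are placeholder names of my own):

```cpp
#include <mutex>

int compute() {
  // Stand-in for an expensive computation whose result we want to memoize.
  return 42;
}

int get_result() {
  static std::once_flag flag;
  static int result;
  // The lambda runs exactly once, even if get_result() is called from
  // multiple threads concurrently.
  std::call_once(flag, [] { result = compute(); });
  return result;
}
```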

Here we don’t need to make `result` optional because we don’t need to determine whether it has been initialized.

If a thread calls `std::call_once()` with the same `flag` while another thread is executing the lambda, it will block until the lambda is finished and hence `result` is properly set.

If a thread calls `std::call_once()` with the same `flag` after the lambda has been executed, `std::call_once()` will return right away.

I was curious how to implement `std::call_once` (and `std::once_flag`) and in [2] user Matteo Italia provided a nice solution. I’m reproducing it here with some cleanups to improve readability. The `std::once_flag` class can be implemented as:
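A sketch of the class (my reconstruction of the approach, not the exact code from [2]):

```cpp
#include <atomic>
#include <mutex>

// Essentially a mutex plus an atomic boolean. Members are public here for
// simplicity; a more faithful version would keep them private and
// befriend call_once.
class once_flag {
 public:
  constexpr once_flag() noexcept = default;

  // Like std::once_flag, it can't be copied or moved.
  once_flag(const once_flag&) = delete;
  once_flag& operator=(const once_flag&) = delete;

  std::mutex mutex;
  std::atomic<bool> has_run{false};
};
```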

Notice that most of it is boilerplate. The main takeaway is that `std::once_flag` is essentially composed of a mutex and an atomic boolean.

Now to the `call_once()` implementation:
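A sketch of that implementation (again my reconstruction of the approach in [2], with a minimal `once_flag` repeated so the snippet is self-contained; the numbered comments match the discussion that follows):

```cpp
#include <atomic>
#include <mutex>
#include <utility>

// Minimal once_flag: a mutex plus an atomic boolean.
struct once_flag {
  constexpr once_flag() noexcept = default;
  once_flag(const once_flag&) = delete;
  once_flag& operator=(const once_flag&) = delete;
  std::mutex mutex;
  std::atomic<bool> has_run{false};
};

template <typename Callable>
void call_once(once_flag& flag, Callable&& f) {
  // (1) fast path: has_run is atomic, so reading it is thread-safe.
  if (flag.has_run.load(std::memory_order_acquire)) {
    return;
  }
  // (2) slow path: acquire the lock (RAII: released when we return).
  std::lock_guard<std::mutex> lock(flag.mutex);
  // (3) another thread may have run f() while we waited for the lock.
  if (!flag.has_run.load(std::memory_order_relaxed)) {
    std::forward<Callable>(f)();  // if f() throws, has_run stays false
    flag.has_run.store(true, std::memory_order_release);
  }
}
```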

In `(1)` we check whether the flag `has_run` has been set. This is thread-safe because `has_run` is atomic. If not, then we acquire a lock to perform the computation `(2)`.

It’s possible that by the time we’re done acquiring the lock, another thread has already invoked `f()` and set `flag.has_run`, so we need to check it one more time in `(3)`.

If it’s still false, we’re guaranteed no other thread will compute it until we release the lock. Since it’s a RAII lock, this will only happen once `call_once()` ends. Worth noting that the implementation in [2] uses a `std::unique_lock` without explicitly locking it, and the author mentions in a comment it has RAII semantics, so I believe they meant to use `std::lock_guard` as we did here.

If `f()` throws an exception, according to [1]:

> If that invocation throws an exception, it is propagated to the caller of `std::call_once()`, and flag is not flipped so that another call will be attempted.

So in this case it should work as intended. If `f()` throws, we skip setting `flag.has_run` and, since we exit `call_once()`, the lock is released. If there’s another thread waiting to acquire the lock, it will manage to do it and retry `f()`.

This code only requires locking until a successful execution of `f()` and the setting of `flag.has_run`. Once that happens, it consists of checking an atomic boolean variable, which on most systems is implemented without locks.

There’s a worst case scenario where many threads will invoke `std::call_once()` and, if the function takes long enough, all but one thread will block when trying to acquire the lock. Once the thread executing the function finishes and unlocks, each of the remaining threads will have to acquire the lock, only to execute `(3)` and realize it doesn’t need to execute `f()`. Since only one thread can get the lock at any time, this process will repeat until the last thread exits.

In [2] Matteo mentions the need to make the constructor of `once_flag` `constexpr` and points to a Boost discussion [6] that says non-`constexpr` constructors are not thread-safe.

I get this fact, but I couldn’t find an example where the thread safety of the constructor matters, in particular when the copy constructor is deleted, as is the case with `once_flag`.

The version suggested by Matteo is very close to `folly::call_once()` [4]. Some notable details from folly:

- It annotates the check `(1)` with the equivalent of GCC’s `__builtin_expect` to hint to the compiler that this branch is very likely to be taken.
- It uses `std::memory_order_relaxed` when reading the atomic boolean and `std::memory_order_release` when writing it. These flags control the memory consistency. By default, reads and writes to atomic variables use `std::memory_order_seq_cst`, which is the safest but least efficient level. It’s possible to relax the constraints when you know about the relationship between the reads and writes. This is a complicated subject I hope to write about one day.

The GCC libstdc++-v3 uses a single global mutex across all calls to `std::call_once()`.

In a private forum I saw someone mention that `std::call_once()` is a good application for futexes. What is a futex? Eli Bendersky’s blog provides a very good introduction [5], but essentially futex stands for *Fast userspace mutex* and it’s a lower level API that the STL uses to implement things such as `std::mutex`.

The key behavior which makes it good for `std::call_once()` is that all threads are awoken at once by the kernel, so they can all do the flag check without having to acquire locks sequentially. This helps with the worst case scenario mentioned above in *Efficiency*.

However, this API is only available as a Linux kernel system call, so this solution is not portable. Instead, we can use an abstraction such as Folly’s `Futex`, which also has a simpler API for waiting and waking than Linux’s.

It defines a `Futex` class, which is essentially an atomic unsigned 32-bit integer. The waiting API is `futexWait()`:

It will block the thread until one of the following happens:

- **Case 1.** The value in `futex` is not `expectedValue`, in which case `FutexResult::VALUE_CHANGED` is returned immediately.
- **Case 2.** Another thread calls `futexWake()` (see next), in which case `FutexResult::AWOKEN` is returned.

The API for `futexWake()` is:

We can specify the number of threads to awake. If we want to awake all threads, we can just set `numberToAwake = INT_MAX`.

Using folly’s `Futex`, our `call_once()` code can be as follows:

Notice that the `once_flag` is now a simple alias to the `Futex` (which in turn is an alias to an atomic integer), with the additional need to explicitly initialize it to 0:

We can do a small optimization: instead of always calling `std::atomic_compare_exchange_strong()`, we can read from the flag first, because once the function is computed, it will always return true:

I also tried using `__builtin_expect` on that call and more relaxed memory models (e.g. `std::memory_order_relaxed`) but didn’t see significant performance gains on the benchmark (see next).

I was curious to see how these different implementations compare. I extended Folly’s benchmark for `call_once`, comparing the STL and Folly implementations with the futex-based and lock-based implementations provided here.

The benchmark is very simple: it basically starts `N` threads and has them attempt to call a function wrapped via `call_once`:
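I don’t reproduce the extended Folly harness here, but its shape is roughly the following sketch (names and sizes are illustrative; the real harness measures time per iteration):

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Roughly the shape of the benchmark: N threads each try to call the
// wrapped function many times; only the first call runs the lambda.
int run(int numThreads, int itersPerThread) {
  std::once_flag flag;
  int result = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < numThreads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < itersPerThread; ++i) {
        std::call_once(flag, [&] { result = 42; });  // fn() in the text
      }
    });
  }
  for (auto& th : threads) th.join();
  return result;
}
```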

I was wondering if the sequential creation of threads might stagger the execution of `fn()` above and reduce contention. I tried adding a barrier (`std::latch`) right before the inner `for`-loop to exacerbate the concurrent access to the exclusive region, but it didn’t seem to have a visible effect.

I also tried sleeping for 1 second in the function wrapped in `call_once` to make sure the first thread is not done computing before others attempt it. The *relative* performance results didn’t seem to change.

On MacOS, I pulled the head of Folly’s repo as of 2024/02/16 (commit `65fb952918572592fa7dd2478f3b582b26e66b3f`) and compiled with Clang 15 (`-O3`), running on a MacBook M1 Pro. I used 100,000,000 iterations and 32 threads. The results are as follows:

Implementation | Time / Iter | Iter / Sec
---|---|---
`StdCallOnceBench` | `2.92ns` | `341.93M`
`FollyCallOnceBench` | `2.91ns` | `324.35M`
`FutexCallOnceBench` | `2.39ns` | `418.23M`
`LockCallOnceBench` | `2.56ns` | `390.23M`

So they seem to have pretty comparable performance when accounting for noise and variance, though the futex-based one is slightly faster.

I also tried it on Ubuntu, pulling the head of Folly’s repo as of 2024/02/24 (commit `ff3463a6b459a4046d2bef3b231e32c8a3265d0e`) and compiling with GCC 9.4 (`-O3`), running on an Intel i5-8400 2.80GHz. The results are as follows:

Implementation | Time / Iter | Iter / Sec
---|---|---
`StdCallOnceBench` | `9.64ns` | `103.77M`
`FollyCallOnceBench` | `5.39ns` | `185.44M`
`FutexCallOnceBench` | `4.00ns` | `249.82M`
`LockCallOnceBench` | `5.41ns` | `184.82M`

The lock-based implementation is pretty close to Folly’s and the futex-based one was a bit faster. However, I experienced enough variance depending on the setup that I’m not confident in claiming which one is faster. All versions are significantly faster than GCC’s though!

I’m surprised the lock-based implementation performed so well, since it’s the simplest and without optimizations.

The optimization mentioned at the end of *Implementation with Futexes* was very important. Without it the futex implementation was 100x worse than the others.

I love digging into topics and learning a lot more details than I expected! Initially I just wanted to understand the performance of `std::call_once()` but ended up learning about `constexpr` constructors and futexes in the process.

I followed the instructions on Folly’s `README.md` to install the dependencies. It required some work to figure out the compilation commands without setting up a build system like CMake.

For MacOS:

For Linux (Ubuntu):

Running:

In this post we’ll discuss the bipolar coordinate system and how it can be used to simplify Möbius transformations. It puts together a bunch of concepts we studied in prior posts, so being familiar with the series is helpful.

This is the last post of the series based on holomorphic functions, which correspond to Chapter 3 (*Analytic Functions as Mappings*) in Ahlfors’ Complex Analysis [1].

- Holomorphic Functions
- Conformal Maps
- Möbius Transformation
- Cross-Ratio
- Circles of Apollonius
- Symmetry Points of a Circle

In particular, we’ll need the circles of Apollonius, symmetry points of a circle and, of course, the Möbius transformation.

Recall from [2] that, given two points $a$ and $b$ and a ratio $q$, the circle of Apollonius is the set of points $z$ satisfying:

\[\frac{\abs{z - a}}{\abs{z - b}} = q\]We call $a$ and $b$ the *foci* and each $q$ defines a different circle.

The collection of circles of Apollonius (for all possible $q$) and all circles through $a$ and $b$ forms the **bipolar coordinate** system induced by $a$ and $b$. Each point on the complex plane except $a$ and $b$ is the intersection of exactly one circle of Apollonius and one circle through $a$ and $b$ (see *Figure 1*, left).

That means points can be uniquely identified by their corresponding circle of Apollonius and circle through $a$ and $b$, and so this forms a coordinate system, known as the **bipolar coordinates**.

Why bother with such a convoluted way to identify the position of points? In the same way that polar coordinates can make some problems simpler, we can use bipolar coordinates to simplify things. In this post in particular we’ll focus on their application to Möbius transformations.

First we connect the bipolar system, in particular the circle of Apollonius, with symmetry points of a circle:

**Theorem 1.** Let $C$ be the circle of Apollonius corresponding to foci $a$ and $b$ and ratio $q$. Points $a$ and $b$ form a pair of symmetric points with respect to $C$.

Let $u$ and $v$ be the two points where $C$ intersects the line through $a$ and $b$, and let $p$ be an arbitrary point of $C$. We want to show that the identity of cross-ratios $(a, p, u, v) = \overline{(b, p, u, v)}$ holds; by the cross-ratio characterization of symmetric points ([2], *Theorem 2*), this implies that $a$ and $b$ are symmetric with respect to $C$.

For ease of reasoning, we can assume that $a$ and $b$ lie on the real axis (and hence also $u$ and $v$). This should not be an issue since we can apply a Möbius transformation to achieve this result and cross-ratios are invariant with Möbius transformations. We have that: $$(1.1) \quad (a, p, u, v) = \frac{a - u}{a - v} \cdot \frac{p - u}{p - v}$$ and $$\overline{(b, p, u, v)} = \frac{\overline{b - u}}{\overline{b - v}} \cdot \frac{\overline{p - u}}{\overline{p - v}}$$ Since $b, u$ and $v$ are on the real line, they're equal to their conjugates, so $$ = \frac{b - u}{b - v} \cdot \frac{\overline{p} - u}{\overline{p} - v}$$ We first show that: $$(1.2) \quad \frac{a - u}{a - v} = - \frac{b - u}{b - v}$$ Since $u$ and $v$ are points on the Apollonius circle, they satisfy $\abs{a - u} = q \abs{b - u}$ and $\abs{a - v} = q \abs{b - v}$. And since $a, b, u$ and $v$ are collinear, we have $(a - u) = -q(b - u)$ and $(a - v) = q(b - v)$. Note that since $u$ is in between $a$ and $b$, $(a - u)$ and $(b - u)$ point in opposite directions, so we need the negative $q$. So $$ \frac{a - u}{a - v} = \frac{-q(b - u)}{q(b - v)} = - \frac{b - u}{b - v} $$ Now we show that: $$(1.3) \quad \frac{p - u}{p - v} = - \overline{\left( \frac{p - u}{p - v} \right)} = - \frac{\overline{p} - u}{\overline{p} - v}$$ We consider $(p - u)$ in polar form as $R_u e^{i\theta_u}$ and $(p - v)$ as $R_v e^{i\theta_v}$. We first claim that $\theta_u - \theta_v = \pi/2$. This can be proven from Thales' theorem: $u$ and $v$ are the endpoints of a diameter of $C$ (both lie on the line through the foci), so the segment $uv$ subtends a right angle at $p$; we pick the orientation in which the difference is $+\pi/2$.

Since the angle at $p$ is a right angle, $\theta_u - \theta_v = 90^{\circ} = \pi/2$. So: $$\frac{p - u}{p - v} = \frac{R_u}{R_v} e^{i (\theta_u - \theta_v)} = \frac{R_u}{R_v} e^{i \pi/2}$$ If we recall that if $z = R e^{i\theta}$ then $\overline{z} = R e^{-i\theta}$, the conjugate of the above is: $$\overline{\left( \frac{p - u}{p - v} \right)} = \frac{R_u}{R_v} e^{-i \pi/2}$$ We have $-\pi/2 \equiv 2\pi - \pi/2 = \pi + \pi/2 \pmod{2\pi}$ and $e^{i\pi} = -1$ (Euler's identity!), so $$\frac{R_u}{R_v} e^{-i \pi/2} = \frac{R_u}{R_v} e^{i \pi} e^{i \pi/2} = -\frac{R_u}{R_v} e^{i \pi/2}$$ Which proves $(1.3)$. Multiplying $(1.2)$ and $(1.3)$ together gives us: $$\frac{a - u}{a - v} \cdot \frac{p - u}{p - v} = \frac{b - u}{b - v} \cdot \frac{\overline{p} - u}{\overline{p} - v}$$ Showing that $(a, p, u, v) = \overline{(b, p, u, v)}$. Which proves that $a$ and $b$ are symmetric with respect to $C$.

*Theorem 1* combined with *Theorem 9* in [2] tells us that any circle through $a$ and $b$ must intersect $C$ orthogonally.

**Corollary 2.** Circles through $a$ and $b$ intersect the Apollonius circles induced by $a$ and $b$ orthogonally.

In our post Möbius Transformations [3] we discussed the concept of fixed points. To recap, given a transformation $T$, a fixed point is such that $\gamma = T(\gamma)$. In the general case a Möbius transformation has two fixed points. It can have a single one, but we’ll ignore that case in this post.

Now let $a$ and $b$ be the fixed points of a Möbius transformation $T$ and consider the bipolar coordinates induced by these $a$ and $b$. What happens if we transform this system via:

\[(1) \quad z = U(w) = \frac{w - a}{w - b}\]It will send point $a$ to $0$ and $b$ to $\infty$. What does it do to an Apollonius circle? *Theorem 3* answers that:

**Theorem 3**. Transformation $U(w)$ maps Apollonius circles to circles centered at the origin.
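This follows directly from the definition of the Apollonius circle: for any $w$ on the circle with ratio $q$,

$$\abs{U(w)} = \frac{\abs{w - a}}{\abs{w - b}} = q$$

so the image lies on the circle of radius $q$ centered at the origin.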

How about circles through $a$ and $b$? *Theorem 4* has the answer:

**Theorem 4**. Transformation $U(w)$ maps circles through $a$ and $b$ to lines through the origin (*radial lines*).

The angle of this line will be determined by any other point on the circle $C$.

Taken together, we see that transformation $U(w)$ maps a bipolar coordinate system into a polar one! A fixed Apollonius circle corresponds, in the transformed plane, to a fixed radius $r$, and a fixed circle through $a$ and $b$ corresponds to a fixed angle $\theta$. *Figure 1* illustrates this.

Another way to prove *Corollary 2* is by observing that lines through the origin intersect circles centered at the origin orthogonally. Since Möbius transformations are conformal maps (*Theorem 3* in [3]), the corresponding curves in the bipolar coordinate system must also intersect orthogonally.

Recall from our post Möbius Transformations [3] that a given Möbius Transformation $T$ with fixed points $a$ and $b$ has the following normal form:

\[\frac{f(z) - a}{f(z) - b} = r e^{i\theta} \frac{z - a}{z - b}\]Taking $U(w) = (w - a)/(w - b)$, we can rewrite it as:

\[w' = U^{-1} ( r e^{i\theta} U(w) )\]The bipolar coordinate system gives us a more intuitive interpretation of this equation. First we map a point $w$ from the bipolar coordinate system into the polar one via $U(w)$.

There, we multiply it by a factor of $r$ (moving it to another circle centered at the origin) and rotate it by $\theta$ (moving it to another radial line).

Finally we transform the resulting point back to the bipolar coordinate system by applying $U^{-1}(z)$ to the transformed point. *Figure 2* illustrates this process:

In [1] Ahlfors calls the bipolar coordinate system a *circular net* or *Steiner circles*, but I didn’t find these terms used on the web, except for [6].

After reading the corresponding chapter in [1], I didn’t fully grasp how the bipolar coordinate system is useful in the context of Möbius transformations. Only after reading [5] and seeing diagrams (on which *Figure 2* is based) did I get the idea. It’s really elegant, so I’m glad I took the time to complement my studies.

- [1] Complex Analysis - Lars V. Ahlfors
- [2] NP-Incompleteness - Symmetry Points of a Circle
- [3] NP-Incompleteness - Möbius Transformation
- [4] Geometry with an Introduction to Cosmic Topology - 3.5: Möbius Transformations: A Closer Look.
- [5] Introduction to Groups and Geometries - David W. Lyons
- [6] NP-Incompleteness - Circles of Apollonius

In this post we study the *Symmetry Points of a Circle* in the context of Complex Analysis. It builds on concepts and results discussed in prior posts, so it’s worth getting familiar with the series:

We can think of circle symmetry in analogy with conjugate points in the complex plane. Any complex number $z$ and its conjugate $\overline{z}$ form a pair of symmetric points with respect to the real line. Circle symmetry is similar, except the “line” of symmetry is the circumference of the circle.

Let $C$ be a circle and $T$ any Möbius transformation that maps the real line ($x$-axis in the Complex Plane) into $C$. Let $w = T(z)$ and $w^{*} = T(\overline{z})$. We then say $w$ and $w^{*}$ are **symmetric with respect to** $C$.

**Lemma 1.** For every circle $C$ there exists a Möbius transformation that maps the real line to it and vice-versa.

Let $C$ have center $z_0 = (x_0, y_0)$ and radius $r$. We'll first translate the circle so that its center lies at $z'_0 = (x'_0, y'_0) = (0, r)$, so if $\alpha = (-x_0, r - y_0)$, a translation $T(z) = z + \alpha$ will do. The translated circle passes through the origin, so the inversion $U(z) = 1/z$ maps it to a horizontal line, and a final translation $V(z) = z + \beta$ takes that line to the real axis.

Since Möbius transformations are composable, $VUT(z)$ is a single Möbius transformation mapping the circle $C$ to the real line. We can put them together explicitly as: $$VUT(z) = \frac{1}{z + \alpha} + \beta = \frac{\beta z + \beta \alpha + 1}{z + \alpha}$$

The transformation mapping $w$ to $w^{*}$ is called a **reflection**. It can be obtained by applying $T^{-1}(w)$ to obtain $z$, then obtaining its conjugate $R(z) = \overline{z}$ and then applying $T(\overline{z})$ to obtain $w^{*}$, so $TRT^{-1}$. Note this is not a Möbius transformation because conjugation cannot be achieved via such transformations. *Theorem 3* provides an explicit formula for the reflection.

To make the discussions and proofs easier, we’ll introduce the terms $w$**-space** and $z$**-space**. The $w$-space is the one containing the circle $C$ and the $z$-space is the image of the transformation $T$, i.e. the one where $C$ becomes the real-line.

We can characterize symmetric points by their cross ratio, as stated by Theorem 2.

**Theorem 2.** Let $C$ be a circle containing distinct points $w_1, w_2$ and $w_3$. Points $w$ and $w^{*}$ are symmetric with respect to $C$ if and only if $(w^{*}, w_1, w_2, w_3) = \overline{(w, w_1, w_2, w_3)}$.

For the forward direction, if $w$ and $w^{*}$ are symmetric with respect to $C$, then $w = T(z)$ and $w^{*} = T(\overline{z})$ for some Möbius transformation $T$ mapping the real line to $C$, with real $z_i = T^{-1}(w_i)$. By the invariance of the cross ratio, $(w^{*}, w_1, w_2, w_3) = (\overline{z}, z_1, z_2, z_3) = \overline{(z, z_1, z_2, z_3)} = \overline{(w, w_1, w_2, w_3)}$. Now we start with $(w^{*}, w_1, w_2, w_3) = \overline{(w, w_1, w_2, w_3)}$ and conclude that $w$ and $w^{*}$ are symmetric. Let $T$ still be a Möbius transformation mapping points in $C$ to the real line. Again, using the fact that the cross ratio is invariant to Möbius transformations, we'll arrive at: $$(z^{*}, z_1, z_2, z_3) = \overline{(z, z_1, z_2, z_3)}$$ Replacing them in the definition of cross ratio: $$\frac{z^{*} - z_2}{z^{*} - z_3} \cdot \frac{z_1 - z_2}{z_1 - z_3} = \overline{\left(\frac{z - z_2}{z - z_3} \cdot \frac{z_1 - z_2}{z_1 - z_3}\right)}$$ And using conjugate identities: $$\frac{z^{*} - z_2}{z^{*} - z_3} \cdot \frac{z_1 - z_2}{z_1 - z_3} = \frac{\overline{z} - \overline{z_2}}{\overline{z} - \overline{z_3}} \cdot \frac{\overline{z_1} - \overline{z_2}}{\overline{z_1} - \overline{z_3}}$$ Recalling that $z_i$ is real (but $z$ not necessarily!), and thus equal to its conjugate: $$\frac{z^{*} - z_2}{z^{*} - z_3} \cdot \frac{z_1 - z_2}{z_1 - z_3} = \frac{\overline{z} - z_2}{\overline{z} - z_3} \cdot \frac{z_1 - z_2}{z_1 - z_3}$$ Cancelling terms leaves us with: $$\frac{z^{*} - z_2}{z^{*} - z_3} = \frac{\overline{z} - z_2}{\overline{z} - z_3} $$ or $$(z^{*} - z_2)(\overline{z} - z_3) = (\overline{z} - z_2)(z^{*} - z_3)$$ Distributing: $$z^{*}\overline{z} - \overline{z}z_2 - z^{*}z_3 + z_2z_3 = z^{*}\overline{z} - z^{*}z_2 - \overline{z}z_3 + z_2z_3$$ Cancelling terms and grouping by $\overline{z}$ and $z^{*}$: $$\overline{z}(z_3 - z_2) = z^{*}(z_3 - z_2)$$ Since $z_3 \ne z_2$, $z^{*} = \overline{z}$. Which means, by definition, $w$ and $w^{*}$ are symmetric points with respect to $C$.

In [1] Ahlfors actually uses *Theorem 2* as a *definition* for symmetric points. We can use this theorem to come up with an explicit formula for obtaining $w^{*}$ from $w$ as stated in *Theorem 3*:

**Theorem 3.** Let $C$ be a circle with center $a$ and radius $R$, and points $w$ and $w^{*}$ symmetric with respect to $C$. Then: $$(1) \quad w^{*} = a + \frac{R^2}{\overline{w} - \overline{a}}$$

We first start with a translation by $-a$: $$\overline{(w, w_1, w_2, w_3)} = \overline{(w - a, w_1 - a, w_2 - a, w_3 - a)}$$ We use the fact that the $w_i$ lie on $C$, so $\abs{w_i - a} = R$ and hence $\overline{w_i - a} = R^2/(w_i - a)$. Substituting these, and then applying the Möbius map $z \mapsto R^2/z$ to all four entries (which leaves the cross ratio unchanged), we get: $$\overline{(w, w_1, w_2, w_3)} = \left(\frac{R^2}{\overline{w} - \overline{a}}, w_1 - a, w_2 - a, w_3 - a\right)$$ Translating everything back by $+a$ and comparing with $(w^{*}, w_1, w_2, w_3)$ via *Theorem 2* yields $w^{*} = a + R^2/(\overline{w} - \overline{a})$, which is $(1)$.

Suppose that $C$ is the unit circle centered on the origin. Then $(1)$ becomes:

\[w^{*} = \frac{1}{\overline{w}}\]More generally, for a circle of radius $R$ centered at the origin, $(1)$ gives $w^{*} = R^2/\overline{w}$. If we consider the polar form of these points $w = r_1 e^{i \theta_1}$ and $w^{*} = r_2 e^{i \theta_2}$, and note that $\overline{w} = r_1 e^{-i \theta_1}$ and thus $R^2/\overline{w} = (R^2/r_1) e^{i \theta_1}$, we’ll obtain:

\[r_2 e^{i \theta_2} = \frac{R^2}{r_1} e^{i \theta_1}\]So we have $\theta_1 = \theta_2$, which implies that the points $w^*$ and $w$ have the same angle with respect to the origin and are thus collinear with it. Since lines are preserved under translation, they continue to be collinear for a circle centered at $a$.

Further $r_1 r_2 = R^2$, that is, the distance of the symmetric points $w$ and $w^{*}$ to the center of the circle are inversely proportional. Without loss of generality, assume that $r_1 \le r_2$. From these observations we derive many corollaries.

Let $C$ be a circle of radius $R$ and center $a$, and $w$ and $w^{*}$ symmetric points with respect to $C$. Then:

**Corollary 4**. The points $w$ and $w^{*}$ are collinear with point $a$.

**Corollary 5**. The distances of $w$ and $w^{*}$ from $a$, $r_1$ and $r_2$ respectively, are inversely proportional; in particular: $$(2) \quad r_1 r_2 = R^2$$

If we set $r_1 = R$, we’ll obtain $r_2 = R$ and since they’re collinear by *Corollary 4*, they must coincide. In other words, $w$ is symmetric with itself, or that:

**Corollary 6**. If $w$ is in $C$, then $w = w^{*}$.

By setting $r_1 = 0$ in $(2)$, we’ll obtain $r_2 = \infty$, leading to:

**Corollary 7**. If $w = a$, then $w^{*} = \infty$

If $w \not \in C$, since we assume $r_1 \le r_2$ and have that $r_1 r_2 = R^{2}$, it must be that $r_1 \lt R$, and that $r_2 \gt R$, so we have that:

**Corollary 8**. If $w \not \in C$, then $w$ and $w^{*}$ lie on opposite sides of $C$.

Another characterization of the point symmetry is given by [5], stated here as *Theorem 9*:

**Theorem 9.** Let $C$ be a circle. Points $w$ and $w^{*}$ are symmetric with respect to $C$ if and only if every line and circle through $w$ and $w^{*}$ that intersects $C$ does so orthogonally.

We also know, from *Lemma 1*, that there is a Möbius transformation $T$ mapping $C$ to the real line, and that Möbius transformations are conformal, so they preserve the angles at intersections.

Now we assume one direction of the theorem, that $w$ and $w^{*}$ are symmetric with respect to $C$, so they get mapped, via $T$, to the conjugates $z$ and $\overline{z}$ in the $z$-space.

Since $z$ and $\overline{z}$ are symmetric with respect to the real line, it's possible to show that for any circle through $w$ and $w^{*}$ intersecting $C$ at points $p_1$ and $p_2$, the segment between $T(p_1)$ and $T(p_2)$ is a diameter of its image $C'$, and hence that $C'$ intersects the real line perpendicularly.

Now assume the other direction of the theorem: we have two points $w$ and $w^{*}$ such that every circle or line through them that intersects $C$ does so orthogonally. First we claim that $w$ and $w^{*}$ must be on opposite sides of $C$. If they were on the same side, there would exist a circle through them tangent to $C$, which hence doesn't intersect it orthogonally.

The transformation $T$ will map the points $w$ and $w^{*}$ to points $z$ and $z^{*}$. We claim that $z$ and $z^{*}$ are on opposite sides of the real line. Otherwise there would exist a circle through them that does not intersect the real line, which would correspond to a circle or line through $w$ and $w^{*}$ in the $w$-space that does not intersect $C$, which would imply they're on the same side with respect to $C$, a contradiction.

Now that we know that $z$ and $z^{*}$ are on opposite sides of the real line, consider the line through them. It will intersect the real line at a point $p$. We claim that this line has to be perpendicular to the real line. Suppose it's not. Then the corresponding circle or line through $w$ and $w^{*}$ in the $w$-space will intersect $C$ (at the point $T^{-1}(p)$) in a non-perpendicular way (due to the conformal mapping property), which contradicts the hypothesis.

So we can assume $z$ and $z^{*}$ have the same $x$-value. It remains to show that $z$ and $z^{*}$ are equidistant from the real line and hence have opposite $y$-values. We notice that every circle through $z$ and $z^{*}$ will intersect the real line at two points $p_1$ and $p_2$, and it has to do so orthogonally. This means that $p_1p_2$ is a diameter of such a circle and that the real line bisects it; in particular the center of every such circle lies on the real line and is equidistant from $z$ and $z^{*}$, so their $y$-values are opposite. Hence $z^{*} = \overline{z}$ and, by definition, $w$ and $w^{*}$ are symmetric with respect to $C$.

An important result is that point symmetry is preserved under Möbius transformations, as stated in *Theorem 10*.

**Theorem 10.** (*Symmetry Principle*) If a Möbius transformation maps a circle $C_1$ to $C_2$, then it maps any pair of symmetric points with respect to $C_1$ into a pair that is symmetric with respect to $C_2$.

One interesting way to see equation $(1)$ is as a bijection from the interior of the circle to the exterior. In fact, this looks a lot like what the Stereographic projection does [6]: it maps points in the Northern hemisphere of the Riemann sphere to the exterior of the unit circle on the complex plane and points on the Southern hemisphere to the interior.

There’s actually a nice connection with the Stereographic projection. If we take a point $p = (x_1, x_2, x_3)$ on the Riemann sphere and its symmetric point $p' = (x_1, x_2, -x_3)$ with respect to the plane, then their corresponding projections will form a pair of symmetric points with respect to the unit circle in the extended complex plane.

To see why, we can use the equation for the projection onto the complex plane [6] to find the complex numbers for $p$ and $p'$, which we denote (conveniently!) by $w$ and $w^{*}$:

\[w = \frac{x_1 + ix_2}{1 - x_3}, \qquad w^{*} = \frac{x_1 + ix_2}{1 + x_3}\]Their modulus is given by:

\[\abs{w} = \frac{\sqrt{x_1^2 + x_2^2}}{1 - x_3}, \qquad \abs{w^{*}} = \frac{\sqrt{x_1^2 + x_2^2}}{1 + x_3}\]Multiplying them together, we get:

\[\abs{w}\abs{w^{*}} = \frac{x_1^2 + x_2^2}{1 - x_3^2}\]Since $p$ is on the sphere, $x_1^2 + x_2^2 + x_3^2 = 1$, or $x_1^2 + x_2^2 = 1 - x_3^2$. Thus $\abs{w}\abs{w^{*}} = 1$. It’s also possible to prove that they have the same argument, and we conclude that this pair of points is symmetric with respect to the unit circle in the extended complex plane.
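The equal-argument step is immediate once we note the denominators $1 \mp x_3$ are positive reals (for $p$ off the poles):

$$\arg(w) = \arg\left(\frac{x_1 + ix_2}{1 - x_3}\right) = \arg(x_1 + ix_2) = \arg\left(\frac{x_1 + ix_2}{1 + x_3}\right) = \arg(w^{*})$$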

*Figure 2* illustrates this idea by analyzing a “cross-section” of the Riemann sphere, showing that two points on the sphere that are symmetric with respect to the plane get projected to points that are symmetric with respect to the unit circle.

The concept of symmetry with respect to a circle is intuitive if we take conjugate symmetry as an analogy. However, whereas the conjugate points $z$ and $\overline{z}$ are equidistant from the line of symmetry (i.e. the real line), that’s not the case for the circle. As a point in the interior of the circle moves away from the border, its corresponding symmetric point also moves, but much faster, as described by equation $(1)$.

The characterization from *Theorem 9* [5], relating symmetry to orthogonal intersections, is a lot less intuitive.

As I was writing the post I started noticing some similarities with the Stereographic projections and the Riemann sphere, which I hadn’t seen during my research for the post. I was very happy to figure out a proof on my own and show that the correspondence is actually true.