<p><em>The Hungarian Algorithm</em>, by Guilherme Kunigami. NP-Incompleteness (kuniga.me), 2022-12-02.</p>
<figure class="image_float_left">
<a href="https://en.wikipedia.org/wiki/File:Harold_W._Kuhn.jpg">
<img src="https://www.kuniga.me/resources/blog/2022-12-02-hungarian/kuhn.jpg" alt="Harold W. Kuhn thumbnail" />
</a>
</figure>
<p>Harold William Kuhn was an American mathematician, known for the Karush–Kuhn–Tucker conditions and the Hungarian method for the assignment problem [1].</p>
<p>According to Wikipedia [2], Kuhn named the algorithm <em>Hungarian method</em> because it was largely based on the earlier works of Hungarian mathematicians Dénes Kőnig and Jenő Egerváry.</p>
<p>However in 2006, the mathematician Francois Ollivier found out that Carl Jacobi (known for <a href="https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant">Jacobian matrices</a>) had already developed a similar algorithm in the 19th century in the context of systems of differential equations and emailed Kuhn about it [6].</p>
<p>One fascinating coincidence is that Jacobi is Kuhn’s ancestral advisor according to the Mathematics Genealogy Project [5]! Here’s the ancestry chain (the year each earned their PhD in parentheses):</p>
<ul>
<li><a href="https://mathgenealogy.org/id.php?id=27174">Harold William Kuhn</a> (1950)</li>
<li><a href="https://mathgenealogy.org/id.php?id=15155">Ralph Hartzler Fox</a> (1939)</li>
<li><a href="https://mathgenealogy.org/id.php?id=7461">Solomon Lefschetz</a> (1911)</li>
<li><a href="https://mathgenealogy.org/id.php?id=7451">William Edward Story</a> (1875)</li>
<li><a href="https://mathgenealogy.org/id.php?id=32858">Carl Gottfried Neumann</a> (1856)</li>
<li><a href="https://mathgenealogy.org/id.php?id=57706">Friedrich Julius Richelot</a> (1831)</li>
<li><a href="https://mathgenealogy.org/id.php?id=15635">Carl Gustav Jacob Jacobi</a> (1825)</li>
</ul>
<p>If this genealogy is to be trusted, I wonder if Kuhn was aware of this fact, given he wrote about Jacobi’s life in [6].</p>
<p>In this post we’ll explore this algorithm and provide an implementation in Python.</p>
<!--more-->
<h2 id="the-balanced-assignment-problem">The Balanced Assignment Problem</h2>
<p>Suppose we have a bipartite graph with vertex sets $S$ and $T$ of the same size $n$, and a set $E$ of edges between them, each associated with a non-negative weight. A matching $M$ is a subset of $E$ such that no two edges in $M$ are incident to the same vertex. A perfect matching is one that covers every vertex. The weight of a matching is the sum of the weights of its edges.</p>
<p>The balanced assignment problem asks to find the perfect matching with the maximum weight. We can reduce several variants of this problem to this specific version.</p>
<h3 id="negative-weights">Negative weights</h3>
<p>We can assume all the weights are non-negative. If there are negative weights, let $W$ be the smallest (most negative) weight. We subtract $W$ from all edges so they’re now non-negative. Since a perfect matching has exactly $n$ edges, we just need to add $nW$ back to our solution.</p>
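A small sketch of this reduction (subtracting the negative minimum is the same as adding its absolute value; the function name and matrix representation are mine, not from the post):

```python
def shift_weights(w):
    """Shift an n x n weight matrix so all entries are non-negative.

    Subtracting the smallest (most negative) weight W from every edge
    raises the value of any perfect matching (n edges) by exactly -n*W,
    so we return n*W for the caller to add back to the shifted optimum."""
    n = len(w)
    W = min(min(row) for row in w)
    if W >= 0:
        return w, 0  # already non-negative, no correction needed
    shifted = [[wij - W for wij in row] for row in w]
    return shifted, n * W
```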
<h3 id="incomplete-graph">Incomplete graph</h3>
<p>We can assume the bipartite graph is complete. If the original graph is not, we can always add artificial edges with weight 0 and remove them later from the solution without affecting the optimal value.</p>
<p>If it’s incomplete <em>and</em> has negative weights, we can simply ignore those edges by setting their weights to 0. A matching of maximum weight does not include any negative edges <em>unless</em> we also require the matching to be of maximum cardinality. In that case the reduction to the problem at hand is non-trivial.</p>
<h3 id="unbalanced-assignment">Unbalanced assignment</h3>
<p>We can assume that both partitions have the same size. If in the original graph they aren’t, we can add artificial vertices (and then make them complete as above) and remove edges incident to them later from the solution without affecting the optimal value.</p>
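Both of these reductions amount to padding the weight matrix with zeros. A possible sketch (the sparse-dict input format is an assumption of mine):

```python
def pad_to_square(edges, n_s, n_t):
    """Embed an incomplete and/or unbalanced instance into a complete,
    balanced one: missing edges and edges incident to artificial
    vertices get weight 0, which doesn't affect the optimal value.

    `edges` is a dict {(i, j): weight} with i in S and j in T."""
    n = max(n_s, n_t)
    w = [[0] * n for _ in range(n)]
    for (i, j), wij in edges.items():
        w[i][j] = wij
    return w
```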
<h2 id="integer-linear-programming">Integer Linear Programming</h2>
<p>We can formulate the assignment problem as follows. Suppose we’re given weights $w_{ij}$ for $1 \le i, j \le n$. We introduce binary variables $x_{ij}$ where $x_{ij} = 1$ indicates the edge $(i,j)$ is in the matching.</p>
<p>The objective function is thus:</p>
\[\mbox{maximize} \qquad \sum_{i=1}^{n} \sum_{j=1}^{n} x_{ij} w_{ij}\]
<p>Subject to:</p>
\[\begin{align}
(1) \quad & \sum_{j = 1}^{n} x_{ij} & \le 1 & \qquad \qquad 1 \le i \le n \\
(2) \quad & \sum_{i = 1}^{n} x_{ij} & \le 1 & \qquad \qquad 1 \le j \le n \\
(3) \quad & x_{ij} & \in \curly{0, 1} & \qquad \qquad 1 \le i, j \le n \\
\end{align}\]
<p>Let’s find the <a href="https://en.wikipedia.org/wiki/Dual_linear_program">dual</a> of the LP relaxation of this ILP. Constraints (1) map to variables we’ll name $u$ and constraints (2) map to variables we’ll name $v$.</p>
\[\mbox{minimize} \qquad \sum_{i=1}^{n} u_i + \sum_{j=1}^{n} v_j\]
<p>Subject to:</p>
\[\begin{align}
(4) \quad & u_i + v_j \ge w_{ij}, & \qquad \qquad 1 \le i, j \le n \\
(5) \quad & u_i \ge 0, & \qquad 1 \le i \le n\\
(6) \quad & v_j \ge 0, & \qquad 1 \le j \le n\\
\end{align}\]
<p>It’s possible to show that if there are feasible solutions $x^{*}_{ij}$ and $u^{*}_i$, $v^{*}_j$ such that their objective functions are equal, that is,</p>
\[\sum_{i=1}^{n} \sum_{j=1}^{n} x^{*}_{ij} w_{ij} = \sum_{i=1}^{n} u^{*}_i + \sum_{j=1}^{n} v^{*}_j\]
<p>then they’re optimal solutions for their respective formulations. Another characterization of optimality is that the variables satisfy the complementarity constraints:</p>
\[\begin{align}
x_{ij} (u_i + v_j - w_{ij}) &= 0, & \qquad \qquad 1 \le i, j \le n\\
u_i (1 - \sum_{j = 1}^{n} x_{ij}) &= 0 & \qquad \qquad 1 \le i \le n \\
v_j (1 - \sum_{i = 1}^{n} x_{ij}) &= 0 & \qquad \qquad 1 \le j \le n \\
\end{align}\]
<p>Another way to state these constraints:</p>
\[\begin{align}
(7) \quad & x_{ij} \gt 0 & \quad \rightarrow \quad & u_i + v_j = w_{ij} \\
(8) \quad & u_i \gt 0 & \quad \rightarrow \quad & \sum_{j = 1}^{n} x_{ij} = 1 \\
(9) \quad & v_j \gt 0 & \quad \rightarrow \quad & \sum_{i = 1}^{n} x_{ij} = 1 \\
\end{align}\]
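These implications are easy to check mechanically. A small sketch of a verifier for (7)-(9) (the helper name is mine):

```python
def complementary_slackness(x, u, v, w):
    """Check conditions (7)-(9) for a 0/1 assignment matrix x and
    duals u, v against the weight matrix w."""
    n = len(w)
    for i in range(n):
        for j in range(n):
            # (7): a matched edge must be tight
            if x[i][j] > 0 and u[i] + v[j] != w[i][j]:
                return False
    for i in range(n):
        # (8): a positive u_i forces row i to be matched
        if u[i] > 0 and sum(x[i]) != 1:
            return False
    for j in range(n):
        # (9): a positive v_j forces column j to be matched
        if v[j] > 0 and sum(x[i][j] for i in range(n)) != 1:
            return False
    return True
```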
<h2 id="the-hungarian-algorithm">The Hungarian algorithm</h2>
<p>The Hungarian algorithm aims to solve the ILP by finding solutions that satisfy (1)-(9). Initially we set</p>
\[\begin{align}
u_i &= \max_{1 \le j \le n}(w_{ij}), \qquad 1 \le i \le n\\
v_j &= 0\\
x_{ij} &=0\\
\end{align}\]
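This initialization translates directly to code (a sketch; here `w` is the $n \times n$ weight matrix):

```python
def init_duals(w):
    """Initial feasible duals: u_i is the largest weight on an edge
    incident to i, and v_j = 0. Then u_i + v_j >= w_ij holds for
    every edge, and every i in S has at least one tight edge
    (the edge attaining the maximum)."""
    n = len(w)
    u = [max(w[i]) for i in range(n)]
    v = [0] * n
    return u, v
```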
<p><em>Figure 1</em> depicts an example from [5] which we’ll use throughout this post. It displays the values of the variables $u$ (to the left of the $S$ partition), $v$ (to the right of the $T$ partition), $w_{ij}$ (along the corresponding edge) and the difference $u_i + v_j - w_{ij}$ (defined as the <em>slack</em> later, in parentheses).</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-12-02-hungarian/initial.png" alt="See caption." />
<figcaption>Figure 1: Initial choice of the dual variables.</figcaption>
</figure>
<p>We can verify this satisfies all constraints except (8). The algorithm will then iterate on satisfying these constraints without ever violating the other ones. We’ll now describe a high-level way to do this.</p>
<h3 id="max-matching-in-induced-graph">Max matching in induced graph</h3>
<p>First, consider the graph $G(u,v)$ which is the original bipartite graph but only with those edges $(i, j)$ satisfying $u_i + v_j = w_{ij}$. We call such edges <strong>tight</strong>, whereas those with $u_i + v_j \gt w_{ij}$ are <strong>loose</strong>. The amount to remove from $u_i + v_j$ of a loose edge to turn it into a tight edge is called <strong>slack</strong> and denoted by $\delta = u_i + v_j - w_{ij}$.</p>
<p>From our initial choice of $u$, every node in $S$ has at least one tight edge incident to it. To see why, let $j_i$ be such that $w_{ij_i} = \max_j(w_{ij}) = u_i$. Since $v_{j_i} = 0$, we have $u_i + v_{j_i} = w_{ij_i} + 0 = w_{ij_i}$. Thus edge $(i,j_i)$ is in $G(u,v)$.</p>
<p>We then try to find a maximum cardinality matching $M_{uv}$ in $G(u,v)$. If we find a perfect matching (i.e. of cardinality $n$), then setting $x_{ij} = 1$ for the edges in the matching will satisfy (8) and we’re done.</p>
<p>If not, we need to add edges to $G(u,v)$ by manipulating $u$ and $v$ such that the resulting $G(u,v)$ will allow a bigger matching. We need some theory first.</p>
<p><em>Figure 2</em> shows the induced graph corresponding to our initial example, in which all edges are tight; the bold red edge is a maximal matching in this graph.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-12-02-hungarian/induced.png" alt="See caption." />
<figcaption>Figure 2: Induced graph with a maximal matching.</figcaption>
</figure>
<h3 id="berges-theorem">Berge’s theorem</h3>
<p>We note that the matching $M_{uv}$ is also a matching in the original graph. Since the original graph is complete, it must have a perfect matching, so if $M_{uv}$ is not perfect it is not maximum there. We’ll now see how to increase, or augment, the size of $M_{uv}$.</p>
<p>Let $M$ be a matching in a bipartite graph. We define an <strong>alternating path</strong> in $M$ as a path in the graph whose edges alternate between those in the matching and those not. If no edge in the matching is incident to a vertex $v$, we say the vertex is <strong>unmatched</strong>. If an alternating path starts <em>and</em> ends in an unmatched vertex, we call it an <strong>augmenting path</strong>.</p>
<p>The reason it’s called <em>augmenting</em> is that it can be used to increase the size of a matching. An augmenting path necessarily has an odd number of edges, say $2k + 1$, of which $k + 1$ are not in the matching and $k$ are. We can increase the matching by 1 by simply putting all edges not in the matching into the matching and vice-versa. <em>Figure 3</em> has an example.</p>
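The flip operation can be sketched as follows, with the matching represented as a set of edges (a simplification of the bookkeeping used in the implementation later on):

```python
def flip(path, matching):
    """Flip an augmenting path, given as a list of edges alternating
    not-in / in / ... / not-in the matching. Each edge toggles its
    membership, so the matching grows by exactly one edge."""
    for e in path:
        if e in matching:
            matching.remove(e)
        else:
            matching.add(e)
    return matching
```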
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-12-02-hungarian/augmenting.png" alt="See caption." />
<figcaption>Figure 3: Example of an augmenting path in a match of size 2. By "flipping" the edges in-out of the matching we augment it by one</figcaption>
</figure>
<p><a href="https://en.wikipedia.org/wiki/Berge%27s_theorem">Berge’s theorem</a> states that a matching $M$ in a graph is maximum if and only if there is no augmenting path for $M$. This implies that if the matching is not maximum, there must exist an augmenting path we can use to increase the size of $M$.</p>
<p>Starting from the fact that $M_{uv}$ is maximum in $G(u,v)$ but not in $G$, we can use Berge’s theorem to claim there is an augmenting path $P$ in $G$ that doesn’t exist in $G(u,v)$.</p>
<p>How can we find $P$? We first need to introduce an auxiliary structure.</p>
<h3 id="forest-of-alternating-paths">Forest of alternating paths</h3>
<p>Let $r$ be an unmatched vertex from $S$. Traverse the graph from $r$, only following alternating edges and without visiting any vertex more than once. One way to achieve this is to orient the edges: for every edge $(i,j) \in G(u,v)$ with $i \in S, j \in T$, we orient $i \rightarrow j$ if $(i,j)$ is not in $M_{uv}$ and $j \rightarrow i$ otherwise. Then we can just do a BFS in this directed graph from $r$. Let $C(r)$ be the tree corresponding to this traversal.</p>
<p>Repeat this for every other unmatched vertex in $S$, with the care not to visit any vertex that belongs to some other tree. The result of these traversals is a collection of disjoint trees, or <strong>forest of alternating paths</strong> which we’ll call $F$.</p>
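The construction above can be sketched with a plain BFS (my own minimal representation, not the one used in the implementation later: `tight` is the set of tight edges and `match_s[i]` is the $T$-vertex matched to $i$, or `None`):

```python
from collections import deque

def alternating_forest(tight, match_s, n):
    """BFS from every unmatched vertex of S, following non-matching
    edges from S to T and matching edges from T to S, never revisiting
    a vertex. Returns parents and visited flags describing the forest."""
    parent_s = [None] * n   # parent in T of each visited i in S
    parent_t = [None] * n   # parent in S of each visited j in T
    visited_s = [False] * n
    visited_t = [False] * n
    match_t = {j: i for i, j in enumerate(match_s) if j is not None}
    queue = deque(i for i in range(n) if match_s[i] is None)
    for i in queue:  # the roots of the trees
        visited_s[i] = True
    while queue:
        i = queue.popleft()
        for j in range(n):
            # follow a tight non-matching edge i -> j
            if (i, j) in tight and match_s[i] != j and not visited_t[j]:
                visited_t[j] = True
                parent_t[j] = i
                i2 = match_t.get(j)
                # follow the matching edge j -> i2, if any
                if i2 is not None and not visited_s[i2]:
                    visited_s[i2] = True
                    parent_s[i2] = j
                    queue.append(i2)
    return parent_s, parent_t, visited_s, visited_t
```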
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-12-02-hungarian/crown_shyness.jpg" alt="See caption." />
<figcaption>Figure 4: <a href="https://en.wikipedia.org/wiki/Crown_shyness">Crown Shyness</a>: a phenomenon observed in some tree species, in which the crowns of trees do not touch each other. Evoking the disjointness of the trees in our data structure.</figcaption>
</figure>
<p>Note that this forest does not necessarily contain all edges from the matching.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-12-02-hungarian/forest.png" alt="See caption." />
<figcaption>Figure 5: Example of a forest (with a single tree, colored green) that does not include the edge (3, 3), which is in the matching. Vertex 3 does not form a tree of its own because it's already matched.</figcaption>
</figure>
<h3 id="adding-edges-to-the-forest">Adding edges to the forest</h3>
<p>We can manipulate $u$ and $v$ in order to add an edge to $F$. Consider some edge $(x, y)$ with $x \in S$, $x \in F$, $y \in T$ and $y \not \in F$, which we define as a <strong>frontier edge</strong>. Let $\delta = u_x + v_y - w_{xy}$. If we subtract $\delta$ from all $u_i$, $i \in F \cap S$ and add $\delta$ to all $v_j$, $j \in F \cap T$, it’s easy to see that all edges in $F$ will continue to be tight. Since we didn’t add $\delta$ to $v_y$ but subtracted it from $u_x$, edge $(x, y)$ is now tight.</p>
<p>However, by doing this we might violate (4), i.e. $u_i + v_j \ge w_{ij}$, for some edges. Suppose there is another frontier edge $(x', y')$ with slack $\delta' = u_{x'} + v_{y'} - w_{x'y'} \lt \delta$. We have that $u_{x'} + v_{y'} - \delta' = w_{x'y'}$. If we subtract $\delta$ from $u_{x'}$ and leave $v_{y'}$ unchanged, we get $u_{x'} + v_{y'} - \delta \lt u_{x'} + v_{y'} - \delta' = w_{x'y'}$, which violates (4).</p>
<p>To avoid this, we choose the smallest slack $\delta$ among all frontier edges. We’re guaranteed to add at least one edge to the forest $F$ without violating any new constraints. <em>Figure 6.</em> shows an example.</p>
<p>Now with $(x, y)$ added, either $y$ is unmatched, which means we just found an augmenting path, or $y$ is part of an edge $(x', y)$ in the matching that is not reachable from any of the root vertices in $F$; but we can now add it to $F$, since $(x, y) + (x', y)$ is alternating.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-12-02-hungarian/dual_change.png" alt="See caption." />
<figcaption>Figure 6: Before and after the dual variable changes. The dashed edges are not in the forest on the left, but they have the smallest slack, 1. So this is what we're going to subtract from the green nodes in S and add to the green nodes in T. With these new edges an augmenting path can be found, as shown on the right as a sequence of blue and red edges, starting from 3 in S.</figcaption>
</figure>
<p>Once we find an augmenting path $P$ and “flip” the edges to increase the matching $M_{uv}$ by one, we need to recalculate the forest of alternating paths because the first vertex of $P$ is no longer unmatched.</p>
<p>Every time we change the dual variables we either increase the matching size by one or grow the forest by 2 edges. When we increase the matching, we recalculate the forest, so we can’t assume its size will remain constant; but even if the forest were emptied out every time that happened, there would still be a ceiling of $O(n^2)$ dual variable changes.</p>
<p>This proves the algorithm finishes. Let’s now prove it is correct.</p>
<h3 id="optimality">Optimality</h3>
<p>Suppose by contradiction that there are no frontier edges in $F$ and the corresponding matching $M_{uv}$ is not perfect. By Berge’s theorem, there is an augmenting path $P$ for $M_{uv}$ in $G$.</p>
<p>Let $r$ be its first vertex. By definition, it’s unmatched and w.l.o.g. assume it’s in $S$ and thus root of some tree in $F$. Let $(i, j)$ be the first edge that doesn’t belong to $C(r)$. There are a few scenarios to consider.</p>
<p><strong>Case 1.</strong> Suppose $(i, j)$ exists in $G(u,v)$. This is possible because we might not add an edge $(i,j)$ to $C(r)$ if the vertex $j$ has already been visited by some other tree, rooted at $r’$. This is the same as <em>Case 2.2</em>.</p>
<p><strong>Case 2.</strong> Suppose $(i, j)$ does not exist in $G(u,v)$. We have that $i$ belongs to $F$ (because we assume the edge $(i, j')$ preceding $(i, j) \in P$ exists in $F$). We have 2 subcases:</p>
<p><strong>Case 2.1</strong> $j \not \in F$. Then $(i, j)$ is a frontier edge but that contradicts our initial hypothesis. Hence this case can’t happen.</p>
<p><strong>Case 2.2</strong> $j \in F$. Then $j$ belongs to the tree of another root $r'$. Let $Q$ be the (alternating) path from $r'$ to $j$, $P_j$ the part of path $P$ starting at $j$, and $P_i$ the part of path $P$ ending at vertex $i$. Since $P_i$ belongs to the tree of $r$, it’s disjoint from $Q$, so $Q + P_j$ forms an augmenting path with at least one fewer edge not in $F$ than $P$. Hence this eventually reduces to <em>Case 2.1</em>.</p>
<h2 id="python-implementation">Python Implementation</h2>
<p>With the ideas discussed above we are ready to implement this algorithm. There are three major routines: building the forest of alternating paths, flipping an augmenting path and changing the dual variables.</p>
<p>Let’s start with <code class="language-plaintext highlighter-rouge">augment()</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">augment</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">j</span><span class="p">):</span>
<span class="k">while</span> <span class="n">j</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">i</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">parent</span><span class="p">[</span><span class="n">T</span><span class="p">][</span><span class="n">j</span><span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">match</span><span class="p">[</span><span class="n">T</span><span class="p">][</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span>
<span class="bp">self</span><span class="p">.</span><span class="n">match</span><span class="p">[</span><span class="n">S</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">j</span>
<span class="n">j</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">parent</span><span class="p">[</span><span class="n">S</span><span class="p">][</span><span class="n">i</span><span class="p">]</span></code></pre></figure>
<p>In this method <code class="language-plaintext highlighter-rouge">parent[.][x]</code> is the parent of vertex <code class="language-plaintext highlighter-rouge">x</code> in the forest. <code class="language-plaintext highlighter-rouge">match[.][x]</code> is the index of the vertex matched with <code class="language-plaintext highlighter-rouge">x</code>. <code class="language-plaintext highlighter-rouge">T</code> is an alias to 1 and <code class="language-plaintext highlighter-rouge">S</code> to 0. I’ve opted to use a $2 \times n$ matrix to store information about the vertices on the bipartite graph.</p>
<p>The idea is simple: we start at an unmatched $j \in T$, so we know that the edge $(i, j)$ to its parent $i$ is not in the matching, and we can match $i$ and $j$. Note that $i$ was matched with its parent $j'$, but we don’t need to unmatch it because $j'$ will eventually be rematched.</p>
<p>Next we explore <code class="language-plaintext highlighter-rouge">change_duals()</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">change_duals</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">delta</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">v</span> <span class="k">for</span> <span class="n">v</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">slack</span> <span class="k">if</span> <span class="n">v</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">):</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">visited</span><span class="p">[</span><span class="n">S</span><span class="p">][</span><span class="n">i</span><span class="p">]:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">u</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">-=</span> <span class="n">delta</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">):</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">visited</span><span class="p">[</span><span class="n">T</span><span class="p">][</span><span class="n">j</span><span class="p">]:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">v</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="n">delta</span>
<span class="c1"># frontier edge
</span> <span class="k">elif</span> <span class="bp">self</span><span class="p">.</span><span class="n">slack</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">slack</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">-=</span> <span class="n">delta</span>
<span class="c1"># note: self.parent[j] has been set
</span> <span class="c1"># during the forest visit
</span> <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">slack</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">candidates</span><span class="p">.</span><span class="n">put</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="n">j</span><span class="p">))</span></code></pre></figure>
<p>Here <code class="language-plaintext highlighter-rouge">slack[j]</code> represents the slack for $j \in T$. If the slack is positive then <code class="language-plaintext highlighter-rouge">(parent[T][j], j)</code> is either a frontier edge or $j$ is an unreachable vertex, in which case <code class="language-plaintext highlighter-rouge">slack[j] = INF</code> so <code class="language-plaintext highlighter-rouge">delta = min(...)</code> works.</p>
<p>Then we proceed to subtract <code class="language-plaintext highlighter-rouge">delta</code> from <code class="language-plaintext highlighter-rouge">u</code> and add to <code class="language-plaintext highlighter-rouge">v</code>. We also update <code class="language-plaintext highlighter-rouge">slack</code> to keep it consistent with <code class="language-plaintext highlighter-rouge">u</code> and <code class="language-plaintext highlighter-rouge">v</code>. <code class="language-plaintext highlighter-rouge">visited[.][x]</code> indicates whether the vertex belongs to the forest.</p>
<p>We finally have an optimization: if <code class="language-plaintext highlighter-rouge">slack[j]</code> became 0, we added <code class="language-plaintext highlighter-rouge">j</code> to the forest, so we can continue building the forest from <code class="language-plaintext highlighter-rouge">j</code> instead of doing it from scratch.</p>
<p>We now define <code class="language-plaintext highlighter-rouge">visit_forest()</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">visit_forest</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">while</span> <span class="ow">not</span> <span class="bp">self</span><span class="p">.</span><span class="n">candidates</span><span class="p">.</span><span class="n">empty</span><span class="p">():</span>
<span class="p">[</span><span class="n">side</span><span class="p">,</span> <span class="n">c</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">candidates</span><span class="p">.</span><span class="n">get</span><span class="p">()</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">visited</span><span class="p">[</span><span class="n">side</span><span class="p">][</span><span class="n">c</span><span class="p">]:</span>
<span class="k">continue</span>
<span class="bp">self</span><span class="p">.</span><span class="n">visited</span><span class="p">[</span><span class="n">side</span><span class="p">][</span><span class="n">c</span><span class="p">]</span> <span class="o">=</span> <span class="bp">True</span>
<span class="k">if</span> <span class="n">side</span> <span class="o">==</span> <span class="n">S</span><span class="p">:</span> <span class="c1"># c in S
</span> <span class="bp">self</span><span class="p">.</span><span class="n">visit_s</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span> <span class="c1"># c in T
</span> <span class="n">augmented</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">visit_t</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
<span class="k">if</span> <span class="n">augmented</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">True</span>
<span class="k">return</span> <span class="bp">False</span></code></pre></figure>
<p>This method basically implements a BFS using a queue. The current nodes to be visited are stored in <code class="language-plaintext highlighter-rouge">candidates</code>. We have to handle vertices in $S$ and $T$ differently.</p>
<p>When visiting $T$, there’s a chance we find an augmenting path. If we do, we’ll need to restart the construction of the forest, so we have to short-circuit.</p>
<p>For $i \in S$ we call <code class="language-plaintext highlighter-rouge">visit_s()</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">visit_s</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span>
<span class="c1"># update slacks for edges not in the matching
</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">):</span>
<span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">match</span><span class="p">[</span><span class="n">S</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">j</span><span class="p">:</span>
<span class="k">continue</span>
<span class="n">slack</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">u</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">v</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">-</span> <span class="bp">self</span><span class="p">.</span><span class="n">adj</span><span class="p">[</span><span class="n">S</span><span class="p">][</span><span class="n">i</span><span class="p">][</span><span class="n">j</span><span class="p">]</span>
<span class="k">if</span> <span class="n">slack</span> <span class="o"><</span> <span class="bp">self</span><span class="p">.</span><span class="n">slack</span><span class="p">[</span><span class="n">j</span><span class="p">]:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">slack</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">slack</span>
<span class="bp">self</span><span class="p">.</span><span class="n">parent</span><span class="p">[</span><span class="n">T</span><span class="p">][</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span>
<span class="c1"># edge (i,j) is now in the forest. keep visiting
</span> <span class="k">if</span> <span class="n">slack</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">candidates</span><span class="p">.</span><span class="n">put</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="n">j</span><span class="p">))</span></code></pre></figure>
<p>We first observe that we skip visiting its match. That’s because if $(i, j) \in M_{uv}$, then we arrived at $i$ from $j$ already.</p>
<p>We just need to update the slacks from its neighbors in $T$ in $G$. Note that we update the <code class="language-plaintext highlighter-rouge">slack</code> and <code class="language-plaintext highlighter-rouge">parent</code> even for edges that are not in $G(u, v)$. This is important for the <code class="language-plaintext highlighter-rouge">change_duals()</code>, since once we change <code class="language-plaintext highlighter-rouge">u</code> and <code class="language-plaintext highlighter-rouge">v</code> the <code class="language-plaintext highlighter-rouge">slack</code> on that edge might go to 0 and cause it to be added.</p>
<p>For $j \in T$ we call <code class="language-plaintext highlighter-rouge">visit_t()</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">visit_t</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">j</span><span class="p">):</span>
<span class="n">i</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">match</span><span class="p">[</span><span class="n">T</span><span class="p">][</span><span class="n">j</span><span class="p">]</span>
<span class="c1"># found an augmenting path
</span> <span class="k">if</span> <span class="n">i</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
<span class="bp">self</span><span class="p">.</span><span class="n">augment</span><span class="p">(</span><span class="n">j</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">True</span>
<span class="bp">self</span><span class="p">.</span><span class="n">parent</span><span class="p">[</span><span class="n">S</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">j</span>
<span class="bp">self</span><span class="p">.</span><span class="n">candidates</span><span class="p">.</span><span class="n">put</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">i</span><span class="p">))</span>
<span class="k">return</span> <span class="bp">False</span></code></pre></figure>
<p>If $j$ is unmatched, we found an augmenting path so we can call <code class="language-plaintext highlighter-rouge">augment(j)</code> and stop trying to visit/construct the forest.</p>
<p>Otherwise we keep constructing the alternating trees by following the edge from the match (recalling that <code class="language-plaintext highlighter-rouge">j</code> was reached from a non-match edge).</p>
<p>We can put it together in <code class="language-plaintext highlighter-rouge">expand_matching()</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">expand_matching</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">init_candidates</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">init_parent</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">init_visited</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">init_slack</span><span class="p">()</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
<span class="n">augmented</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">visit_forest</span><span class="p">()</span>
<span class="k">if</span> <span class="n">augmented</span><span class="p">:</span>
<span class="k">return</span>
<span class="bp">self</span><span class="p">.</span><span class="n">change_duals</span><span class="p">()</span></code></pre></figure>
<p>We first reset all the variables so we can recreate the forest from scratch. The only non-trivial <code class="language-plaintext highlighter-rouge">init_*</code> function is <code class="language-plaintext highlighter-rouge">init_candidates()</code>, which initializes the <code class="language-plaintext highlighter-rouge">candidates</code> queue with all the unmatched vertices from $S$.</p>
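<p>A minimal standalone sketch of what <code class="language-plaintext highlighter-rouge">init_candidates()</code> might look like, based on the description above (the parameter name <code class="language-plaintext highlighter-rouge">match_s</code> and the use of <code class="language-plaintext highlighter-rouge">None</code> for unmatched vertices are assumptions for illustration):</p>

```python
from queue import PriorityQueue

def init_candidates(match_s):
    """Hypothetical sketch: seed the BFS queue with the unmatched vertices of S.

    match_s[i] is the T-vertex matched to S-vertex i, or None if i is unmatched.
    The (0, i) tuples mirror the ones pushed by visit_t().
    """
    candidates = PriorityQueue()
    for i, m in enumerate(match_s):
        if m is None:
            candidates.put((0, i))
    return candidates
```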
<p>Then we do a mix of BFS exploration with dual variable changes until we find an augmenting path and augment the matching. Finally, the main routine <code class="language-plaintext highlighter-rouge">solve()</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">solve</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">init_match</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">init_u</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">init_v</span><span class="p">()</span>
<span class="c1"># for each iteration we should increase the matching size
</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">expand_matching</span><span class="p">()</span>
<span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">u</span><span class="p">)</span> <span class="o">+</span> <span class="nb">sum</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">v</span><span class="p">)</span></code></pre></figure>
<p>First we initialize the dual variables and the matching, which don’t need to be reset during the execution of the algorithm. Recall that <code class="language-plaintext highlighter-rouge">init_u()</code> starts with the maximum weight among the edges incident to each vertex in $S$:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">init_u</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">u</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">):</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">):</span>
<span class="n">u</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">u</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="bp">self</span><span class="p">.</span><span class="n">adj</span><span class="p">[</span><span class="n">S</span><span class="p">][</span><span class="n">i</span><span class="p">][</span><span class="n">j</span><span class="p">])</span>
<span class="bp">self</span><span class="p">.</span><span class="n">u</span> <span class="o">=</span> <span class="n">u</span></code></pre></figure>
<p>It then calls the <code class="language-plaintext highlighter-rouge">expand_matching()</code> function <code class="language-plaintext highlighter-rouge">n</code> times. Each <code class="language-plaintext highlighter-rouge">expand_matching()</code> increases the matching size by at least one, so this is enough to find the optimal matching.</p>
<p>After the loop, since <code class="language-plaintext highlighter-rouge">u</code> and <code class="language-plaintext highlighter-rouge">v</code> are optimal, we can use the objective function of the dual LP, which is a bit simpler to compute than adding up the weights of the edges in the matching.</p>
<p>The full <a href="/hungarian.py">Python</a> implementation as well as a <a href="/hungarian.cpp">C++</a> one are available on GitHub.</p>
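<p>As a sanity check (not part of the implementations above), a brute-force solver that enumerates all $n!$ assignments can be used to validate results on small instances:</p>

```python
from itertools import permutations

def brute_force_assignment(w):
    """Maximum-weight assignment by exhaustive search over permutations.

    w[i][j] is the weight of the edge between vertex i of S and vertex j of T.
    Only practical for tiny n, but handy for testing a faster solver.
    """
    n = len(w)
    return max(sum(w[i][p[i]] for i in range(n)) for p in permutations(range(n)))
```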
<h3 id="runtime-complexity">Runtime complexity</h3>
<p>We argue that <code class="language-plaintext highlighter-rouge">expand_matching()</code> is $O(n^2)$, the cost of doing a BFS on the graph. Even when we call <code class="language-plaintext highlighter-rouge">change_duals()</code> we don’t re-do the BFS from scratch but continue from where we stopped, so we visit each edge at most once in <code class="language-plaintext highlighter-rouge">expand_matching()</code>.</p>
<p>We only reset the BFS when we call <code class="language-plaintext highlighter-rouge">augment()</code>, but we do that only $O(n)$ times throughout the algorithm, so the total runtime complexity is $O(n^3)$.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I was studying the Hungarian method from Lawler’s <em>Combinatorial Optimization: Networks and Matroids</em> [3] and it was incredibly hard to convert the provided pseudo-code into a working implementation.</p>
<p>I missed several details, like the fact that the graph has to be complete (if it’s not, the algorithm is unable to find augmenting paths correctly) and that the partitions must have the same size. There’s also vagueness about the order in which to visit vertices and about terms like <em>unscanned</em>.</p>
<p>What ultimately helped me understand the algorithm in detail was Topcoder’s article [4] and Wikipedia [2].</p>
<p>The Hungarian method is another algorithm, like <a href="https://www.kuniga.me/blog/2016/03/13/tree-ring-matching-using-the-kmp-algorithm.html">KMP</a>, that I had used multiple times but never understood in detail until writing about it.</p>
<h2 id="related-posts">Related Posts</h2>
<p><a href="https://www.kuniga.me/blog/2013/11/11/lawler-and-an-introduction-to-matroids.html">An Introduction to Matroids</a> - In that post we talk about the greedy algorithm for finding minimum/maximum spanning tree, known as the Kruskal algorithm. The idea behind the change of dual variables in the Hungarian algorithm vaguely reminds me of the Kruskal algorithm, in which we choose the edge with lowest weight to add to the existing forest.</p>
<p><a href="https://www.kuniga.me/blog/2012/02/05/lagrangean-relaxation-theory.html">Lagrangian Relaxation</a> - Duality is utilized in Lagrangian Relaxation to obtain upper bounds (in case of maximization) for branch-and-bound algorithms.</p>
<p><a href="https://www.kuniga.me/blog/2012/09/02/totally-unimodular-matrices.html">Totally Unimodular Matrices</a> - The incidence matrix of a bipartite graph is totally unimodular (TU). This allows deriving the König-Egerváry theorem (<em>the</em> Hungarians) which says that maximum cardinality matching and minimum vertex cover are duals in bipartite graphs.</p>
<p>As we described, the Hungarian algorithm uses maximum cardinality matching as a sub-routine. There’s a variant which we didn’t mention which uses minimum vertex cover as sub-routine [2] (see <em>Matrix interpretation</em>).</p>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://en.wikipedia.org/wiki/Harold_W._Kuhn">1</a>] Wikipedia: Harold W. Kuhn</li>
<li>[<a href="https://en.wikipedia.org/wiki/Hungarian_algorithm">2</a>] Wikipedia: Hungarian Algorithm</li>
<li>[3] Combinatorial Optimization: Networks and Matroids, Eugene Lawler.</li>
<li>[<a href="https://www.topcoder.com/thrive/articles/Assignment%20Problem%20and%20Hungarian%20Algorithm">4</a>] Topcoder: Assignment Problem and Hungarian Algorithm</li>
<li>[<a href="https://mathgenealogy.org">5</a>] Mathematics Genealogy Project</li>
<li>[6] A tale of three eras: The discovery and rediscovery of the Hungarian Method, Harold W. Kuhn.</li>
</ul>Guilherme KunigamiHarold William Kuhn was an American mathematician, known for the Karush–Kuhn–Tucker conditions and the Hungarian method for the assignment problem [1]. According to Wikipedia [2], Kuhn named the algorithm Hungarian method because it was largely based on the earlier works of Hungarian mathematicians Dénes Kőnig and Jenő Egerváry. However in 2006, the mathematician Francois Ollivier found out that Carl Jacobi (known for Jacobian matrices) had already developed a similar algorithm in the 19th century in the context of systems of differential equations and emailed Kuhn about it [6]. One fascinating coincidence is that Jacobi is Kuhn’s ancestral advisor according to the Mathematics Genealogy Project! [5]. Here’s the ancestry chain (year when they got their PhD in parenthesis): Harold William Kuhn (1950) Ralph Hartzler Fox (1939) Solomon Lefschetz (1911) William Edward Story (1875) Carl Gottfried Neumann (1856) Friedrich Julius Richelot (1831) Carl Gustav Jacob Jacobi (1825) If this genealogy is to be trusted, I wonder if Kuhn was aware of this fact, given he wrote about Jacobi’s life in [6]. In this post we’ll explore this algorithm and provide an implementation in Python.My Favorite Subjects2022-11-23T00:00:00+00:002022-11-23T00:00:00+00:00https://www.kuniga.me/blog/2022/11/23/my-favorite-subjects<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>I’ve recently visited my parents’ place in Brazil and found an old math book from high school. From time to time I reflect on topics I enjoy today compared to those I liked or disliked growing up.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-11-17-my-favorite-subjects/math_book.jpg" alt="See caption." />
<figcaption>Figure 1: Math book from the last year of high school.</figcaption>
</figure>
<p>This seemed like a good opportunity to write down some of those thoughts and provide a glimpse of education in Brazil for the curious.</p>
<!--more-->
<p><em>Disclaimer:</em> this describes the education at the time I attended school. It’s likely this is not representative of today’s education system in Brazil at large.</p>
<p>Education in Brazil is divided into three main parts: <em>ensino fundamental</em> (fundamental instruction), <em>ensino médio</em> (middle instruction) and <em>ensino superior</em> (higher instruction). The first roughly corresponds to elementary and middle school in the US, the second to high school and the third to college. We also have masters and PhD programs which are largely equivalent to the ones in the US.</p>
<h2 id="fundamental-instruction">Fundamental Instruction</h2>
<p>During <em>ensino fundamental</em> we had mostly four subjects: Math, Portuguese, Social Sciences (Geography and History) and Science (Biology). Around 5th grade, I believe, English (as a second language) was added, and some Chemistry and Physics in 8th grade (the final one at the time - I believe there are 9 grades today).</p>
<p>My favorite subjects back then were Math and Portuguese, and I disliked Social Sciences. Portuguese seems like an outlier given my leaning towards STEM classes but, in retrospect, what is taught in Portuguese until high school is grammar, and compared to English, Portuguese grammar is more complicated, so we spent multiple years studying it. A lot of the exercises were about parsing sentences and classifying the tokens based on syntax.</p>
<h2 id="high-school">High School</h2>
<p>I attended a high school which was a mix of normal high school and professional school. My focus area was electronics, so we had extra classes corresponding to those one might encounter in an Electrical Engineering college major, but simplified for high schoolers and with a bigger emphasis on practice.</p>
<p>This included classes like electronics, electromagnetism, circuit design, etc. We also learned low-level programming, in assembly, so technically I learned how to code in high school! But I recall it being very difficult and didn’t particularly enjoy it.</p>
<p>I also recall a teacher trying to teach us Calculus informally and that it didn’t make sense to me. It would only become intuitive in college: after I took numerical calculus, his explanations finally clicked after all those years.</p>
<p>My favorite subjects in high school were Math and Physics. From the specialized classes, I really liked this class called digital processing techniques. It involved the study of logical circuits including how to simplify them by showing the equivalence of their outputs (mostly by brute force, by writing down exhaustive truth tables), so it was largely Boolean Algebra.</p>
<h2 id="college">College</h2>
<p>Based on my experience with the specialized classes in high school, I learned I wouldn’t enjoy majoring in Electrical Engineering. I’m lucky in this regard, since most people have to pick a major in college without any prior exposure. I did know I enjoyed math/logic a lot but at the time I didn’t associate that with Computer Engineering/Science.</p>
<p>What made me go into Computer Engineering was the more mundane fact that I started playing PC games at the time and hoped I could get into game development some day.</p>
<p>In college I got involved with programming contests in my first year. It has had a huge impact on my life trajectory so far, and at the time it also influenced what kind of subjects I enjoyed. Not surprisingly, my favorite classes were the analysis of algorithms series and even graduate courses related to programming contests. But this came at the expense of my interest in subjects I used to like before, including Math and Physics.</p>
<p>The subjects I recall disliking were Databases, Distributed Systems, Network, Operating Systems and Software Engineering. More on this later.</p>
<h2 id="post-college">Post-College</h2>
<p>After college I realized that when the motivation to learn something comes from myself, the learning experience is highly enjoyable. I also learned that writing about subjects I’m studying is a big motivating factor.</p>
<p>One interesting reversal happened: I’m now very eager to learn about subjects I disliked in college such as Databases, Distributed Systems and Operating Systems. I believe it’s mostly because I can relate to them in my work, even if tangentially.</p>
<p>Another reversal is that I’m now also keen on learning History. I found that these are the non-fiction books I’ve been enjoying the most in the past few years. I think a large part of it is being able to connect to other bits of history I already know. It feels like putting pieces of a giant mental jigsaw in place. I believe another big contributor is having been more exposed to museums and having travelled more.</p>
<p>I’ve more recently rekindled my interest in Math and Physics. There’s so much I don’t know about these subjects that it’s easier to feel the excitement of 0-to-1 learning like <a href="https://www.kuniga.me/blog/2022/11/03/topological-equivalence.html">Topology</a>.</p>Guilherme KunigamiI’ve recently visited my parents’ place in Brazil and found an old math book from high school. From time to time I reflect on topics I enjoy today compared to those I liked or disliked growing up. Figure 1: Math book from the last year of high school. This seemed like a good opportunity to write down some of those thoughts and provide a glimpse of education in Brazil for the curious.Topological Equivalence2022-11-03T00:00:00+00:002022-11-03T00:00:00+00:00https://www.kuniga.me/blog/2022/11/03/topological-equivalence<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>I’ve recently read the book <em>Introduction to Topology</em> by Bert Mendelson [1]. Before that, my only knowledge of topology was that some objects are topologically equivalent, for example a mug and a doughnut.</p>
<p>After reading the book, I found topology is a lot more about algebraic formalism than visual geometry. So in this post I’d like to discuss the idea of topological equivalence from this formal perspective but using visual examples for better intuition.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-11-03-topological-equivalence/equivalence.png" alt="See caption." />
<figcaption>Figure 1: Left: A mug from the Museum of Mathematics in New York. Right: A doughnut generated using Stable Diffusion.</figcaption>
</figure>
<p>We’ll review metric spaces, then generalize to topological spaces, introduce a formal definition for continuous functions and then explore homeomorphism (the technical name for topological equivalence) and provide some examples.</p>
<!--more-->
<p><strong>Notation.</strong> Before we start, some notation to avoid confusion. $(a, b)$ represents an open interval and $[a, b]$ a closed one.</p>
<h2 id="metric-spaces">Metric Spaces</h2>
<p>We can see topological spaces as a generalization of metric spaces, so let’s review the latter and introduce the concepts needed for the generalization.</p>
<p>Recall that a metric space is a vector space with a metric function [2]. We can denote a metric space by the pair $(X, d)$ where $X$ is the set of vectors (or points) and $d$ is the metric (or distance) function. An example of a metric space is $(\mathbb{R}^3, d)$ where $d$ is the Euclidean distance.</p>
<h3 id="open-ball">Open Ball</h3>
<p>Given a metric space $(X, d)$, we can define the <strong>open ball</strong> <em>about $x \in X$ of radius $\delta$</em>, where $\delta \gt 0$ and denoted by $B(x, \delta)$, as the set of points whose distance from $x$ is <em>strictly</em> less than $\delta$, that is:</p>
\[B(x, \delta) = \curly{y \in X \mid d(x, y) \lt \delta}\]
<p><em>Open</em> in this case comes from the fact that we don’t include points at the exact distance $\delta$ from $x$ (i.e. $\lt \delta$ vs. $\le \delta$) and the terminology is analogous to open intervals like $(0, 1)$ or $0 \lt x \lt 1$.</p>
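<p>As an illustration (not from the original text), open ball membership in Euclidean space is a direct translation of the definition, with the strict inequality excluding the boundary:</p>

```python
import math

def in_open_ball(y, x, delta):
    """True if point y lies in the open ball B(x, delta) under Euclidean distance."""
    # Strict inequality: points at distance exactly delta are excluded.
    return math.dist(x, y) < delta
```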
<h3 id="neighborhood">Neighborhood</h3>
<p>A <strong>neighborhood</strong> of $x \in X$, denoted by $N$, is a set of points that contains at least one open ball about $x$, that is, if $N$ is a neighborhood of $x$:</p>
\[\exists \delta \gt 0 : B(x, \delta) \subseteq N\]
<p>Note that a neighborhood is in relation to a point $x$ but, for some reason, $x$ is not included in the notation $N$. When disambiguation is needed I’ve seen it written as $N_x$.</p>
<h3 id="open-sets">Open Sets</h3>
<p>Given a metric space $(X, d)$, an <strong>open set</strong> is a subset $O$ of $X$ which is a neighborhood of all of its points, that is:</p>
\[\forall x \in O, \exists \delta \gt 0 : B(x, \delta) \subseteq O\]
<p>It’s possible to show that an open ball $B(x, \delta)$ is itself an open set. The definition might look circular, since open sets are defined in terms of neighborhoods and neighborhoods in terms of open balls, but open balls are defined directly from the metric, so there’s no actual circularity.</p>
<p>Let’s consider an example for the real line. Let $x \in \mathbb{R}$ and consider the open interval $(0, 1)$. While it doesn’t include elements 0 and 1, we can always get arbitrarily close to either of them. In open ball parlance, for every $x \in (0, 1)$ we can find $\delta > 0$ such that $(x - \delta, x + \delta)$ is in $(0, 1)$. Note that $(x - \delta, x + \delta)$ is our 1D version of open ball. Thus, $(0, 1)$ is an open set.</p>
<p>However, in the semi-closed interval $[0, 1)$, none of the open balls about $x = 0$ lie inside $[0, 1)$ because it would contain the point $-\delta$, so this is not an open set.</p>
<p>Open sets are more general than open balls; in fact, it’s possible to show that an open set is equivalent to a (possibly infinite) union of open balls. This is a good segue for us to define four important properties of open sets.</p>
<p>Let $(X, d)$ be a metric space.</p>
<ul>
<li>The empty set is open</li>
<li>$X$ is open</li>
<li>The (possibly infinite) union of open sets is open</li>
<li>The finite intersection of open sets is open</li>
</ul>
<p>We can make a few observations. The first is that $X$ is always open. This means that even if $X$ is the half-open interval $[0, 1)$, it is an open set, which seems to contradict the earlier argument about $[0, 1)$ not being an open set in $\mathbb{R}$. This is because openness is relative to the set $X$.</p>
<p>In this case, the open ball about $x = 0$ of radius $\delta$ in $X$ is $[0, \delta)$, since points outside of $X$ don’t exist in this space. For $\delta \le 1$ it is contained in $[0, 1)$, so $[0, 1)$ is a neighborhood of $0$ and, by a similar argument, of all of its points, and thus open.</p>
<p>The second observation is that an infinite union of open sets is open but an infinite intersection is not necessarily so. Here’s a counter-example of an infinite intersection of open sets that is not open: consider the metric space of the real line and the open sets of the form $(-\frac{1}{n}, \frac{1}{n})$, $n \in \mathbb{N}$, $n \ge 1$. The infinite intersection of such open sets is:</p>
\[Z = \bigcap^{\infty}_{n = 1} (-\frac{1}{n}, \frac{1}{n})\]
<p>We claim that $Z = \curly{0}$. If not, without loss of generality, there exists $\epsilon \in Z$, $\epsilon \gt 0$. However, for a sufficiently large $n$ we have that $\frac{1}{n} < \epsilon$ so $\epsilon$ can’t be in $(-\frac{1}{n}, \frac{1}{n})$ and thus cannot be in $Z$.</p>
<p>And since $\curly{0}$ is not open in $\mathbb{R}$, we conclude our counter-example.</p>
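<p>We can get a feel for this counter-example numerically. A finite approximation (illustrative only, truncating the intersection at <code class="language-plaintext highlighter-rouge">max_n</code> intervals) shows that $0$ survives every interval while any fixed $\epsilon \gt 0$ is eventually excluded:</p>

```python
def in_all_intervals(x, max_n=1000):
    """Finite approximation: is x in the intersection of (-1/n, 1/n) for n = 1..max_n?"""
    return all(-1 / n < x < 1 / n for n in range(1, max_n + 1))
```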
<p>Open sets are the crux for topological spaces, as we’ll see later. First let’s work out some other concepts related to open sets.</p>
<h3 id="closed-set-limit-points-closure-border">Closed Set, Limit Points, Closure, Border</h3>
<p>Given a metric space $(X, d)$, a subset $C$ of $X$ is <strong>closed</strong> if its complement, $X \setminus C$, is open. It’s possible that a set is both closed and open, so it’s incorrect to define a closed set as “a set that is not open”.</p>
<p>Given a metric space $(X, d)$, the <strong>limit point</strong> of a subset $A$ is defined as the point $b \in X$ such that every neighborhood of $b$ includes some point of $A$ that is not $b$.</p>
<p>Note that $b$ need not be in $A$. For example, if $A = (0, 1)$, then $0$ is a limit point of $A$ since every ball $B(0, \delta)$ contains some point of $A$ (e.g. $\min(\delta, 1)/2$), while $0 \notin A$. Not all points in $A$ are limit points. For example, if $A = (0, 1) \cup \curly{2}$, the element $2$ is not a limit point.</p>
<p>The union of a subset $A$ and its limit points is called the <strong>closure of</strong> $A$, denoted by $\overline{A}$. For example, if $A = (0, 1)$, $\overline{A} = [0, 1]$. Another characterization of closed sets is that a closed set equals its closure, $A = \overline{A}$.</p>
<p>Finally the <strong>border</strong> of a set are the elements that only exist in the closure, that is $\overline{A} \setminus A$. Using again the example $A = (0, 1)$, we have that its border is $\curly{0, 1}$.</p>
<h2 id="toplogical-spaces">Topological Spaces</h2>
<p>A <strong>topological space</strong> consists of a set $X$ and a set $\tau$, a collection of subsets of $X$, satisfying the following properties:</p>
<ul>
<li>The empty set is in $\tau$</li>
<li>$X$ is in $\tau$</li>
<li>The (possibly infinite) union of sets in $\tau$ is in $\tau$</li>
<li>The finite intersection of sets in $\tau$ is in $\tau$</li>
</ul>
<p>$\tau$ is called the <strong>topology</strong> of $X$. If we compare these properties with those of open sets in metric spaces they’re essentially the same.</p>
<p>So, given a metric space $(X, d)$, if we let $\tau$ be the collection of open sets of $X$, we get a topological space $(X, \tau)$. Thus, not coincidentally, the elements in $\tau$ are called <strong>open sets</strong>.</p>
<p>We can see that any metric space is a topological space but it’s possible to have $\tau$ satisfying the properties of open sets but that cannot be generated by any metric $d$. These are known as <strong>non-metrizable</strong> topological spaces. I don’t know anything about these, but [3] has some links.</p>
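<p>For <em>finite</em> examples, the four axioms above can be checked mechanically. A small illustrative sketch, representing sets as Python <code class="language-plaintext highlighter-rouge">frozenset</code>s (for a finite collection, closure under pairwise unions implies closure under arbitrary unions):</p>

```python
def is_topology(X, tau):
    """Check the topology axioms for a finite collection tau of subsets of X."""
    tau = set(map(frozenset, tau))
    X = frozenset(X)
    # The empty set and X itself must belong to tau.
    if frozenset() not in tau or X not in tau:
        return False
    # Pairwise closure under union and intersection suffices in the finite case.
    return all(a | b in tau and a & b in tau for a in tau for b in tau)
```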
<p><strong>Example: The standard topology.</strong> This is the topology where $X = \mathbb{R}$ and $\tau$ is obtained from the open sets of the metric space $(\mathbb{R}, d)$, where $d$ is the Euclidean distance. In other words, it’s the topology of the real line and the open sets are all the open intervals and their unions.</p>
<h3 id="neighborhood-1">Neighborhood</h3>
<p>We can also define the concept of neighborhood for topological spaces. Given a topological space $(X, \tau)$, a subset $N$ of $X$ is a <strong>neighborhood of</strong> $x \in X$ if it contains at least one open set that contains $x$.</p>
<p>Note that we could have defined neighborhood in <em>metric spaces</em> in terms of open sets instead of open balls. Since open sets are unions of open balls, containing at least one open set implies containing at least one open ball and vice-versa.</p>
<h3 id="base">Base</h3>
<p>In the context of metric spaces we saw that open sets can be obtained by taking the union of open balls. In this sense the set of open balls is a smaller set of open sets that can be used to derive the “complete” open sets.</p>
<p>This idea can be generalized for topological spaces. Any subset $\cal{B} \subseteq \tau$ such that every element in $\tau$ can be obtained by the union of elements in $\cal{B}$ is called a <strong>base</strong> for the topology $\tau$ [6].</p>
<p>Note that a base does not say anything about minimum cardinality, so $\tau$ is a base for itself too.</p>
<p><strong>Example.</strong> In topological spaces obtained from metric ones, the set of open balls form a base for the topology. In particular the set of open intervals form a base for the standard topology.</p>
<h2 id="continuous-functions">Continuous Functions</h2>
<p>Let $(X, \tau)$ and $(Y, \tau’)$ be topological spaces. We can define a function from $(X, \tau)$ to $(Y, \tau’)$ as a function $f: X \rightarrow Y$. In general a function between topological spaces does not impose any restrictions on their topologies (i.e. $\tau$ or $\tau’$).</p>
<p>A function is <strong>continuous at a point</strong> $x \in X$ if for every neighborhood $N$ of $f(x)$ in $Y$, $f^{-1}(N)$ is a neighborhood of $x$ in $X$. A function is <strong>continuous</strong> if it’s continuous at all $x \in X$.</p>
<p>The notation $f^{-1}(N)$ represents a subset of $X$ containing all elements such that $f(x) \in N$. A more intuitive definition of continuous function is in terms of open sets:</p>
<p>A function $f:(X, \tau) \rightarrow (Y, \tau’)$ is continuous if and only if for every $O$ that is an open set in $Y$, $f^{-1}(O)$ is an open set in $X$. In other words, for every $O \in \tau’$, $f^{-1}(O) \in \tau$.</p>
<p><strong>Example.</strong> To make this definition a bit clearer, let’s look at an example of a function that is not continuous. Consider the topological space $(X, \tau)$ where $X = \mathbb{R}$ and $\tau$ is the open sets induced by the Euclidean distance. Also consider the topological space $(Y, \tau’)$ where $Y = \mathbb{Z}$ and $\tau’$ the set of all possible subsets of $\mathbb{Z}$ (also called the power set of $\mathbb{Z}$ or $\mathbb{P}(\mathbb{Z})$).</p>
<p>Now consider the function $f(x) = \lfloor x \rfloor$, i.e. rounding down to the nearest integer (e.g. $\lfloor 3.14 \rfloor = 3$, $\lfloor -9.999 \rfloor = -10$).</p>
<p>Since $\curly{1} \in \tau’$ it’s open in $(Y, \tau’)$. If $f$ were continuous, $f^{-1}(\curly{1})$ would have to be an open set in $X$. However $f^{-1}(\curly{1})$ is $[1, 2)$, which is not an open set, so $f$ is not continuous.</p>
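<p>A quick numeric check (illustrative, treating $f$ as Python’s <code class="language-plaintext highlighter-rouge">math.floor</code>) confirms that the preimage of $\curly{1}$ is a half-open interval, hence not open in $\mathbb{R}$:</p>

```python
import math

# Sample points on a fine grid and keep those mapping to 1 under floor.
pre = [x / 100 for x in range(0, 300) if math.floor(x / 100) == 1]
# The preimage is half-open: 1 is included but 2 is not.
```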
<h2 id="homeomorphism">Homeomorphism</h2>
<p>Notice that for continuous functions we only require the preimage of an open set in $Y$ to be an open set in $X$, but not the opposite. If we require it both ways we get what is called a <strong>homeomorphism</strong>.</p>
<p>Let’s first introduce inverse functions. Consider functions $f: X \rightarrow Y$ and $g: Y \rightarrow X$. $f$ and $g$ are called <strong>inverse functions</strong> if $f \circ g$ and $g \circ f$ are the identity functions. More precisely, $\forall x \in X: g(f(x)) = x$ and $\forall y \in Y: f(g(y)) = y$.</p>
<p>Let $(X, \tau)$ and $(Y, \tau’)$ be topological spaces and let $f: (X, \tau) \rightarrow (Y, \tau’)$ and $g: (Y, \tau’) \rightarrow (X, \tau)$ be continuous functions that are inverses of each other. We say that $(X, \tau)$ and $(Y, \tau’)$ are <em>homeomorphic</em> or, less formally, “topologically equivalent”.</p>
<p>An equivalent definition of homeomorphism is: let $(X, \tau)$ and $(Y, \tau’)$ be topological spaces and let $f: (X, \tau) \rightarrow (Y, \tau’)$ a <em>bijective</em> function. Then $(X, \tau)$ and $(Y, \tau’)$ are homeomorphic if and only if $\forall O \in \tau : f(O) \in \tau’$.</p>
<p>Let’s now show some examples of homeomorphic spaces.</p>
<h2 id="examples">Examples</h2>
<h3 id="example-1-the-open-intervals-0-1-and-a-b">Example 1: The open intervals (0, 1) and (a, b)</h3>
<p>It’s possible to show the open intervals of the real line $(0, 1)$ and $(a, b)$, for $a \lt b$ are homeomorphic via the function $f: (a, b) \rightarrow (0, 1)$:</p>
\[f(x) = \frac{x - a}{b - a}\]
<p>Whose inverse is:</p>
\[f^{-1}(x) = (b - a)x + a\]
<p>So it’s bijective. We just need to prove that for any open set $O$ in $(a, b)$, $f(O)$ is an open set in $(0, 1)$ and vice-versa.</p>
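<p>Before the formal argument, a quick numeric check (illustrative only) that $f$ and $f^{-1}$ as defined above really are inverses mapping $(a, b)$ into $(0, 1)$:</p>

```python
def f(x, a, b):
    # Map from (a, b) to (0, 1): f(x) = (x - a) / (b - a).
    return (x - a) / (b - a)

def f_inv(y, a, b):
    # Inverse map from (0, 1) to (a, b): f^{-1}(y) = (b - a)y + a.
    return (b - a) * y + a
```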
<p>We can actually work with open <em>intervals</em> instead of open sets. To see why, any open set $O$ in $(a, b)$ is a union of open intervals. If we show that every open interval in $(a, b)$ maps to an open interval in $(0, 1)$, then for each open interval composing $O$ we’ll obtain a corresponding open interval in $(0, 1)$ which we can union to obtain an open set [4].</p>
<p>We first show that both $f(x)$ and $f^{-1}(x)$ are monotonically increasing. Consider $s$ and $t$ such that $a \lt s \lt t \lt b$. One way to do this is to prove that $f(x + \epsilon) \gt f(x)$ for any $\epsilon \gt 0$.</p>
<p>We start with an unknown relation $\sim$:</p>
\[\frac{x - a}{b - a} \sim \frac{x + \epsilon - a}{b - a}\]
<p>Since $b -a \gt 0$, we can simplify this to:</p>
\[x \sim x + \epsilon\]
<p>Since $\epsilon \gt 0$, we conclude $\sim$ is $\lt$ and that $f(x)$ is monotonically increasing. The same idea can be used for $f^{-1}(x)$.</p>
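<p>Spelling out the same comparison for $f^{-1}$, with $\epsilon \gt 0$:</p>

\[(b - a)x + a \lt (b - a)(x + \epsilon) + a \iff 0 \lt (b - a)\epsilon\]

<p>which holds because $b - a \gt 0$, so $f^{-1}$ is also monotonically increasing.</p>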
<p>To show that $f$ maps $(s, t)$ onto the interval $(f(s), f(t))$ in $(0, 1)$, we need to show that any $x$ satisfying $f(s) \lt x \lt f(t)$ is the image of some point in $(s, t)$. Because $f^{-1}$ is monotonically increasing, $s \lt f^{-1}(x) \lt t$, so $f^{-1}(x)$ belongs to $(s, t)$ by definition. Thus there is $y \in (s, t)$ such that $f(y) = x$ for all $f(s) \lt x \lt f(t)$. To show the image is <em>open</em>, it suffices to note that neither $s$ nor $t$ belongs to the open interval $(s, t)$ in $(a, b)$, so neither $f(s)$ nor $f(t)$ belongs to the image.</p>
<p>A similar argument can be applied in the other direction to show $(f^{-1}(s), f^{-1}(t))$ is an open interval in $(a, b)$ for any open interval $(s, t)$ in $(0, 1)$.</p>
<p>Alternatively, we can observe that both $f(x)$ and $f^{-1}(x)$ are linear functions of the form $\alpha x + \beta$, which can be shown to be continuous and that would suffice to show $f$ is a homeomorphism.</p>
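<p>As a quick numeric sanity check of this example (a hypothetical snippet, not part of the proof; the names <code class="language-plaintext highlighter-rouge">f</code> and <code class="language-plaintext highlighter-rouge">f_inv</code> are ours), we can verify that $f$ maps $(a, b)$ into $(0, 1)$ and that $f^{-1}$ undoes $f$:</p>

<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include <cassert>
#include <cmath>

// f maps (a, b) to (0, 1); f_inv is its inverse.
double f(double x, double a, double b) { return (x - a) / (b - a); }
double f_inv(double x, double a, double b) { return (b - a) * x + a; }

int main() {
  const double a = 2.0, b = 5.0;
  for (double x = 2.1; x < 5.0; x += 0.3) {
    // f(x) lands in (0, 1) and f_inv recovers x.
    assert(f(x, a, b) > 0.0 && f(x, a, b) < 1.0);
    assert(std::abs(f_inv(f(x, a, b), a, b) - x) < 1e-9);
  }
}
</code></pre></figure>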
<h3 id="example-2-the-open-intervals--1-1-and-mathbbr">Example 2: The open intervals (-1, 1) and $\mathbb{R}$</h3>
<p>It’s possible to generalize further and show that $(-1, 1)$ and $\mathbb{R}$ are homeomorphic. We can use the function $f: \mathbb{R} \rightarrow (-1, 1)$:</p>
\[f(x) = \frac{x}{1 + \abs{x}}\]
<p>Whose inverse is $f^{-1}: (-1, 1) \rightarrow \mathbb{R}$:</p>
\[f^{-1}(x) = \frac{x}{1 - \abs{x}}\]
<p>We can show both functions are monotonically increasing. It’s a bit trickier because we need to consider the cases for $x \ge 0$ and $x \lt 0$ to get rid of the $\abs{x}$ term, but once we do that, we use the same arguments from <em>Example 1</em> to show the 1-to-1 mapping between open intervals.</p>
<p>Since homeomorphism is an equivalence relation and $(0, 1)$ is homeomorphic to $(-1, 1)$ (using the result from <em>Example 1</em> and $a = -1, b = 1$), we conclude $(0, 1)$ is homeomorphic to $\mathbb{R}$.</p>
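<p>Again as a hypothetical sanity check (not part of the original argument), we can test the range, the round trip and the monotonicity of this pair of maps numerically:</p>

<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include <cassert>
#include <cmath>

// f maps R onto (-1, 1); f_inv is its inverse.
double f(double x) { return x / (1.0 + std::abs(x)); }
double f_inv(double x) { return x / (1.0 - std::abs(x)); }

int main() {
  double prev = -2.0;  // anything below f(-100), to seed the monotonicity check
  for (double x = -100.0; x <= 100.0; x += 1.7) {
    double y = f(x);
    assert(y > -1.0 && y < 1.0);            // image lies in (-1, 1)
    assert(std::abs(f_inv(y) - x) < 1e-6);  // round trip recovers x
    assert(y > prev);                       // f is monotonically increasing
    prev = y;
  }
}
</code></pre></figure>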
<h3 id="example-3-the-n-dimensional-open-ball-and-mathbbrn">Example 3: The $n$-dimensional open ball and $\mathbb{R}^n$</h3>
<p>We can show that the unit open ball centered at the origin, denoted by $B$, is homeomorphic to
$\mathbb{R}^n$ via $f:\mathbb{R}^n \rightarrow B$ [5]:</p>
\[f(x) = \frac{x}{1 + \norm{x}}\]
<p>Whose inverse is $f^{-1}: B \rightarrow \mathbb{R}^n$:</p>
\[f^{-1}(x) = \frac{x}{1 - \norm{x}}\]
<p>Note how these are essentially the same functions we used in <em>Example 2</em> but in higher dimension. In the one dimensional case we worked with open intervals, which are a base for the topology, instead of open sets.</p>
<p>For this example we’ll go further. Instead of working with open balls which are the higher dimension version of the interval, we’ll define a finer base. To start, we can categorize the set of open balls into two types: those with the center at the origin:</p>
\[\norm{x} \lt r\]
<p>And those with the center at some point $o$:</p>
\[\norm{x - o} \lt r\]
<p>For example, in 2D, we have the open circle $x^2 + y^2 \lt 0.5$ centered at the origin and $(x - 0.1)^2 + (y - 0.2)^2 \lt 0.3$ centered at point $(0.1, 0.2)$.</p>
<p>Consider now a circumference at the origin, i.e. the set of points satisfying:</p>
\[C_r = \curly{x : \norm{x} = r}\]
<p>If we consider the open arcs of this circumference they’re not open sets in $\mathbb{R}^2$ because, being one dimensional, it’s not possible to find an open ball around points in this circumference. However if we add some “width”, they’ll become open sets:</p>
\[\curly{k x : x \in C_r, a \lt k \lt b}\]
<p>The base we’ll define is composed of the open balls at the origin plus these thick open arcs. We claim that open balls that are not centered at the origin are the union of the thick open arcs above. We won’t provide a proof but we can visualize it in 2D in <em>Figure 2</em> to get an intuition.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-11-03-topological-equivalence/coverage1.png" alt="See caption." />
<figcaption>Figure 2: Example in 2 dimensions depicting how a circle (green) can be obtained as the union of open arcs from circumferences at the origin. Note that because the circle is open, any non-empty intersection with a circumference will have at least 2 points, which is needed for the arc to be open.</figcaption>
</figure>
<p>So with this base we now need to show that each of these elements (which are open sets) map to open sets when transformed via $f$ and conversely via $f^{-1}$.</p>
<p>For the open balls at the origin with radius $r$, we claim that they’ll continue to be open balls. To see why, consider the subset of points on an open ball at a fixed radius $r’ \lt r$. Since all these points have the same norm $r’$, the functions are just going to be a multiplication by a scalar, and we can use much the same ideas from <em>Example 2</em> to see the ball gets transformed into an open ball with radius $\frac{r}{1 + r}$.</p>
<p>For the open arcs of circumferences, we note that since all the points are at distance $r$ from the origin, they have the same norm and thus end up being transformed into a circumference of a different radius like above, and each open arc is still an open arc in this new circumference. QED.</p>
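<p>For intuition, here’s a hypothetical numeric check of the $n = 3$ instance of these maps (the names and the test point are ours, not from the proof):</p>

<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include <array>
#include <cassert>
#include <cmath>

using Vec = std::array<double, 3>;

double norm(const Vec& v) {
  return std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
}

Vec scale(const Vec& v, double k) { return {k*v[0], k*v[1], k*v[2]}; }

// f maps R^3 onto the open unit ball B; f_inv maps B back to R^3.
Vec f(const Vec& x) { return scale(x, 1.0 / (1.0 + norm(x))); }
Vec f_inv(const Vec& x) { return scale(x, 1.0 / (1.0 - norm(x))); }

int main() {
  Vec x = {3.0, -4.0, 12.0};  // norm 13
  Vec y = f(x);
  assert(norm(y) < 1.0);      // image is inside the unit ball
  Vec back = f_inv(y);
  for (int i = 0; i < 3; ++i)
    assert(std::abs(back[i] - x[i]) < 1e-9);  // round trip recovers x
}
</code></pre></figure>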
<p>We can show that the unit open ball is homeomorphic to open balls of any size using a similar idea. At first glance it seems like using this idea we can prove any $n$-d “shape” is homeomorphic to $\mathbb{R}^n$, but we run into some trouble.</p>
<p>Consider an example in 2 dimensions. Suppose that instead of a circle we had a square centered at the origin. When mapping open balls from $\mathbb{R}^2$ into the square, to make sure they are inside, we have to choose a radius $r$ such that the corresponding open circle is inscribed in the square. But then there are parts of the square that would be unreachable via this mapping, meaning that such a mapping would not be bijective.</p>
<p>We’ll see next that the open square is also homeomorphic to $\mathbb{R}^2$, or more generally that a bounded open convex polytope is homeomorphic to $\mathbb{R}^n$.</p>
<h3 id="example-4-the-n-dimensional-bounded-open-convex-polytope-and-open-balls">Example 4: The $n$-dimensional bounded open convex polytope and open balls</h3>
<p>An $n$-dimensional polytope is a generalization of a polygon. A convex polytope is one that can be formed by the intersection of half-spaces, which can be succinctly represented as:</p>
\[Ax \le b\]
<p>This looks like the constraints of a linear programming model (LP). In fact, a convex polytope is the feasible region of an LP. If we restrict the constraints to strict inequalities:</p>
\[Ax \lt b\]
<p>We get an open convex polytope. Note that the feasible region of an LP can be unbounded, for example one composed of a single constraint. If we restrict ourselves to the cases where the area (or the corresponding measure in $n$ dimensions) is finite, we have a <em>bounded open convex polytope</em>. As an example, a square without its borders is a bounded open convex polytope in 2D.</p>
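<p>For instance, the open unit square in 2D corresponds to the system $Ax \lt b$ with:</p>

\[A = \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}\]

<p>that is, $-1 \lt x_1 \lt 1$ and $-1 \lt x_2 \lt 1$.</p>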
<p>We shall now prove that a bounded open convex polytope $P$ that contains the origin is homeomorphic to an open ball centered at the origin.</p>
<p>Let’s consider the set of <em>rays</em> emanating from the origin. More formally, consider the set corresponding to the circumference of radius 1, $C = \curly{x \in \mathbb{R}^n \mid \norm{x} = 1}$. A ray of $c \in C$, denoted by $R_c$, is the set of points in the line that starts at the origin and passes through $c$, that is, the set $\curly{\lambda c : \lambda \gt 0}$ (note the rays don’t include the origin).</p>
<p>Let $P_B$ be the border (see precise definition in <em>Closed Set, Limit Points, Closure, Border</em>) of $P$. Because $P$ is convex, bounded and contains the origin, every ray $R_c$ is incident to exactly one point $v \in P_B$. For each $x \in R_c$, let $d(x) = \norm{v}$.</p>
<p>Let $B$ be the unit open ball at the origin. We can now define our homeomorphism. Let $f: B \rightarrow P$ be defined as:</p>
\[f(x) = x d(x)\]
<p>If $x = 0$, define $f(0) = 0$. Conversely, the inverse function $f^{-1}: P \rightarrow B$ is:</p>
\[f^{-1}(v) = \frac{v}{d(v)}\]
<p>To show open sets correspond to open sets when transformed by either function, we’ll define a new base as we did in <em>Example 3</em>. We’ll divide our base into two sets.</p>
<p>An open ball that does not contain the origin is the union of the open segments of $R_c$ for all $c \in C$, each corresponding to the intersection of the ray and the open ball, as depicted in <em>Figure 3</em> for 2 dimensions. As in <em>Example 3</em>, a single segment is one dimensional and not an open set. We can add some width to them by bundling adjacent segments together, and add these thick segments to our base.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-11-03-topological-equivalence/coverage2.png" alt="See caption." />
<figcaption>Figure 3: Example in 2 dimensions depicting how a circle (green) can be obtained as the union of open segments from rays starting at the origin. Note that because the circle is open, any non-empty intersection with a ray will have at least 2 points, which is needed for the segment to be open.</figcaption>
</figure>
<p>For an open ball that <em>contains</em> the origin, we cannot cover it with rays alone because rays do not contain the origin. However, if we include open balls centered at the origin in the base, they can be unioned with the thick rays to “plug” the hole at the origin.</p>
<p>For a given open segment along a ray $R_c$, applying $f$ or $f^{-1}$ to it corresponds to scaling it by a fixed constant ($d(c)$ or $1 / d(c)$, respectively), so the result is an open segment, reasoning much like we did for open intervals in <em>Example 1</em>.</p>
<p>For a ball at the origin with radius $r$, when we apply $f$ to it, we’ll obtain the open convex polytope $P$ scaled by $r$. If we apply $f^{-1}$ to it, we’ll obtain an open bounded set that is not necessarily convex, but is nevertheless open. QED.</p>
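<p>To make the construction concrete, here’s a hypothetical numeric sketch for the 2D case where $P$ is the open square $(-1, 1)^2$. In that case $d$ has the closed form $d(x) = \norm{x} / \max_i \abs{x_i}$; this formula and all names below are assumptions of the sketch, not part of the proof:</p>

<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Vec = std::array<double, 2>;

double norm2(const Vec& v) { return std::hypot(v[0], v[1]); }
double norm_inf(const Vec& v) { return std::max(std::abs(v[0]), std::abs(v[1])); }

// d(x): Euclidean norm of the point where the ray through x crosses
// the border of the open square (-1, 1)^2. Constant along each ray.
double d(const Vec& x) { return norm2(x) / norm_inf(x); }

// f: open unit disk -> open square; f_inv: open square -> open unit disk.
// The special case x = 0 (f(0) = 0) is handled separately in the post.
Vec f(const Vec& x) { double k = d(x); return {k * x[0], k * x[1]}; }
Vec f_inv(const Vec& v) { double k = d(v); return {v[0] / k, v[1] / k}; }

int main() {
  Vec x = {0.6, 0.3};         // a nonzero point in the open unit disk
  Vec p = f(x);
  assert(norm_inf(p) < 1.0);  // f(x) lies inside the open square
  Vec back = f_inv(p);
  assert(std::abs(back[0] - x[0]) < 1e-9 && std::abs(back[1] - x[1]) < 1e-9);
}
</code></pre></figure>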
<p>In [7] Stefan Geschke proves the more general case where the open convex polytope need not be bounded.</p>
<p>It should be easy to find a homeomorphism between a polytope and its translation by a fixed amount, so the restriction we started with of a polytope having to contain the origin is not a problem.</p>
<p>We can conclude that all open convex bounded polytopes are homeomorphic to the unit open ball and from <em>Example 3</em>, to $\mathbb{R}^n$.</p>
<h3 id="patterns-in-proving-homeomorphisms">Patterns in proving homeomorphisms</h3>
<p>To prove the homeomorphism between 2 topological spaces $(X, \tau)$ and $(Y, \tau’)$, first we need to find a bijective function between $X$ and $Y$.</p>
<p>Then we try to find a suitable base whose elements can be transformed more conveniently. For <em>Example 3</em>, since the function is based on the norm $\norm{x}$, it’s convenient to have in the base elements that have constant $\norm{x}$, for example arcs in a circumference at the origin.</p>
<p>Similarly in <em>Example 4</em>, the map depends on the function $d$, which is constant for points on the same ray $R_c$, so using elements along rays is convenient.</p>
<h2 id="conclusion">Conclusion</h2>
<p>In this post we attempted to provide some formalism around the concept of topological equivalence. Trying to understand the homeomorphisms between specific spaces in 1D and 2D was rewarding, and I’m very grateful for sites like <a href="https://math.stackexchange.com/">Mathematics Stack Exchange</a> [3, 4, 5, 7], since these are not covered in the book I used as reference [1].</p>
<p>After studying the proofs I have a much better grasp of what it means when we say that topology is preserved not only under rotation and translation but also under stretching.</p>
<p>I still don’t know how to prove that a mug and a doughnut are topologically equivalent but I’ll leave it for another time.</p>
<p>Since topology has a lot of concepts and formal definitions, I also wrote a <a href="https://www.kuniga.me/docs/math/topology.html">cheat sheet</a> for reference.</p>
<h2 id="related-posts">Related Posts</h2>
<ul>
<li><a href="https://www.kuniga.me/blog/2013/11/11/lawler-and-an-introduction-to-matroids.html">An Introduction to Matroids</a> - There seems to be an apparent parallel between topological spaces $(X, \tau)$ and matroids $(E, \cal I)$. Both $X$ and $E$ are sets and both $\tau$ and $\cal I$ are collections of subsets satisfying some property. I wonder if there’s any more to them.</li>
</ul>
<h2 id="references">References</h2>
<ul>
<li>[1] Introduction to Topology, Bert Mendelson</li>
<li>[<a href="https://www.kuniga.me/blog/2021/06/26/hilbert-spaces.html">2</a>] NP-Incompleteness: Hilbert Spaces</li>
<li>[<a href="https://math.stackexchange.com/a/888120/42012">3</a>] Mathematics Stack Exchange: Prove that some topology is not metrizable, Tomasz Kania</li>
<li>[<a href="https://math.stackexchange.com/a/1929892/42012">4</a>] Mathematics Stack Exchange: Prove that $(a, b)$ is homeomorphic to $(0,1)$, KonKan</li>
<li>[<a href="https://math.stackexchange.com/a/1072818/42012">5</a>] Mathematics Stack Exchange: Is an open $n$-ball homeomorphic to $\mathbb{R}^n$?, user149792</li>
<li>[<a href="https://en.wikipedia.org/wiki/Base_(topology)">6</a>] Wikipedia: Base (Topology)</li>
<li>[<a href="https://math.stackexchange.com/a/165644">7</a>] Mathematics Stack Exchange: Proof that convex open sets in $\mathbb{R}^n$ are homeomorphic?, Stefan Geschke</li>
</ul>Guilherme KunigamiI’ve recently read the book Introduction to Topology by Bert Mendelson [1] and before that my only knowledge of topology was that some objects are topologically equivalent, for example a mug and a doughnut. After reading the book, I found topology is a lot more about algebraic formalism than visual geometry. So in this post I’d like to discuss the idea of topological equivalence from this formal perspective but using visual examples for better intuition. Figure 1: Left: A mug from the Museum of Mathematics in New York. Right: A doughnut generated using Stable Diffusion. We’ll review metric spaces, then generalize to topological spaces, introduce a formal definition for continuous functions and then explore homeomorphism (the technical name for topological equivalence) and provide some examples.Review: Effective Modern C++2022-10-25T00:00:00+00:002022-10-25T00:00:00+00:00https://www.kuniga.me/blog/2022/10/25/review-effective-modern-cpp<!-- This needs to be defined as included html because variables are not inherited by Jekyll pages -->
<figure class="image_float_left">
<img src="https://www.kuniga.me/resources/blog/2022-10-25-review-effective-modern-cpp/book_cover.jpg" alt="Effective Modern C++ book cover" />
</figure>
<p>In this post I’ll share my notes on the book <em>Effective Modern C++</em> by Scott Meyers.</p>
<p>As with <em>Effective C++</em>, Meyers’ book is organized around items. Each item title describes specific recommendations (e.g. “Prefer nullptr to 0 and NULL”) and then it delves into the rationale, while also explaining details about the C++ language.</p>
<p>The post lists each item with a summary and my thoughts when applicable. The goal is that it can be an index to find more details on the book itself.</p>
<p>The <em>Modern</em> in the title refers to C++11 and C++14 features. This book is a complement to <em>Effective C++</em>, not an updated edition.</p>
<!--more-->
<h2 class="no_toc" id="organization-of-the-book">Organization of the book</h2>
<p>The book is divided into 8 chapters and 42 items. Each chapter serves as a theme into which the items are organized.</p>
<p>To make look up easier, I’ve included a table of contents:</p>
<ol id="markdown-toc">
<li><a href="#chapter-1---deducing-types" id="markdown-toc-chapter-1---deducing-types">Chapter 1 - Deducing Types</a> <ol>
<li><a href="#item-1-understand-template-type-deduction" id="markdown-toc-item-1-understand-template-type-deduction">Item 1: Understand template type deduction</a></li>
<li><a href="#item-2-understand-auto-type-deduction" id="markdown-toc-item-2-understand-auto-type-deduction">Item 2: Understand auto type deduction</a></li>
<li><a href="#item-3-understand-decltype" id="markdown-toc-item-3-understand-decltype">Item 3: Understand decltype</a></li>
<li><a href="#item-4-know-how-to-view-deduced-types" id="markdown-toc-item-4-know-how-to-view-deduced-types">Item 4: Know how to view deduced types</a></li>
</ol>
</li>
<li><a href="#chapter-2---auto" id="markdown-toc-chapter-2---auto">Chapter 2 - auto</a> <ol>
<li><a href="#item-5-prefer-auto-to-explicit-parameters" id="markdown-toc-item-5-prefer-auto-to-explicit-parameters">Item 5: Prefer auto to explicit parameters</a></li>
<li><a href="#item-6-use-the-explicitly-typed-initializer-idiom-when-auto-deduces-undesired-types" id="markdown-toc-item-6-use-the-explicitly-typed-initializer-idiom-when-auto-deduces-undesired-types">Item 6: Use the explicitly typed initializer idiom when auto deduces undesired types</a></li>
</ol>
</li>
<li><a href="#chapter-3---moving-to-modern-c" id="markdown-toc-chapter-3---moving-to-modern-c">Chapter 3 - Moving to Modern C++</a> <ol>
<li><a href="#item-7---distinguish-between--and--when-creating-objects" id="markdown-toc-item-7---distinguish-between--and--when-creating-objects">Item 7 - Distinguish between () and {} when creating objects</a></li>
<li><a href="#item-8---prefer-nullptr-to-0-and-null" id="markdown-toc-item-8---prefer-nullptr-to-0-and-null">Item 8 - Prefer nullptr to 0 and NULL</a></li>
<li><a href="#item-9---prefer-alias-declarations-over-typedefs" id="markdown-toc-item-9---prefer-alias-declarations-over-typedefs">Item 9 - Prefer alias declarations over typedefs</a></li>
<li><a href="#item-10---prefer-scoped-enums-to-unscoped-enums" id="markdown-toc-item-10---prefer-scoped-enums-to-unscoped-enums">Item 10 - Prefer scoped enums to unscoped enums</a></li>
<li><a href="#item-11---prefer-deleted-functions-to-private-undefined-ones" id="markdown-toc-item-11---prefer-deleted-functions-to-private-undefined-ones">Item 11 - Prefer deleted functions to private undefined ones</a></li>
<li><a href="#item-12---declare-overriding-functions-override" id="markdown-toc-item-12---declare-overriding-functions-override">Item 12 - Declare overriding functions override</a></li>
<li><a href="#item-13---prefer-const-iterators" id="markdown-toc-item-13---prefer-const-iterators">Item 13 - Prefer const iterators</a></li>
<li><a href="#item-14---declare-functions-noexcept-when-possible" id="markdown-toc-item-14---declare-functions-noexcept-when-possible">Item 14 - Declare functions noexcept when possible</a></li>
<li><a href="#item-15---use-constexpr-wherever-possible" id="markdown-toc-item-15---use-constexpr-wherever-possible">Item 15 - Use constexpr wherever possible</a></li>
<li><a href="#item-16---make-const-members-thread-safe" id="markdown-toc-item-16---make-const-members-thread-safe">Item 16 - Make const members thread safe</a></li>
<li><a href="#item-17---understand-special-member-function-generation" id="markdown-toc-item-17---understand-special-member-function-generation">Item 17 - Understand special member function generation</a></li>
</ol>
</li>
<li><a href="#chapter-4-smart-pointers" id="markdown-toc-chapter-4-smart-pointers">Chapter 4: Smart Pointers</a> <ol>
<li><a href="#item-18---use-stdunique_ptr-for-exclusive-ownership-resource-management" id="markdown-toc-item-18---use-stdunique_ptr-for-exclusive-ownership-resource-management">Item 18 - Use std::unique_ptr for exclusive-ownership resource management</a></li>
<li><a href="#item-19---use-stdshared_ptr-for-shared-ownership-resource-management" id="markdown-toc-item-19---use-stdshared_ptr-for-shared-ownership-resource-management">Item 19 - Use std::shared_ptr for shared-ownership resource management</a></li>
<li><a href="#item-20---use-stdweak_ptr-for-stdshared_ptr-like-pointer-that-can-dangle" id="markdown-toc-item-20---use-stdweak_ptr-for-stdshared_ptr-like-pointer-that-can-dangle">Item 20 - Use std::weak_ptr for std::shared_ptr like pointer that can dangle</a></li>
<li><a href="#item-21---prefer-stdmake_unique-and-stdmake_shared-to-direct-use-of-new" id="markdown-toc-item-21---prefer-stdmake_unique-and-stdmake_shared-to-direct-use-of-new">Item 21 - Prefer std::make_unique and std::make_shared to direct use of new</a></li>
<li><a href="#item-22---when-using-the-pimpl-idiom-define-special-member-functions-in-the-implementation-file" id="markdown-toc-item-22---when-using-the-pimpl-idiom-define-special-member-functions-in-the-implementation-file">Item 22 - When using the Pimpl idiom, define special member functions in the implementation file</a></li>
</ol>
</li>
<li><a href="#chapter-5-rvalue-references-move-semantics-and-perfect-forwarding" id="markdown-toc-chapter-5-rvalue-references-move-semantics-and-perfect-forwarding">Chapter 5: RValue References, Move Semantics and Perfect Forwarding</a> <ol>
<li><a href="#item-23---understand-stdmove-and-stdforward" id="markdown-toc-item-23---understand-stdmove-and-stdforward">Item 23 - Understand std::move and std::forward</a></li>
<li><a href="#item-24---distinguish-universal-references-from-rvalue-references" id="markdown-toc-item-24---distinguish-universal-references-from-rvalue-references">Item 24 - Distinguish universal references from rvalue references</a></li>
<li><a href="#item-25---use-stdmove-on-rvalue-references-stdforward-on-universal-references" id="markdown-toc-item-25---use-stdmove-on-rvalue-references-stdforward-on-universal-references">Item 25 - Use std::move on rvalue references, std::forward on universal references</a></li>
<li><a href="#item-26---avoid-overloading-on-universal-references" id="markdown-toc-item-26---avoid-overloading-on-universal-references">Item 26 - Avoid overloading on universal references</a></li>
<li><a href="#item-27---alternatives-to-overloading-universal-references" id="markdown-toc-item-27---alternatives-to-overloading-universal-references">Item 27 - Alternatives to overloading universal references</a></li>
<li><a href="#item-28---understand-reference-collapsing" id="markdown-toc-item-28---understand-reference-collapsing">Item 28 - Understand reference collapsing</a></li>
<li><a href="#item-29---assume-move-operations-are-not-present-not-cheap-and-not-used" id="markdown-toc-item-29---assume-move-operations-are-not-present-not-cheap-and-not-used">Item 29 - Assume move operations are not present, not cheap, and not used</a></li>
<li><a href="#item-30---familiarize-yourself-with-perfect-forwarding-failures" id="markdown-toc-item-30---familiarize-yourself-with-perfect-forwarding-failures">Item 30 - Familiarize yourself with perfect forwarding failures</a></li>
</ol>
</li>
<li><a href="#chapter-6---lambda-expressions" id="markdown-toc-chapter-6---lambda-expressions">Chapter 6 - Lambda Expressions</a> <ol>
<li><a href="#item-31---avoid-default-capture-modes" id="markdown-toc-item-31---avoid-default-capture-modes">Item 31 - Avoid default capture modes</a></li>
<li><a href="#item-32---use-init-capture-to-move-objects-into-closure" id="markdown-toc-item-32---use-init-capture-to-move-objects-into-closure">Item 32 - Use init capture to move objects into closure</a></li>
<li><a href="#item-33---use-decltype-on-auto-to-stdforward-them" id="markdown-toc-item-33---use-decltype-on-auto-to-stdforward-them">Item 33 - Use decltype on auto&& to std::forward them</a></li>
<li><a href="#item-34---prefer-lambdas-over-stdbind" id="markdown-toc-item-34---prefer-lambdas-over-stdbind">Item 34 - Prefer lambdas over std::bind</a></li>
</ol>
</li>
<li><a href="#chapter-7---the-concurrency-api" id="markdown-toc-chapter-7---the-concurrency-api">Chapter 7 - The Concurrency API</a> <ol>
<li><a href="#item-35---prefer-task-based-programming-to-thread-based" id="markdown-toc-item-35---prefer-task-based-programming-to-thread-based">Item 35 - Prefer task-based programming to thread-based.</a></li>
<li><a href="#item-36---specify-stdlaunchasync-if-asynchronicity-is-essential" id="markdown-toc-item-36---specify-stdlaunchasync-if-asynchronicity-is-essential">Item 36 - Specify std::launch::async if asynchronicity is essential</a></li>
<li><a href="#item-37---make-stdthread-unjoinable-on-all-paths" id="markdown-toc-item-37---make-stdthread-unjoinable-on-all-paths">Item 37 - Make std::thread unjoinable on all paths</a></li>
<li><a href="#item-38---be-aware-of-varying-thread-handle-destructor-behavior" id="markdown-toc-item-38---be-aware-of-varying-thread-handle-destructor-behavior">Item 38 - Be aware of varying thread handle destructor behavior</a></li>
<li><a href="#item-39---consider-void-futures-for-one-shot-event-communication" id="markdown-toc-item-39---consider-void-futures-for-one-shot-event-communication">Item 39 - Consider void futures for one-shot event communication</a></li>
<li><a href="#item-40---use-stdatomic-for-concurrency-volatile-for-special-memory" id="markdown-toc-item-40---use-stdatomic-for-concurrency-volatile-for-special-memory">Item 40 - Use std::atomic for concurrency, volatile for special memory</a></li>
</ol>
</li>
<li><a href="#chapter-8---tweaks" id="markdown-toc-chapter-8---tweaks">Chapter 8 - Tweaks</a> <ol>
<li><a href="#item-41---consider-pass-by-value-for-copyable-parameters-that-are-cheap-to-move-and-always-copied" id="markdown-toc-item-41---consider-pass-by-value-for-copyable-parameters-that-are-cheap-to-move-and-always-copied">Item 41 - Consider pass by value for copyable parameters that are cheap to move and always copied</a></li>
<li><a href="#item-42---consider-emplacement-instead-of-insertion" id="markdown-toc-item-42---consider-emplacement-instead-of-insertion">Item 42 - Consider emplacement instead of insertion</a></li>
</ol>
</li>
</ol>
<h2 id="chapter-1---deducing-types">Chapter 1 - Deducing Types</h2>
<h3 id="item-1-understand-template-type-deduction">Item 1: Understand template type deduction</h3>
<p>This item explains how a given template <code class="language-plaintext highlighter-rouge">T</code> is resolved. Let’s analyze some cases. First assume the template is declared as <code class="language-plaintext highlighter-rouge">T&</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">T</span><span class="o">&</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="p">}</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">);</span> <span class="c1">// T is const int</span>
<span class="k">const</span> <span class="kt">int</span><span class="o">&</span> <span class="n">b</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
<span class="n">f</span><span class="p">(</span><span class="n">b</span><span class="p">);</span> <span class="c1">// T is const int</span></code></pre></figure>
<p>As <code class="language-plaintext highlighter-rouge">const T&</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="p">}</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">);</span> <span class="c1">// T is int</span>
<span class="k">const</span> <span class="kt">int</span><span class="o">&</span> <span class="n">b</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
<span class="n">f</span><span class="p">(</span><span class="n">b</span><span class="p">);</span> <span class="c1">// T is int</span></code></pre></figure>
<p>As <code class="language-plaintext highlighter-rouge">T&&</code> (universal reference, see <em>Item 24</em>):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="p">}</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">);</span> <span class="c1">// a is a const lvalue, T is const int&</span>
<span class="n">f</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="c1">// 1 is an rvalue, T is int, x is int&&</span></code></pre></figure>
<p>There are corner cases for arrays or function pointers which we won’t cover.</p>
<h3 id="item-2-understand-auto-type-deduction">Item 2: Understand auto type deduction</h3>
<p>The gist is that <code class="language-plaintext highlighter-rouge">auto</code> is resolved the same way templates are for the most part. The only difference is when an <code class="language-plaintext highlighter-rouge">auto</code> variable is initialized using curly braces: <code class="language-plaintext highlighter-rouge">auto</code> resolves to <code class="language-plaintext highlighter-rouge">std::initializer_list<></code>, whereas a template fails to compile:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">// std::initializer_list<int></span>
<span class="k">auto</span> <span class="n">x</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">10</span> <span class="p">};</span>
<span class="c1">// couldn't infer template argument 'T'</span>
<span class="n">f</span><span class="p">({</span><span class="mi">10</span><span class="p">});</span></code></pre></figure>
<h3 id="item-3-understand-decltype">Item 3: Understand decltype</h3>
<p><code class="language-plaintext highlighter-rouge">decltype</code> is used to get the type of a variable. It can be useful to bridge the gap between <code class="language-plaintext highlighter-rouge">auto</code> and templates. <code class="language-plaintext highlighter-rouge">auto</code> doesn’t have an explicit type and templates might require one. <em>Item 18</em> provides one such example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">deleter</span> <span class="o">=</span> <span class="p">[](</span><span class="n">C</span><span class="o">*</span> <span class="n">ptr</span><span class="p">)</span> <span class="p">{</span>
<span class="k">delete</span> <span class="n">ptr</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="n">C</span><span class="p">,</span> <span class="k">decltype</span><span class="p">(</span><span class="n">deleter</span><span class="p">)</span><span class="o">></span> <span class="n">uPtr</span><span class="p">(</span><span class="nb">nullptr</span><span class="p">,</span> <span class="n">deleter</span><span class="p">);</span></code></pre></figure>
<p>We won’t go over the details, but type-wise, this version of <code class="language-plaintext highlighter-rouge">std::unique_ptr<T, D></code> has two template parameters, the type of the underlying object <code class="language-plaintext highlighter-rouge">T</code> and that of the deleter function <code class="language-plaintext highlighter-rouge">D</code>. We don’t know the type of <code class="language-plaintext highlighter-rouge">deleter</code> so we can use <code class="language-plaintext highlighter-rouge">decltype(deleter)</code>. <em>Item 33</em> has a similar use case.</p>
<p>Another use of <code class="language-plaintext highlighter-rouge">decltype</code> is combining with <code class="language-plaintext highlighter-rouge">auto</code> as <code class="language-plaintext highlighter-rouge">decltype(auto)</code>. One problem with <code class="language-plaintext highlighter-rouge">auto</code> is that it drops the reference modifier when resolving. For example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">C</span> <span class="p">{</span>
<span class="kt">int</span><span class="o">&</span> <span class="n">getRef</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">x_</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">int</span> <span class="n">x_</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">auto</span> <span class="nf">f</span><span class="p">(</span><span class="n">C</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// int f(C c)</span>
<span class="k">return</span> <span class="n">c</span><span class="p">.</span><span class="n">getRef</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>Here <code class="language-plaintext highlighter-rouge">auto</code> resolves to <code class="language-plaintext highlighter-rouge">int</code>. If we wish to preserve the <code class="language-plaintext highlighter-rouge">&</code> from <code class="language-plaintext highlighter-rouge">getRef()</code> we need <code class="language-plaintext highlighter-rouge">decltype</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">decltype</span><span class="p">(</span><span class="k">auto</span><span class="p">)</span> <span class="n">f</span><span class="p">(</span><span class="n">C</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// int& f(C c)</span>
<span class="k">return</span> <span class="n">c</span><span class="p">.</span><span class="n">getRef</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>A third use of <code class="language-plaintext highlighter-rouge">decltype</code> is a technique to display the type of a given variable at compile time, as shown in <em>Item 4</em>.</p>
<h3 id="item-4-know-how-to-view-deduced-types">Item 4: Know how to view deduced types</h3>
<p>There are several ways to inspect the deduced type of variables declared with <code class="language-plaintext highlighter-rouge">auto</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">x</span> <span class="o">=</span> <span class="cm">/* expr */</span><span class="p">;</span></code></pre></figure>
<p>One interesting technique is to have a compilation error tell us that. We can use the following code:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="k">class</span> <span class="nc">TD</span><span class="p">;</span>
<span class="n">TD</span><span class="o"><</span><span class="k">decltype</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">></span> <span class="n">xType</span><span class="p">;</span></code></pre></figure>
<p>This will fail to compile and display the type in the error message. In <code class="language-plaintext highlighter-rouge">clang</code> (v14) I get:</p>
<blockquote>
<p>error: implicit instantiation of undefined template ‘TD<int *>’</p>
</blockquote>
<p>At runtime, we can instead use <code class="language-plaintext highlighter-rouge">typeid().name()</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="k">typeid</span><span class="p">(</span><span class="n">x</span><span class="p">).</span><span class="n">name</span><span class="p">()</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span></code></pre></figure>
<p>The name comes out mangled, but toolchains provide utilities to demangle it. For <code class="language-plaintext highlighter-rouge">clang</code>, we can use the <code class="language-plaintext highlighter-rouge">llvm-cxxfilt</code> CLI:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">llvm-cxxfilt --types Pi</code></pre></figure>
<h2 id="chapter-2---auto">Chapter 2 - auto</h2>
<h3 id="item-5-prefer-auto-to-explicit-parameters">Item 5: Prefer auto to explicit type declarations</h3>
<p>The reasons given include less typing and easier refactoring. <code class="language-plaintext highlighter-rouge">auto</code> also avoids subtle type mismatches, which are hard to catch because the compiler converts/casts types implicitly whenever possible. Examples are provided in the book.</p>
<p><em>Item 6</em> discusses cases in which <code class="language-plaintext highlighter-rouge">auto</code> doesn’t work well.</p>
<h3 id="item-6-use-the-explicitly-typed-initializer-idiom-when-auto-deduces-undesired-types">Item 6: Use the explicitly typed initializer idiom when auto deduces undesired types</h3>
<p>One example where <code class="language-plaintext highlighter-rouge">auto</code> doesn’t infer the “expected” type is when accessing an element of a vector of booleans. This is because <code class="language-plaintext highlighter-rouge">vector<bool></code> is optimized to use bit packing, so each element uses only 1 bit instead of a whole byte.</p>
<p>However, this means that when accessing a specific element, it needs to return a special structure, <code class="language-plaintext highlighter-rouge">std::__bit_reference<std::vector<bool>></code>, which can be implicitly converted to <code class="language-plaintext highlighter-rouge">bool</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">bool</span><span class="o">></span> <span class="n">f</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">bool</span><span class="o">></span> <span class="n">bv</span> <span class="p">{</span><span class="nb">true</span><span class="p">};</span>
<span class="k">return</span> <span class="n">bv</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Implicit conversion</span>
<span class="kt">bool</span> <span class="n">b</span> <span class="o">=</span> <span class="n">f</span><span class="p">()[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">b</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="c1">// 1</span></code></pre></figure>
<p>However if we use <code class="language-plaintext highlighter-rouge">auto</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">bool</span><span class="o">></span> <span class="n">f</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">bool</span><span class="o">></span> <span class="n">bv</span> <span class="p">{</span><span class="nb">true</span><span class="p">};</span>
<span class="k">return</span> <span class="n">bv</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// std::__bit_reference<std::vector<bool>></span>
<span class="k">auto</span> <span class="n">b</span> <span class="o">=</span> <span class="n">f</span><span class="p">()[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">b</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="c1">// ??</span></code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">b</code> holds a reference to an object that doesn’t exist anymore (i.e. the temporary object created to hold <code class="language-plaintext highlighter-rouge">f()</code>’s return value), so its value is undefined.</p>
<p>More generally, we run this risk whenever proxy classes are involved, i.e. types that are not the type one would expect but can be implicitly converted to it. The author suggests using <code class="language-plaintext highlighter-rouge">static_cast<T></code> to solve this issue:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">b</span> <span class="o">=</span> <span class="k">static_cast</span><span class="o"><</span><span class="kt">bool</span><span class="o">></span><span class="p">(</span><span class="n">f</span><span class="p">()[</span><span class="mi">0</span><span class="p">]);</span></code></pre></figure>
<p>but then I’m not sure about the advantage of using <code class="language-plaintext highlighter-rouge">auto</code>.</p>
<h2 id="chapter-3---moving-to-modern-c">Chapter 3 - Moving to Modern C++</h2>
<h3 id="item-7---distinguish-between--and--when-creating-objects">Item 7 - Distinguish between () and {} when creating objects</h3>
<p>Variables can be initialized via assignment, parentheses or curly braces:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">int</span> <span class="nf">b</span> <span class="p">(</span><span class="mi">2</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">c</span> <span class="p">{</span><span class="mi">3</span><span class="p">};</span></code></pre></figure>
<p>The advantage of curly braces is that they prevent <em>narrowing conversion</em>, which is when a broader type (e.g. <code class="language-plaintext highlighter-rouge">double</code>) gets converted to a narrower one (e.g. <code class="language-plaintext highlighter-rouge">int</code>), possibly causing information loss:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">double</span> <span class="n">x</span> <span class="o">=</span> <span class="mf">1.1</span><span class="p">;</span>
<span class="kt">int</span> <span class="nf">b</span> <span class="p">(</span><span class="n">x</span><span class="p">);</span> <span class="c1">// truncates to 1</span>
<span class="kt">int</span> <span class="n">c</span> <span class="p">{</span><span class="n">x</span><span class="p">};</span> <span class="c1">// compile error</span></code></pre></figure>
<p>The curly-brace version fails to compile. Another issue curly braces avoid is the <em>most vexing parse</em>, in which initialization syntax is parsed as a function declaration. For example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">C</span> <span class="p">{</span>
<span class="n">C</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span> <span class="p">{}</span>
<span class="n">C</span><span class="p">()</span> <span class="p">{}</span>
<span class="p">};</span>
<span class="n">C</span> <span class="nf">c1</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="c1">// creates instance of C</span>
<span class="n">C</span> <span class="nf">c2</span><span class="p">();</span> <span class="c1">// declares a function</span></code></pre></figure>
<p>The last expression might seem like it’s creating an instance of <code class="language-plaintext highlighter-rouge">C</code> by calling the default constructor but it’s actually declaring a function. Using curly braces does the intuitive thing:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">C</span> <span class="n">c1</span><span class="p">{</span><span class="mi">1</span><span class="p">};</span> <span class="c1">// creates instance of C</span>
<span class="n">C</span> <span class="n">c2</span><span class="p">{};</span> <span class="c1">// creates instance of C</span></code></pre></figure>
<p>Another scenario in which parentheses and curly braces behave differently is passing two arguments to an <code class="language-plaintext highlighter-rouge">int</code> vector:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">// vector with 10 elements, all set to 20</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">v1</span> <span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">);</span>
<span class="c1">// vector with 2 elements, 10 and 20</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">v2</span> <span class="p">{</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">};</span></code></pre></figure>
<h3 id="item-8---prefer-nullptr-to-0-and-null">Item 8 - Prefer nullptr to 0 and NULL</h3>
<p>The item suggests <code class="language-plaintext highlighter-rouge">nullptr</code> is more readable for representing null pointers than either <code class="language-plaintext highlighter-rouge">0</code> or <code class="language-plaintext highlighter-rouge">NULL</code>.</p>
<p>There are also cases involving templates where passing <code class="language-plaintext highlighter-rouge">0</code> or <code class="language-plaintext highlighter-rouge">NULL</code> fails to compile where a pointer is expected. For example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">C</span> <span class="p">{};</span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o"><</span><span class="n">C</span><span class="o">></span> <span class="n">p</span><span class="p">)</span> <span class="p">{}</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">F</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">P</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">apply</span><span class="p">(</span><span class="n">F</span> <span class="n">fun</span><span class="p">,</span> <span class="n">P</span> <span class="n">p</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fun</span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// ok</span>
<span class="n">apply</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="nb">nullptr</span><span class="p">);</span>
<span class="c1">// error because P is deduced to be int</span>
<span class="n">apply</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span></code></pre></figure>
<h3 id="item-9---prefer-alias-declarations-over-typedefs">Item 9 - Prefer alias declarations over typedefs</h3>
<p>The difference boils down to templates: a <code class="language-plaintext highlighter-rouge">typedef</code> cannot be templatized, while an alias declaration can. An example using an alias declaration:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="k">using</span> <span class="n">MyVec</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span></code></pre></figure>
<p>If we want to achieve the same using a <code class="language-plaintext highlighter-rouge">typedef</code>, we need to wrap it in a <code class="language-plaintext highlighter-rouge">struct</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">MyVec</span> <span class="p">{</span>
<span class="k">typedef</span> <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="n">type</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>Whenever we use this new type we need to write <code class="language-plaintext highlighter-rouge">typename MyVec<T>::type</code>, as opposed to just <code class="language-plaintext highlighter-rouge">MyVec<T></code> with the alias declaration.</p>
<h3 id="item-10---prefer-scoped-enums-to-unscoped-enums">Item 10 - Prefer scoped enums to unscoped enums</h3>
<p>The C++98 enums are known as unscoped enums:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">enum</span> <span class="n">RGB</span> <span class="p">{</span> <span class="n">red</span><span class="p">,</span> <span class="n">green</span><span class="p">,</span> <span class="n">blue</span> <span class="p">};</span></code></pre></figure>
<p>The C++11 enums are called scoped enums (note the <code class="language-plaintext highlighter-rouge">class</code> modifier):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">enum</span> <span class="k">class</span> <span class="nc">RGB</span> <span class="p">{</span> <span class="n">red</span><span class="p">,</span> <span class="n">green</span><span class="p">,</span> <span class="n">blue</span> <span class="p">};</span></code></pre></figure>
<p>To refer to a scoped enum value we write <code class="language-plaintext highlighter-rouge">RGB::red</code> as opposed to just <code class="language-plaintext highlighter-rouge">red</code>. The need for qualifying the enum value is the origin of the name <em>scoped</em>. This prevents scope pollution (with unscoped enums, another enum in the same scope that also declared <code class="language-plaintext highlighter-rouge">red</code> would fail to compile).</p>
<p>One case where unscoped enums work better is to implement named tuple access for readability:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">enum</span> <span class="n">Field</span> <span class="p">{</span> <span class="n">name</span><span class="p">,</span> <span class="n">address</span> <span class="p">};</span>
<span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span> <span class="n">info</span><span class="p">;</span>
<span class="c1">// same as std::get<0>(info)</span>
<span class="n">std</span><span class="o">::</span><span class="n">get</span><span class="o"><</span><span class="n">name</span><span class="o">></span><span class="p">(</span><span class="n">info</span><span class="p">);</span></code></pre></figure>
<p>The alternative using scoped enums would require an explicit cast to <code class="language-plaintext highlighter-rouge">std::size_t</code>, since the enum’s underlying type defaults to <code class="language-plaintext highlighter-rouge">int</code>.</p>
<h3 id="item-11---prefer-deleted-functions-to-private-undefined-ones">Item 11 - Prefer deleted functions to private undefined ones</h3>
<p>Suppose we’re inheriting from a class and want to “hide” some of the parent’s methods from all callers. One way to achieve this is by making those methods private. However, member functions or friend classes could still call them by accident, so the trick is to also leave them undefined.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">Child</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Parent</span> <span class="p">{</span>
<span class="nl">private:</span>
<span class="c1">// not defined</span>
<span class="kt">void</span> <span class="n">hiddenMethod</span><span class="p">();</span>
<span class="p">};</span></code></pre></figure>
<p>The problem is that a call from a different compilation unit would only fail at link time, which produces an error that is harder to understand. We can instead delete the method:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">Child</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Parent</span> <span class="p">{</span>
<span class="nl">public:</span>
<span class="kt">void</span> <span class="n">hiddenMethod</span><span class="p">()</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<h3 id="item-12---declare-overriding-functions-override">Item 12 - Declare overriding functions override</h3>
<p>To recap, suppose class <code class="language-plaintext highlighter-rouge">Child</code> inherits from <code class="language-plaintext highlighter-rouge">Parent</code>. Overriding allows the <code class="language-plaintext highlighter-rouge">Child</code> implementation to be called even when the object is accessed through a <code class="language-plaintext highlighter-rouge">Parent</code> reference or pointer. Example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">Parent</span> <span class="p">{</span>
<span class="k">virtual</span> <span class="kt">void</span> <span class="n">f</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"parent"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="nc">Child</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Parent</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">f</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"child"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="nf">g</span><span class="p">(</span><span class="n">Parent</span> <span class="o">&</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="n">x</span><span class="p">.</span><span class="n">f</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">Child</span> <span class="n">c</span><span class="p">;</span>
<span class="n">g</span><span class="p">(</span><span class="n">c</span><span class="p">);</span> <span class="c1">// prints "child"</span></code></pre></figure>
<p>It’s easy to get this wrong. If we forget to add <code class="language-plaintext highlighter-rouge">virtual</code> to the parent method, or make a mistake in the child’s signature, the override silently won’t take place. Example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">Child</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Parent</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">f</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"child"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">};</span></code></pre></figure>
<p>Here we forgot to add <code class="language-plaintext highlighter-rouge">const</code> to <code class="language-plaintext highlighter-rouge">Child::f()</code>, so it’s not overriding. The <code class="language-plaintext highlighter-rouge">override</code> keyword turns this mistake into a compilation error:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">Child</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Parent</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">f</span><span class="p">()</span> <span class="k">override</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"child"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">};</span></code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">clang</code> reports this error:</p>
<blockquote>
<p>hidden overloaded virtual function ‘C::f’ declared here: different qualifiers (‘const’ vs unqualified)</p>
</blockquote>
<h3 id="item-13---prefer-const-iterators">Item 13 - Prefer const_iterators to iterators</h3>
<p>Whenever possible, use <code class="language-plaintext highlighter-rouge">cbegin()</code> and <code class="language-plaintext highlighter-rouge">cend()</code> on STL containers, as opposed to <code class="language-plaintext highlighter-rouge">begin()</code> and <code class="language-plaintext highlighter-rouge">end()</code>.</p>
<h3 id="item-14---declare-functions-noexcept-when-possible">Item 14 - Declare functions noexcept when possible</h3>
<p>We can annotate a function with <code class="language-plaintext highlighter-rouge">noexcept</code> to indicate it doesn’t throw exceptions:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="n">f</span><span class="p">()</span> <span class="k">noexcept</span> <span class="p">{</span>
<span class="p">...</span>
<span class="p">}</span></code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">noexcept</code> functions can be better optimized by compilers. However, there’s no compile-time check that a <code class="language-plaintext highlighter-rouge">noexcept</code> function doesn’t actually throw exceptions or call functions that do; if one does throw, <code class="language-plaintext highlighter-rouge">std::terminate()</code> is called at runtime.</p>
<p>My take on this item is that it’s not very broadly applicable.</p>
<h3 id="item-15---use-constexpr-wherever-possible">Item 15 - Use constexpr wherever possible</h3>
<p>The <code class="language-plaintext highlighter-rouge">constexpr</code> keyword can be used when declaring variables or functions. For variables, the value is resolved at compile time:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">constexpr</span> <span class="k">auto</span> <span class="n">SIZE</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">3</span><span class="p">;</span></code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">constexpr</code> variables can be a function of other <code class="language-plaintext highlighter-rouge">constexpr</code> variables:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">constexpr</span> <span class="k">auto</span> <span class="n">ONE</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">constexpr</span> <span class="k">auto</span> <span class="n">TWO</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="k">constexpr</span> <span class="k">auto</span> <span class="n">SIZE</span> <span class="o">=</span> <span class="n">ONE</span> <span class="o">+</span> <span class="n">TWO</span><span class="p">;</span></code></pre></figure>
<p>For functions, the behavior depends on whether <em>all</em> the arguments are compile-time constants. If they are, the result is also a compile-time constant; otherwise it behaves as a regular function. The body of a <code class="language-plaintext highlighter-rouge">constexpr</code> function can only call other <code class="language-plaintext highlighter-rouge">constexpr</code> functions.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">constexpr</span> <span class="kt">int</span> <span class="nf">add</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">constexpr</span> <span class="kt">int</span> <span class="nf">inc</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">add</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// ok</span>
<span class="k">const</span> <span class="k">auto</span> <span class="n">ONE</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">constexpr</span> <span class="k">auto</span> <span class="n">SIZE</span> <span class="o">=</span> <span class="n">inc</span><span class="p">(</span><span class="n">ONE</span><span class="p">);</span>
<span class="c1">// ok too</span>
<span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">y</span> <span class="o">=</span> <span class="n">inc</span><span class="p">(</span><span class="n">x</span><span class="p">);</span></code></pre></figure>
<p>Downside: it’s hard to debug or profile <code class="language-plaintext highlighter-rouge">constexpr</code> functions because side effects such as <code class="language-plaintext highlighter-rouge">printf()</code> are not allowed inside them.</p>
<h3 id="item-16---make-const-members-thread-safe">Item 16 - Make const member functions thread safe</h3>
<p>The gist of this item is that there’s a backdoor for mutating member variables in <code class="language-plaintext highlighter-rouge">const</code> methods: declaring them <code class="language-plaintext highlighter-rouge">mutable</code>. For example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">C</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">f</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
<span class="n">x</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">mutable</span> <span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>Such mutations are unsynchronized, so a <code class="language-plaintext highlighter-rouge">const</code> method that touches <code class="language-plaintext highlighter-rouge">mutable</code> state is not automatically thread-safe.</p>
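<p>The book’s recommendation is to guard such <code class="language-plaintext highlighter-rouge">mutable</code> state with a synchronization primitive. A minimal sketch (the class and member names are made up for illustration):</p>

```cpp
#include <mutex>

class Counter {
 public:
  // const, yet mutates state: the mutable mutex makes the
  // increment safe to run from multiple threads.
  int get() const {
    std::lock_guard<std::mutex> lock(m_);
    return ++calls_;
  }

 private:
  mutable std::mutex m_;   // mutable so const methods can lock it
  mutable int calls_ = 0;
};
```

<p>For a single counter like this, <code class="language-plaintext highlighter-rouge">mutable std::atomic<int></code> would be cheaper; the book suggests a mutex once more than one variable is involved.</p>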
<h3 id="item-17---understand-special-member-function-generation">Item 17 - Understand special member function generation</h3>
<p>This item discusses the conditions under which the special member functions (constructors, destructor and assignment operators) are auto-generated.</p>
<p>Let’s abbreviate:</p>
<ul>
<li>CC: Copy constructor</li>
<li>CA: Copy assignment</li>
<li>MC: Move constructor</li>
<li>MA: Move assignment</li>
<li>D: Destructor</li>
</ul>
<p>We can build a table to encode the rules for when a member function is auto-generated. To read the table: the member function in a given row is auto-generated only if <em>none</em> of the member functions whose columns are marked with a ✓ in that row is user-declared.</p>
<table>
<thead>
<tr>
<th> </th>
<th>CC</th>
<th>CA</th>
<th>MC</th>
<th>MA</th>
<th>D</th>
</tr>
</thead>
<tbody>
<tr>
<td>CC</td>
<td>✓</td>
<td> </td>
<td>✓</td>
<td>✓</td>
<td> </td>
</tr>
<tr>
<td>CA</td>
<td> </td>
<td>✓</td>
<td>✓</td>
<td>✓</td>
<td> </td>
</tr>
<tr>
<td>MC</td>
<td>✓</td>
<td>✓</td>
<td>✓</td>
<td>✓</td>
<td>✓</td>
</tr>
<tr>
<td>MA</td>
<td>✓</td>
<td>✓</td>
<td>✓</td>
<td>✓</td>
<td>✓</td>
</tr>
</tbody>
</table>
<p>It also mentions the <em>Rule of three</em>:</p>
<blockquote>
<p>Rule of Three: if you declare any of copy constructor, copy assignment or destructor, you should declare all three.</p>
</blockquote>
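<p>A sketch of why the rule exists, using a hypothetical hand-rolled buffer class: once the destructor frees memory, the compiler-generated copy operations (which copy the raw pointer) would lead to double frees, so all three must be written together:</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Owns a heap buffer, so it declares destructor, copy constructor
// and copy assignment together (the Rule of Three).
class Buffer {
 public:
  explicit Buffer(std::size_t n) : n_(n), data_(new int[n]()) {}
  ~Buffer() { delete[] data_; }                               // destructor
  Buffer(const Buffer& o) : n_(o.n_), data_(new int[o.n_]) {  // copy ctor
    std::copy(o.data_, o.data_ + n_, data_);
  }
  Buffer& operator=(Buffer o) {                               // copy assignment
    std::swap(n_, o.n_);                                      // via copy-and-swap
    std::swap(data_, o.data_);
    return *this;
  }
  int& operator[](std::size_t i) { return data_[i]; }

 private:
  std::size_t n_;
  int* data_;
};
```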
<p>Templated operations do not count towards special member functions. For example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">C</span> <span class="p">{</span>
<span class="c1">// Not a copy constructor even when T=C</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="n">C</span><span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span> <span class="n">c</span><span class="p">);</span>
<span class="c1">// Not a move assignment even when T=C</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="n">C</span><span class="o">&</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&&</span> <span class="n">c</span><span class="p">);</span>
<span class="p">};</span></code></pre></figure>
<h2 id="chapter-4-smart-pointers">Chapter 4: Smart Pointers</h2>
<h3 id="item-18---use-stdunique_ptr-for-exclusive-ownership-resource-management">Item 18 - Use std::unique_ptr for exclusive-ownership resource management</h3>
<p>We’ve discussed unique pointers in <a href="https://www.kuniga.me/blog/2022/06/10/smart-pointers-cpp.html">Smart Pointers in C++</a>. This section describes other things I’ve learned from the book.</p>
<p><code class="language-plaintext highlighter-rouge">std::unique_ptr</code> supports custom deleters which are made part of the type:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">deleter</span> <span class="o">=</span> <span class="p">[](</span><span class="n">C</span><span class="o">*</span> <span class="n">ptr</span><span class="p">)</span> <span class="p">{</span>
<span class="k">delete</span> <span class="n">ptr</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="n">C</span><span class="p">,</span> <span class="k">decltype</span><span class="p">(</span><span class="n">deleter</span><span class="p">)</span><span class="o">></span> <span class="n">uPtr</span><span class="p">(</span><span class="nb">nullptr</span><span class="p">,</span> <span class="n">deleter</span><span class="p">);</span></code></pre></figure>
<p>The size of <code class="language-plaintext highlighter-rouge">std::unique_ptr</code> is the same as a raw pointer’s, unless a custom deleter with state (e.g. a function pointer or a capturing lambda) is used.</p>
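<p>We can check that claim with <code class="language-plaintext highlighter-rouge">static_assert</code>. The exact sizes are implementation details, but the relations below hold on mainstream standard libraries:</p>

```cpp
#include <memory>

struct C {};

// A captureless lambda deleter is stateless, so the empty base
// optimization keeps the unique_ptr pointer-sized.
auto lambdaDeleter = [](C* p) { delete p; };

static_assert(sizeof(std::unique_ptr<C>) == sizeof(C*),
              "default deleter adds no size");
static_assert(sizeof(std::unique_ptr<C, decltype(lambdaDeleter)>) == sizeof(C*),
              "captureless lambda deleter adds no size");
static_assert(sizeof(std::unique_ptr<C, void (*)(C*)>) > sizeof(C*),
              "a function-pointer deleter must be stored, so the size grows");
```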
<h3 id="item-19---use-stdshared_ptr-for-shared-ownership-resource-management">Item 19 - Use std::shared_ptr for shared-ownership resource management</h3>
<p>We’ve discussed shared pointers in <a href="https://www.kuniga.me/blog/2022/06/10/smart-pointers-cpp.html">Smart Pointers in C++</a>. This section describes other things I’ve learned from the book.</p>
<p>Moving shared pointers (as opposed to copying them) avoids reference count changes, which can be expensive since the count is updated atomically.</p>
<p>Shared pointers allocate memory for a control block which, among other things, stores the reference count. A <code class="language-plaintext highlighter-rouge">std::shared_ptr</code> itself holds two pointers (one to the object, one to the control block), so it is twice as big as a <code class="language-plaintext highlighter-rouge">std::unique_ptr</code>.</p>
<p>Unlike <code class="language-plaintext highlighter-rouge">std::unique_ptr</code>, the custom deleter is not part of the type of a <code class="language-plaintext highlighter-rouge">std::shared_ptr</code>, because it can be stored in the control block.</p>
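<p>A sketch making the copy vs. move difference observable via <code class="language-plaintext highlighter-rouge">use_count()</code> (the <code class="language-plaintext highlighter-rouge">useCounts()</code> helper is made up):</p>

```cpp
#include <memory>
#include <utility>

struct Counts { long after_copy; long after_move; };

Counts useCounts() {
  auto sp = std::make_shared<int>(42);
  auto copy = sp;                // copy: atomic increment, count goes to 2
  long after_copy = sp.use_count();
  auto moved = std::move(copy);  // move: ownership transferred, no count change
  long after_move = sp.use_count();
  return {after_copy, after_move};
}
```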
<h3 id="item-20---use-stdweak_ptr-for-stdshared_ptr-like-pointer-that-can-dangle">Item 20 - Use std::weak_ptr for std::shared_ptr like pointer that can dangle</h3>
<p>A <code class="language-plaintext highlighter-rouge">std::weak_ptr</code> can be obtained from a <code class="language-plaintext highlighter-rouge">std::shared_ptr</code> but does not increase the reference count. This is useful for breaking cyclic dependencies, where reference counting fails because the counts never drop to zero, though such cycles are fairly uncommon.</p>
<p><code class="language-plaintext highlighter-rouge">std::weak_ptr</code> also has a control block like <code class="language-plaintext highlighter-rouge">std::shared_ptr</code> (in fact it shares the same one), but it is tracked by a separate weak reference count.</p>
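<p>A small sketch of the typical usage, checking for dangling with <code class="language-plaintext highlighter-rouge">expired()</code> and promoting with <code class="language-plaintext highlighter-rouge">lock()</code> (the helper function is made up):</p>

```cpp
#include <memory>

// Observes an object without keeping it alive: the weak_ptr never
// bumps the strong count, and lock() returns empty once it dangles.
bool weakPtrDangles() {
  std::weak_ptr<int> wp;
  {
    auto sp = std::make_shared<int>(7);
    wp = sp;
    if (sp.use_count() != 1) return false;  // strong count untouched
    if (wp.expired()) return false;         // object still alive
  }                                          // sp destroyed here
  return wp.expired() && wp.lock() == nullptr;
}
```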
<h3 id="item-21---prefer-stdmake_unique-and-stdmake_shared-to-direct-use-of-new">Item 21 - Prefer std::make_unique and std::make_shared to direct use of new</h3>
<p>We’ve discussed the merits of <code class="language-plaintext highlighter-rouge">std::make_unique</code> and <code class="language-plaintext highlighter-rouge">std::make_shared</code> in <a href="https://www.kuniga.me/blog/2022/06/10/smart-pointers-cpp.html">Smart Pointers in C++</a>. This section describes other things I’ve learned from the book.</p>
<p>One interesting bit is that when using <code class="language-plaintext highlighter-rouge">std::make_shared</code>, it allocates the object being created and the control block in the same chunk of memory.</p>
<h3 id="item-22---when-using-the-pimpl-idiom-define-special-member-functions-in-the-implementation-file">Item 22 - When using the Pimpl idiom, define special member functions in the implementation file</h3>
<p>The Pimpl idiom is a technique used to reduce build times. The idea is to move heavy dependencies from the <code class="language-plaintext highlighter-rouge">.h</code> file to the <code class="language-plaintext highlighter-rouge">.cpp</code> one. For example, suppose we have some dependency <code class="language-plaintext highlighter-rouge">a.h</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">// a.h</span>
<span class="k">struct</span> <span class="nc">A</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">get</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">};</span></code></pre></figure>
<p>Our main class <code class="language-plaintext highlighter-rouge">B</code> depends on <code class="language-plaintext highlighter-rouge">A</code>, so its header includes it:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">// b.h</span>
<span class="cp">#include "a.h"
</span><span class="k">struct</span> <span class="nc">B</span> <span class="p">{</span>
<span class="n">A</span> <span class="n">a</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">get</span><span class="p">();</span>
<span class="p">};</span></code></pre></figure>
<p>And here’s the implementation:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">// b.cpp</span>
<span class="cp">#include "b.h"
</span><span class="kt">int</span> <span class="n">B</span><span class="o">::</span><span class="n">get</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">a</span><span class="p">.</span><span class="n">get</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>If we want to not depend on header <code class="language-plaintext highlighter-rouge">a.h</code> in <code class="language-plaintext highlighter-rouge">b.h</code>, a technique is to define a <code class="language-plaintext highlighter-rouge">struct Impl</code> which depends on <code class="language-plaintext highlighter-rouge">A</code> but we only forward declare in <code class="language-plaintext highlighter-rouge">b.h</code> and create a single member variable as a unique pointer to it (hence the Pimpl name: pointer + implementation):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">// b.h</span>
<span class="cp">#include <memory>
</span><span class="k">struct</span> <span class="nc">B</span> <span class="p">{</span>
<span class="k">struct</span> <span class="nc">Impl</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="n">Impl</span><span class="o">></span> <span class="n">impl</span><span class="p">;</span>
<span class="n">B</span><span class="p">();</span>
<span class="kt">int</span> <span class="n">get</span><span class="p">();</span>
<span class="p">};</span></code></pre></figure>
<p>Then in the <code class="language-plaintext highlighter-rouge">b.cpp</code> we actually define the struct <code class="language-plaintext highlighter-rouge">Impl</code> and have the dependency on <code class="language-plaintext highlighter-rouge">a.h</code> there:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">// b.cpp</span>
<span class="k">struct</span> <span class="nc">B</span><span class="o">::</span><span class="n">Impl</span> <span class="p">{</span> <span class="n">A</span> <span class="n">a</span><span class="p">;</span> <span class="p">};</span>
<span class="n">B</span><span class="o">::</span><span class="n">B</span><span class="p">()</span> <span class="o">:</span> <span class="n">impl</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">make_unique</span><span class="o"><</span><span class="n">Impl</span><span class="o">></span><span class="p">())</span> <span class="p">{}</span>
<span class="kt">int</span> <span class="n">B</span><span class="o">::</span><span class="n">get</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">impl</span><span class="o">-></span><span class="n">a</span><span class="p">.</span><span class="n">get</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>Note that we don’t need to write the destructor ourselves: <code class="language-plaintext highlighter-rouge">impl</code> calls <code class="language-plaintext highlighter-rouge">delete</code> when it falls out of scope. The issue the book delves into is that the auto-generated destructor is inline in the header, where <code class="language-plaintext highlighter-rouge">Impl</code> is still an incomplete type, which <code class="language-plaintext highlighter-rouge">std::unique_ptr</code>’s default deleter rejects; the fix is to declare <code class="language-plaintext highlighter-rouge">~B();</code> in <code class="language-plaintext highlighter-rouge">b.h</code> and define it (even as <code class="language-plaintext highlighter-rouge">= default</code>) in <code class="language-plaintext highlighter-rouge">b.cpp</code>, after <code class="language-plaintext highlighter-rouge">Impl</code> is complete. I didn’t get errors for that, which can happen depending on where the compiler instantiates the destructor.</p>
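<p>A single-file sketch of the Pimpl pattern with an out-of-line destructor, file boundaries marked in comments (the inline <code class="language-plaintext highlighter-rouge">A</code> stands in for <code class="language-plaintext highlighter-rouge">a.h</code>):</p>

```cpp
#include <memory>

// --- b.h ---
struct B {
  struct Impl;                   // forward declaration only
  std::unique_ptr<Impl> impl;
  B();
  ~B();                          // declared here, defined where Impl is complete
  int get();
};

// --- b.cpp ---
struct A {
  int get() { return 1; }        // stands in for the heavy a.h dependency
};
struct B::Impl { A a; };
B::B() : impl(std::make_unique<Impl>()) {}
B::~B() = default;               // Impl is a complete type at this point
int B::get() { return impl->a.get(); }
```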
<h2 id="chapter-5-rvalue-references-move-semantics-and-perfect-forwarding">Chapter 5: RValue References, Move Semantics and Perfect Forwarding</h2>
<h3 id="item-23---understand-stdmove-and-stdforward">Item 23 - Understand std::move and std::forward</h3>
<p>We’ve discussed <code class="language-plaintext highlighter-rouge">std::move</code> in <a href="https://www.kuniga.me/blog/2022/03/01/moving-semantics-cpp.html">Move Semantics in C++</a>. This section describes other things I’ve learned from the book.</p>
<p>One important observation is that function parameters are always lvalues, even if their type is an rvalue reference. For example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">C</span> <span class="p">{};</span>
<span class="kt">void</span> <span class="nf">log</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">s</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">s</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">g</span><span class="p">(</span><span class="n">C</span> <span class="o">&</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span> <span class="n">log</span><span class="p">(</span><span class="s">"g: lvalue ref"</span><span class="p">);</span> <span class="p">}</span>
<span class="kt">void</span> <span class="nf">g</span><span class="p">(</span><span class="n">C</span> <span class="o">&&</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span> <span class="n">log</span><span class="p">(</span><span class="s">"g: rvalue ref"</span><span class="p">);</span> <span class="p">}</span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">C</span> <span class="o">&</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="n">log</span><span class="p">(</span><span class="s">"f: lvalue ref"</span><span class="p">);</span>
<span class="n">g</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">C</span> <span class="o">&&</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="n">log</span><span class="p">(</span><span class="s">"f: rvalue ref"</span><span class="p">);</span>
<span class="n">g</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// f: lvalue ref, g: lvalue ref</span>
<span class="n">f</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="c1">// f: rvalue ref, g: lvalue ref</span>
<span class="n">f</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">c</span><span class="p">));</span></code></pre></figure>
<p>In [2], we’ve seen that <code class="language-plaintext highlighter-rouge">std::move()</code> is a static cast that converts any reference to an rvalue reference. <code class="language-plaintext highlighter-rouge">std::forward<T>()</code> converts to an rvalue reference <em>conditionally</em>: the result is an rvalue reference unless <code class="language-plaintext highlighter-rouge">T</code> is an lvalue reference.</p>
<p>This is clearer from an example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">g</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="o">&</span><span class="n">s</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"lvalue"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">g</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="o">&&</span><span class="n">s</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"rvalue"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span> <span class="n">p</span><span class="p">)</span> <span class="p">{</span>
<span class="n">g</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">p</span><span class="p">));</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">s</span> <span class="o">=</span> <span class="s">"a"</span><span class="p">;</span>
<span class="n">f</span><span class="p">(</span><span class="n">s</span><span class="p">);</span> <span class="c1">// lvalue</span>
<span class="n">f</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">s</span><span class="p">));</span> <span class="c1">// rvalue</span></code></pre></figure>
<p>Since <code class="language-plaintext highlighter-rouge">T&&</code> is a universal reference, the way it is resolved depends on whether the passed value is a rvalue or lvalue reference. In <code class="language-plaintext highlighter-rouge">f(s)</code>, the type <code class="language-plaintext highlighter-rouge">T&&</code> in <code class="language-plaintext highlighter-rouge">f()</code> resolves to <code class="language-plaintext highlighter-rouge">std::string &</code>. In <code class="language-plaintext highlighter-rouge">f(std::move(s))</code>, <code class="language-plaintext highlighter-rouge">T&&</code> resolves to <code class="language-plaintext highlighter-rouge">std::string &&</code>.</p>
<p>At <code class="language-plaintext highlighter-rouge">f()</code>, <code class="language-plaintext highlighter-rouge">p</code> is an lvalue, so if passed to <code class="language-plaintext highlighter-rouge">g()</code> as is, we’d always call <code class="language-plaintext highlighter-rouge">g(std::string &s)</code> regardless of whether <code class="language-plaintext highlighter-rouge">f()</code> was initially called with an rvalue. We’d like to preserve the rvalue-ness as if <code class="language-plaintext highlighter-rouge">g()</code> were being called directly. This is what <code class="language-plaintext highlighter-rouge">std::forward<T></code> does.</p>
<p>Simplistically <code class="language-plaintext highlighter-rouge">std::forward<T></code> is basically a <code class="language-plaintext highlighter-rouge">static_cast<T&&></code> but there are <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2951.html">nuances</a> we won’t discuss here.</p>
<p>Worth noting that <code class="language-plaintext highlighter-rouge">std::move</code> and <code class="language-plaintext highlighter-rouge">std::forward</code> are both static casts.</p>
<h3 id="item-24---distinguish-universal-references-from-rvalue-references">Item 24 - Distinguish universal references from rvalue references</h3>
<p>Universal references look like rvalue references but occur where type deduction happens, either via <code class="language-plaintext highlighter-rouge">auto</code> or templates. Examples:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span> <span class="n">p</span><span class="p">)</span> <span class="p">{}</span>
<span class="k">auto</span><span class="o">&&</span> <span class="n">x</span> <span class="o">=</span> <span class="s">"hello"</span><span class="p">;</span></code></pre></figure>
<p>It’s called a universal reference because it properly handles both rvalue and lvalue references.</p>
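<p>A sketch making the deduction visible: for lvalue arguments <code class="language-plaintext highlighter-rouge">T</code> deduces to an lvalue reference, which we can inspect with a type trait (the <code class="language-plaintext highlighter-rouge">isLvalue()</code> helper is made up):</p>

```cpp
#include <string>
#include <type_traits>
#include <utility>

// T&& here is a universal reference: T deduces to U& for lvalue
// arguments and to plain U for rvalue arguments.
template <typename T>
bool isLvalue(T&&) {
  return std::is_lvalue_reference<T>::value;
}
```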
<h3 id="item-25---use-stdmove-on-rvalue-references-stdforward-on-universal-references">Item 25 - Use std::move on rvalue references, std::forward on universal references</h3>
<p>This item basically says that if we got <code class="language-plaintext highlighter-rouge">p</code> as a universal reference we should pass it along using <code class="language-plaintext highlighter-rouge">std::forward<T></code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span> <span class="n">p</span><span class="p">)</span> <span class="p">{</span>
<span class="n">g</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">p</span><span class="p">));</span>
<span class="p">}</span></code></pre></figure>
<p>But if we got it as a rvalue reference we should use <code class="language-plaintext highlighter-rouge">std::move</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">C</span><span class="o">&&</span> <span class="n">p</span><span class="p">)</span> <span class="p">{</span>
<span class="n">g</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">p</span><span class="p">));</span>
<span class="p">}</span></code></pre></figure>
<h3 id="item-26---avoid-overloading-on-universal-references">Item 26 - Avoid overloading on universal references</h3>
<p>This item basically says that if we have a single parameter function:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="p">}</span></code></pre></figure>
<p>Do not add an overload using universal references like:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="p">}</span></code></pre></figure>
<p>This might make the overload resolution difficult to reason about. For example, if we pass <code class="language-plaintext highlighter-rouge">short</code> to <code class="language-plaintext highlighter-rouge">f()</code> it actually calls the universal reference overload.</p>
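<p>A sketch reproducing that surprise: <code class="language-plaintext highlighter-rouge">short</code> reaches <code class="language-plaintext highlighter-rouge">f(int)</code> only via an integral promotion, while the template instantiates an exact match for <code class="language-plaintext highlighter-rouge">short&</code>, so the template wins. The returned strings are just labels for testing:</p>

```cpp
#include <string>

std::string f(int) { return "int overload"; }

// Universal-reference overload: an exact match for almost anything.
template <typename T>
std::string f(T&&) { return "universal-ref overload"; }
```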
<p>This is even worse if we use universal references in the constructor, because it mixes up with copy and move constructors.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">C</span> <span class="p">{</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="k">explicit</span> <span class="n">C</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span> <span class="n">x</span><span class="p">)</span> <span class="p">{}</span>
<span class="n">C</span><span class="p">(</span><span class="k">const</span> <span class="n">C</span><span class="o">&</span> <span class="n">x</span><span class="p">)</span> <span class="p">{}</span>
<span class="p">};</span>
<span class="n">C</span> <span class="n">c1</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="n">C</span> <span class="nf">c2</span><span class="p">(</span><span class="n">c1</span><span class="p">);</span> <span class="c1">// which constructor does it call?</span></code></pre></figure>
<p>The book delves into the details on why the last line calls the universal reference constructor.</p>
<h3 id="item-27---alternatives-to-overloading-universal-references">Item 27 - Alternatives to overloading universal references</h3>
<p>This item provides several ways to avoid the universal references overloading. One of the most interesting is <em>tag dispatch</em>. It basically leverages static checks to make sure the right overload is used. So if we have:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="p">}</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="p">}</span></code></pre></figure>
<p>We can turn <code class="language-plaintext highlighter-rouge">f</code> into a dispatcher function and have the int and non-int logic as <code class="language-plaintext highlighter-rouge">fImpl</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fImpl</span><span class="p">(</span>
<span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">x</span><span class="p">),</span>
<span class="n">std</span><span class="o">::</span><span class="n">is_integral</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">remove_reference_t</span><span class="o"><</span><span class="n">T</span><span class="o">>></span><span class="p">()</span>
<span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">fImpl</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">true_type</span><span class="p">)</span> <span class="p">{</span> <span class="p">}</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">fImpl</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span> <span class="n">x</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">false_type</span><span class="p">)</span> <span class="p">{</span> <span class="p">}</span></code></pre></figure>
<p>When we call <code class="language-plaintext highlighter-rouge">f()</code> with an integer type, <code class="language-plaintext highlighter-rouge">std::is_integral<std::remove_reference_t<T>>()</code> evaluates to a <code class="language-plaintext highlighter-rouge">std::true_type</code> value, selecting <code class="language-plaintext highlighter-rouge">fImpl(int x, std::true_type)</code>. Otherwise it evaluates to <code class="language-plaintext highlighter-rouge">std::false_type</code> and <code class="language-plaintext highlighter-rouge">fImpl(T&& x, std::false_type)</code> is called.</p>
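<p>A runnable version of the dispatch above, using the real trait spelling <code class="language-plaintext highlighter-rouge">std::remove_reference_t</code> and declaring the implementations before the dispatcher. The returned labels are for testing only:</p>

```cpp
#include <string>
#include <type_traits>
#include <utility>

std::string fImpl(int, std::true_type) { return "integral"; }

template <typename T>
std::string fImpl(T&&, std::false_type) { return "non-integral"; }

template <typename T>
std::string f(T&& x) {
  // The tag argument is a value of type std::true_type or
  // std::false_type, so overload resolution picks the right fImpl.
  return fImpl(std::forward<T>(x),
               std::is_integral<std::remove_reference_t<T>>());
}
```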
<h3 id="item-28---understand-reference-collapsing">Item 28 - Understand reference collapsing</h3>
<p>You can’t directly write a reference to a reference:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="kt">int</span><span class="o">&</span> <span class="o">&</span> <span class="n">a</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span> <span class="c1">// error: won't compile</span></code></pre></figure>
<p>But compilers may produce them as intermediate steps when deducing types, for example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">&</span><span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="k">auto</span><span class="o">&</span> <span class="n">a</span> <span class="o">=</span> <span class="n">y</span><span class="p">;</span></code></pre></figure>
<p>Since <code class="language-plaintext highlighter-rouge">y</code> is <code class="language-plaintext highlighter-rouge">int &</code>, <code class="language-plaintext highlighter-rouge">auto&</code> resolves to <code class="language-plaintext highlighter-rouge">int& &</code>, but gets collapsed to <code class="language-plaintext highlighter-rouge">int&</code> in the end.</p>
<p>The rule for collapsing is simple: if both references are rvalue references, the result is an rvalue reference; otherwise it’s an lvalue reference. This explains the behavior of universal references:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">&</span><span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="k">auto</span><span class="o">&&</span> <span class="n">a</span> <span class="o">=</span> <span class="n">y</span><span class="p">;</span>
<span class="k">auto</span><span class="o">&&</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span></code></pre></figure>
<p>For <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">auto &&</code> resolves to <code class="language-plaintext highlighter-rouge">int& &&</code> and thus <code class="language-plaintext highlighter-rouge">int &</code>. For <code class="language-plaintext highlighter-rouge">b</code>, <code class="language-plaintext highlighter-rouge">auto &&</code> resolves to <code class="language-plaintext highlighter-rouge">int&& &&</code> and thus <code class="language-plaintext highlighter-rouge">int &&</code>.</p>
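<p>The collapsing rules can be verified at compile time with <code class="language-plaintext highlighter-rouge">static_assert</code>:</p>

```cpp
#include <type_traits>

int x = 10;
int& y = x;
auto&& a = y;   // auto&& resolves to int& &&, which collapses to int&
auto&& b = 10;  // auto&& resolves to int&& &&, which stays int&&

static_assert(std::is_same<decltype(a), int&>::value,
              "an lvalue collapses to an lvalue reference");
static_assert(std::is_same<decltype(b), int&&>::value,
              "an rvalue stays an rvalue reference");
```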
<h3 id="item-29---assume-move-operations-are-not-present-not-cheap-and-not-used">Item 29 - Assume move operations are not present, not cheap, and not used</h3>
<p>One case where a move is not cheaper than a copy is when small string optimization (SSO) is in effect. The content is then stored inline in the <code class="language-plaintext highlighter-rouge">std::string</code> object rather than dynamically allocated, so a move can’t simply swap pointers.</p>
<p>As for cases where moves are not used: some STL code only uses move operations if they are <code class="language-plaintext highlighter-rouge">noexcept</code>, in order to preserve the strong exception guarantee of the older, copy-based behavior.</p>
<h3 id="item-30---familiarize-yourself-with-perfect-forwarding-failures">Item 30 - Familiarize yourself with perfect forwarding failures</h3>
<p>This item discusses cases in which using:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">fwd</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="n">f</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">x</span><span class="p">));</span>
<span class="p">}</span></code></pre></figure>
<p>doesn’t work. The failure scenarios described are due to universal references, not to <code class="language-plaintext highlighter-rouge">std::forward</code> in particular.</p>
<p><em>Case 1: Braced initializers.</em></p>
<p>This is explained in <em>Item 2</em>, the reason being that <code class="language-plaintext highlighter-rouge">T</code> cannot deduce the type of <code class="language-plaintext highlighter-rouge">std::initializer_list</code>.</p>
<p><em>Case 2: 0 or NULL as null pointers</em></p>
<p>This is explained in <em>Item 8</em> and is also related to template type deduction.</p>
<p><em>Case 3: Declaration-only integral static const and constexpr data members</em></p>
<p>This is a super specific scenario when we have:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">C</span> <span class="p">{</span>
<span class="k">static</span> <span class="k">constexpr</span> <span class="n">std</span><span class="o">::</span><span class="kt">size_t</span> <span class="n">k</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">fwd</span><span class="p">(</span><span class="n">C</span><span class="o">::</span><span class="n">k</span><span class="p">);</span> <span class="c1">// might fail</span></code></pre></figure>
<p>The explanation is that <code class="language-plaintext highlighter-rouge">C::k</code> is only declared, not defined, so it may have no address in memory; and since references are usually implemented as pointers, this might fail to link with some compilers.</p>
<p>One natural question is: why can rvalues bind to references, then? That’s because the compiler creates a temporary object for the rvalue, which does have an address.</p>
<p>This temporary object creation doesn’t happen for <code class="language-plaintext highlighter-rouge">static constexpr</code> members, at least with some compilers. With <code class="language-plaintext highlighter-rouge">clang</code>, for example, it works.</p>
<p><em>Case 4: Overloaded function names and template names</em>.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">callback</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">callback</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">x</span><span class="p">);</span>
<span class="n">fwd</span><span class="p">(</span><span class="n">callback</span><span class="p">);</span> <span class="c1">// doesn't know which overload to pick</span></code></pre></figure>
<p>This also happens if <code class="language-plaintext highlighter-rouge">callback</code> is a template function.</p>
<p><em>Case 5: Bitfields.</em></p>
<p>Bitfields allow splitting a single type into multiple variables, for example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">C</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="kt">uint32_t</span> <span class="n">field1</span><span class="o">:</span><span class="mi">10</span><span class="p">,</span> <span class="n">field2</span><span class="o">:</span><span class="mi">22</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>Here <code class="language-plaintext highlighter-rouge">field1</code> uses 10 bits and <code class="language-plaintext highlighter-rouge">field2</code> uses 22 bits from the 32 bits of <code class="language-plaintext highlighter-rouge">std::uint32_t</code>.</p>
<p>This also doesn’t work with universal references:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">C</span> <span class="n">c</span><span class="p">;</span>
<span class="c1">// non-const reference cannot bind to bit-field 'field2'</span>
<span class="n">fwd</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">field2</span><span class="p">);</span></code></pre></figure>
<h2 id="chapter-6---lambda-expressions">Chapter 6 - Lambda Expressions</h2>
<p>Things I learned from the chapter introduction:</p>
<p>The compiler creates a class for each lambda behind the scenes, called the <strong>closure class</strong>. The lambda’s logic goes in the <code class="language-plaintext highlighter-rouge">()</code> operator, which is <code class="language-plaintext highlighter-rouge">const</code> by default. The <code class="language-plaintext highlighter-rouge">mutable</code> keyword in the lambda changes that.</p>
<p>Assigning a lambda to a variable incurs the creation of an instance of that class, called a <strong>closure</strong>. Closures can be copied.</p>
<h3 id="item-31---avoid-default-capture-modes">Item 31 - Avoid default capture modes</h3>
<p>The item advises against using default capture by value (<code class="language-plaintext highlighter-rouge">[=]</code>):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">f</span> <span class="o">=</span> <span class="p">[</span><span class="o">=</span><span class="p">](...)</span> <span class="p">{</span>
<span class="p">...</span>
<span class="p">};</span></code></pre></figure>
<p>or default capture by reference:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">f</span> <span class="o">=</span> <span class="p">[</span><span class="o">&</span><span class="p">](...)</span> <span class="p">{</span>
<span class="p">...</span>
<span class="p">};</span></code></pre></figure>
<p>Because they can cause dangling references.</p>
<h3 id="item-32---use-init-capture-to-move-objects-into-closure">Item 32 - Use init capture to move objects into closure</h3>
<p>For move-only objects like <code class="language-plaintext highlighter-rouge">std::unique_ptr</code> we can use this syntax to move the object into the closure:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">p</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_unique</span><span class="o"><</span><span class="n">C</span><span class="o">></span><span class="p">();</span>
<span class="p">...</span>
<span class="k">auto</span> <span class="n">f</span> <span class="o">=</span> <span class="p">[</span><span class="n">p</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">p</span><span class="p">)](...)</span> <span class="p">{</span>
<span class="p">...</span>
<span class="p">};</span></code></pre></figure>
<h3 id="item-33---use-decltype-on-auto-to-stdforward-them">Item 33 - Use decltype on auto&& to std::forward them</h3>
<p>Generic lambdas are those having <code class="language-plaintext highlighter-rouge">auto</code> in their argument list:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">f</span> <span class="o">=</span> <span class="p">[](</span><span class="k">auto</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>The underlying closure class is implemented using templates, possibly as:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">Closure</span> <span class="p">{</span>
<span class="nl">public:</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="k">auto</span> <span class="k">operator</span><span class="p">()(</span><span class="n">T</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">};</span></code></pre></figure>
<p>If we want the closure to take a universal reference (<code class="language-plaintext highlighter-rouge">auto&&</code>) and forward that argument, we don’t have the template <code class="language-plaintext highlighter-rouge">T</code> available, so we can use <code class="language-plaintext highlighter-rouge">decltype</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">f</span> <span class="o">=</span> <span class="p">[](</span><span class="k">auto</span><span class="o">&&</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">g</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o"><</span><span class="k">decltype</span><span class="o"><</span><span class="n">x</span><span class="o">>></span><span class="p">(</span><span class="n">x</span><span class="p">));</span>
<span class="p">};</span></code></pre></figure>
<h3 id="item-34---prefer-lambdas-over-stdbind">Item 34 - Prefer lambdas over std::bind</h3>
<p>According to this item, there’s never a reason to use <code class="language-plaintext highlighter-rouge">std::bind</code> after C++14. It claims that lambdas are more readable, more expressive, and can be more efficient than <code class="language-plaintext highlighter-rouge">std::bind</code>.</p>
<h2 id="chapter-7---the-concurrency-api">Chapter 7 - The Concurrency API</h2>
<h3 id="item-35---prefer-task-based-programming-to-thread-based">Item 35 - Prefer task-based programming to thread-based.</h3>
<p>In other words, prefer <code class="language-plaintext highlighter-rouge">std::async</code> to <code class="language-plaintext highlighter-rouge">std::thread</code>. Example using threads:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <thread>
</span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"work"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="nf">t</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="n">t</span><span class="p">.</span><span class="n">join</span><span class="p">();</span> <span class="c1">// wait on thread to finish</span></code></pre></figure>
<p>And async:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <future>
</span>
<span class="k">auto</span> <span class="n">fut</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">async</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="n">fut</span><span class="p">.</span><span class="n">get</span><span class="p">();</span></code></pre></figure>
<p>The item suggests <code class="language-plaintext highlighter-rouge">std::thread</code> is a lower-level abstraction than <code class="language-plaintext highlighter-rouge">std::async</code>, so async handles many of the details for you.</p>
<p>Another advantage of async is that you can get the result from the async function more easily than in a thread:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="nf">g</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>
<span class="k">auto</span> <span class="n">fut</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">async</span><span class="p">(</span><span class="n">g</span><span class="p">);</span>
<span class="k">auto</span> <span class="n">result</span> <span class="o">=</span> <span class="n">fut</span><span class="p">.</span><span class="n">get</span><span class="p">();</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">result</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="c1">// 1</span></code></pre></figure>
<h3 id="item-36---specify-stdlaunchasync-if-asynchronicity-is-essential">Item 36 - Specify std::launch::async if asynchronicity is essential</h3>
<p>In line with Item 35’s claim that async handles a lot of the details for you, one thing you can’t assume is that it will always run the callback in a separate thread. It might actually wait and run the function in the current thread.</p>
<p>To force it to run as a separate thread we must use <code class="language-plaintext highlighter-rouge">std::launch::async</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">fut</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">async</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">launch</span><span class="o">::</span><span class="n">async</span><span class="p">,</span> <span class="n">f</span><span class="p">);</span></code></pre></figure>
<h3 id="item-37---make-stdthread-unjoinable-on-all-paths">Item 37 - Make std::thread unjoinable on all paths</h3>
<p>An unjoinable thread is one on which <code class="language-plaintext highlighter-rouge">.join()</code> cannot be called. One example is a thread that has already been joined:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="nf">t</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="n">t</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
<span class="c1">// Exception: thread::join failed: Invalid argument</span>
<span class="n">t</span><span class="p">.</span><span class="n">join</span><span class="p">();</span></code></pre></figure>
<p>Or when the thread has been moved:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="nf">t</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="n">g</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">t</span><span class="p">));</span>
<span class="c1">// Exception: thread::join failed: Invalid argument</span>
<span class="n">t</span><span class="p">.</span><span class="n">join</span><span class="p">();</span></code></pre></figure>
<p>Or detached:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="nf">t</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="n">t</span><span class="p">.</span><span class="n">detach</span><span class="p">();</span>
<span class="c1">// Exception: thread::join failed: Invalid argument</span>
<span class="n">t</span><span class="p">.</span><span class="n">join</span><span class="p">();</span></code></pre></figure>
<p>If a thread is <em>joinable</em> when it’s destroyed, <code class="language-plaintext highlighter-rouge">std::terminate</code> is called and the program aborts, for example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="p">{</span>
<span class="k">auto</span> <span class="n">t</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="kr">thread</span><span class="p">([]()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">this_thread</span><span class="o">::</span><span class="n">sleep_for</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="mi">1000</span><span class="p">));</span>
<span class="p">});</span>
<span class="p">}</span> <span class="c1">// t is destroyed. Abort trap: 6</span></code></pre></figure>
<p>The recommendation of this item is to make sure threads are made unjoinable before they get destroyed.</p>
<p>As one way to achieve this, the author proposes a RAII-wrapper around <code class="language-plaintext highlighter-rouge">std::thread</code> called <code class="language-plaintext highlighter-rouge">ThreadRAII</code>, that enables configuring whether to call <code class="language-plaintext highlighter-rouge">.join()</code> or <code class="language-plaintext highlighter-rouge">.detach()</code> on the underlying thread at the <code class="language-plaintext highlighter-rouge">ThreadRAII</code>’s destructor, effectively guaranteeing a thread is never left joinable on destruction.</p>
<h3 id="item-38---be-aware-of-varying-thread-handle-destructor-behavior">Item 38 - Be aware of varying thread handle destructor behavior</h3>
<p>This item discusses the behavior of the destructor of <code class="language-plaintext highlighter-rouge">std::future</code> (the type returned by <code class="language-plaintext highlighter-rouge">std::async</code>). If the task runs asynchronously, either explicitly via the flag <code class="language-plaintext highlighter-rouge">std::launch::async</code> or implicitly (<em>Item 36</em>), the destructor behaves differently: it blocks, implicitly calling <code class="language-plaintext highlighter-rouge">.join()</code> on the underlying thread.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="p">{</span>
<span class="k">auto</span> <span class="n">fut</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">async</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">launch</span><span class="o">::</span><span class="n">async</span><span class="p">,</span> <span class="p">[]()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">this_thread</span><span class="o">::</span><span class="n">sleep_for</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="mi">10000</span><span class="p">));</span>
<span class="p">});</span>
<span class="p">}</span> <span class="c1">// blocks</span></code></pre></figure>
<p>Note this behavior is different from when a raw <code class="language-plaintext highlighter-rouge">std::thread</code> is destroyed (<em>Item 37</em>).</p>
<h3 id="item-39---consider-void-futures-for-one-shot-event-communication">Item 39 - Consider void futures for one-shot event communication</h3>
<p>This item discusses a scenario where we have two threads, <code class="language-plaintext highlighter-rouge">t1</code> and <code class="language-plaintext highlighter-rouge">t2</code>, and we’d like <code class="language-plaintext highlighter-rouge">t2</code> to wait until <code class="language-plaintext highlighter-rouge">t1</code> signals it. One way to do this is using <code class="language-plaintext highlighter-rouge">std::condition_variable</code> + <code class="language-plaintext highlighter-rouge">std::unique_lock</code> + <code class="language-plaintext highlighter-rouge">std::mutex</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <condition_variable>
#include <mutex>
</span>
<span class="n">std</span><span class="o">::</span><span class="n">condition_variable</span> <span class="n">cv</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">t1</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="kr">thread</span><span class="p">([</span><span class="o">&</span><span class="n">cv</span><span class="p">]()</span> <span class="k">mutable</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">this_thread</span><span class="o">::</span><span class="n">sleep_for</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="mi">1000</span><span class="p">));</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"setting value"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="n">cv</span><span class="p">.</span><span class="n">notify_one</span><span class="p">();</span>
<span class="p">});</span>
<span class="k">auto</span> <span class="n">t2</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="kr">thread</span><span class="p">([</span><span class="o">&</span><span class="n">cv</span><span class="p">](){</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"waiting"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">mutex</span> <span class="n">m</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">unique_lock</span> <span class="n">lk</span><span class="p">(</span><span class="n">m</span><span class="p">);</span>
<span class="n">cv</span><span class="p">.</span><span class="n">wait</span><span class="p">(</span><span class="n">lk</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"waited"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">});</span>
<span class="n">t1</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
<span class="n">t2</span><span class="p">.</span><span class="n">join</span><span class="p">();</span></code></pre></figure>
<p>The item argues this is hacky and suffers from issues like <code class="language-plaintext highlighter-rouge">cv.notify_one()</code> running before <code class="language-plaintext highlighter-rouge">cv.wait(lk)</code>, which causes the latter to hang. It proposes an alternative using <code class="language-plaintext highlighter-rouge">std::promise</code> + <code class="language-plaintext highlighter-rouge">std::future</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <future>
</span>
<span class="n">std</span><span class="o">::</span><span class="n">promise</span><span class="o"><</span><span class="kt">void</span><span class="o">></span> <span class="n">p</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">fut</span> <span class="o">=</span> <span class="n">p</span><span class="p">.</span><span class="n">get_future</span><span class="p">();</span>
<span class="k">auto</span> <span class="n">t1</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="kr">thread</span><span class="p">([</span><span class="n">p</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">p</span><span class="p">)]()</span> <span class="k">mutable</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">this_thread</span><span class="o">::</span><span class="n">sleep_for</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="mi">1000</span><span class="p">));</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"setting value"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="n">p</span><span class="p">.</span><span class="n">set_value</span><span class="p">();</span>
<span class="p">});</span>
<span class="k">auto</span> <span class="n">t2</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="kr">thread</span><span class="p">([</span><span class="n">fut</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">fut</span><span class="p">)](){</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"waiting"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="n">fut</span><span class="p">.</span><span class="n">wait</span><span class="p">();</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"waited"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">});</span>
<span class="n">t1</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
<span class="n">t2</span><span class="p">.</span><span class="n">join</span><span class="p">();</span></code></pre></figure>
<p>The major downside of this approach is that it can only be used once.</p>
<h3 id="item-40---use-stdatomic-for-concurrency-volatile-for-special-memory">Item 40 - Use std::atomic for concurrency, volatile for special memory</h3>
<p>Independent assignments like:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">a</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">y</span><span class="p">;</span></code></pre></figure>
<p>can be re-ordered either by the compiler or by the underlying hardware to improve efficiency. This poses a problem for concurrent programming, because we might use an independent variable to indicate that some computation has taken place:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">bool</span> <span class="n">isReady</span> <span class="p">{</span><span class="nb">false</span><span class="p">};</span>
<span class="k">auto</span> <span class="n">result</span> <span class="o">=</span> <span class="n">compute</span><span class="p">();</span>
<span class="c1">// indicates that computation has taken place</span>
<span class="n">isReady</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span></code></pre></figure>
<p>If another thread relies on <code class="language-plaintext highlighter-rouge">isReady</code> to determine <code class="language-plaintext highlighter-rouge">compute()</code> has been run, we can’t let the compiler re-order the last two statements.</p>
<p><code class="language-plaintext highlighter-rouge">std::atomic</code> prevents that by telling the compiler: if an expression appears before a write to an <code class="language-plaintext highlighter-rouge">std::atomic</code> variable in the source code, then it has to be executed before such write in runtime.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">bool</span><span class="o">></span> <span class="n">isReady</span> <span class="p">{</span><span class="nb">false</span><span class="p">};</span>
<span class="k">auto</span> <span class="n">result</span> <span class="o">=</span> <span class="n">compute</span><span class="p">();</span>
<span class="n">isReady</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span></code></pre></figure>
<p>There’s another optimization compilers can do, regarding redundant reads and writes. In the code below, the initial assignment of <code class="language-plaintext highlighter-rouge">y</code> is never used and is later overwritten:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="n">compute</span><span class="p">();</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span></code></pre></figure>
<p>The compiler might want to re-write this as:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">compute</span><span class="p">();</span>
<span class="k">auto</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span></code></pre></figure>
<p>However, it’s possible that <code class="language-plaintext highlighter-rouge">y</code> refers to special memory (e.g. a memory-mapped external device) instead of regular RAM, and some other system might depend on those seemingly redundant reads and writes as side effects. <code class="language-plaintext highlighter-rouge">volatile</code> prevents this optimization from happening.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">volatile</span> <span class="k">auto</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="n">compute</span><span class="p">();</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span></code></pre></figure>
<p>Note that this is still subject to re-ordering, so we could combine <code class="language-plaintext highlighter-rouge">volatile</code> and <code class="language-plaintext highlighter-rouge">std::atomic</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">volatile</span> <span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="n">compute</span><span class="p">();</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span></code></pre></figure>
<h2 id="chapter-8---tweaks">Chapter 8 - Tweaks</h2>
<h3 id="item-41---consider-pass-by-value-for-copyable-parameters-that-are-cheap-to-move-and-always-copied">Item 41 - Consider pass by value for copyable parameters that are cheap to move and always copied</h3>
<p>Suppose we have a function that takes a reference and makes a copy of it internally:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">B</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">set</span><span class="p">(</span><span class="n">C</span><span class="o">&</span> <span class="n">r</span><span class="p">)</span> <span class="p">{</span>
<span class="n">c_</span> <span class="o">=</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">C</span> <span class="n">c_</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>We’d want to also support a rvalue reference overload to avoid making additional copies for rvalues:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">B</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">set</span><span class="p">(</span><span class="n">C</span><span class="o">&</span> <span class="n">r</span><span class="p">)</span> <span class="p">{</span>
<span class="n">c_</span> <span class="o">=</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">set</span><span class="p">(</span><span class="n">C</span><span class="o">&&</span> <span class="n">r</span><span class="p">)</span> <span class="p">{</span>
<span class="n">c_</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">r</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">C</span> <span class="n">c_</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>If <code class="language-plaintext highlighter-rouge">C</code> is cheap to move, we can simplify things and just take <code class="language-plaintext highlighter-rouge">r</code> by value:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">B</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">set</span><span class="p">(</span><span class="n">C</span> <span class="n">r</span><span class="p">)</span> <span class="p">{</span>
<span class="n">c_</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">r</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">C</span> <span class="n">c_</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>Let’s first compare this new form with the <code class="language-plaintext highlighter-rouge">set(C& r)</code> case. When we call <code class="language-plaintext highlighter-rouge">set(C r)</code> with an lvalue, we’ll copy-construct it when calling <code class="language-plaintext highlighter-rouge">set()</code> but avoid the copy when move-assigning to <code class="language-plaintext highlighter-rouge">c_</code>. For <code class="language-plaintext highlighter-rouge">set(C& r)</code>, by contrast, we avoid a copy when calling <code class="language-plaintext highlighter-rouge">set()</code> but make a copy when assigning to <code class="language-plaintext highlighter-rouge">c_</code>. Assuming moving is cheap, they incur roughly the same cost.</p>
<p>Now compare with the <code class="language-plaintext highlighter-rouge">set(C&& r)</code> case. When we call <code class="language-plaintext highlighter-rouge">set(C r)</code> with an rvalue we’ll move-construct the parameter when calling <code class="language-plaintext highlighter-rouge">set()</code> and then move-assign it to <code class="language-plaintext highlighter-rouge">c_</code>, i.e., two moves. For <code class="language-plaintext highlighter-rouge">set(C&& r)</code> we’ll do a single move-assign. Again, assuming moving is cheap, they incur roughly the same cost.</p>
<h3 id="item-42---consider-emplacement-instead-of-insertion">Item 42 - Consider emplacement instead of insertion</h3>
<p>Many STL containers support emplacement in addition to insertion. For example, <code class="language-plaintext highlighter-rouge">std::vector</code> has <code class="language-plaintext highlighter-rouge">emplace_back()</code>. Emplace methods take the constructor arguments instead of an object, so they avoid creating a temporary object when the argument has a different type that the element’s constructor accepts.</p>
<p>For example, suppose we have a class <code class="language-plaintext highlighter-rouge">C</code> that can be constructed from <code class="language-plaintext highlighter-rouge">int</code>. If we call <code class="language-plaintext highlighter-rouge">push_back(1)</code>, first we’ll create a temporary object via <code class="language-plaintext highlighter-rouge">tmp = C(1)</code> then copy it when doing <code class="language-plaintext highlighter-rouge">push_back(tmp)</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">C</span> <span class="p">{</span>
<span class="n">C</span><span class="p">(</span><span class="k">const</span> <span class="n">C</span> <span class="o">&</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"copy-construct"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="n">x_</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">x_</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">C</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"construct from int"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="n">x_</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="n">x_</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">C</span><span class="o">></span> <span class="n">v</span><span class="p">;</span>
<span class="c1">// construct from int</span>
<span class="c1">// copy-construct</span>
<span class="n">v</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span></code></pre></figure>
<p>If we use <code class="language-plaintext highlighter-rouge">emplace_back(1)</code>, we’ll only call <code class="language-plaintext highlighter-rouge">C(1)</code>, constructing the element in place inside <code class="language-plaintext highlighter-rouge">std::vector</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">C</span><span class="o">></span> <span class="n">v</span><span class="p">;</span>
<span class="c1">// construct from int</span>
<span class="n">v</span><span class="p">.</span><span class="n">emplace_back</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span></code></pre></figure>
<h2 class="no_toc" id="conclusion">Conclusion</h2>
<p>I really liked reading this book cover-to-cover and learned a lot! The book is rather verbose but it has a very fluid narrative and thus is smooth to read.</p>
<p>Despite the verbosity, the book has a lot of content and I had trouble summarizing it, even after leaving out the rationale for each recommendation. The markdown text for this post has over 1k lines (usually it’s fewer than 200).</p>
<p>The book also manages to make the content accessible while keeping it detailed and technically precise. One downside is that at times the author spends a lot of time discussing what seems like an extreme corner case, for example, in <em>Item 27</em> (on avoiding overloaded universal references), the section called “Constraining templates that take universal references”.</p>
<h2 class="no_toc" id="related-posts">Related Posts</h2>
<ul>
<li><a href="https://www.kuniga.me/blog/2021/07/02/namespace-jail.html">Namespace Jailing</a> - In that post we used thread synchronization via pipes, which is another alternative for <em>Item 39</em>.</li>
<li><a href="https://www.kuniga.me/blog/2022/03/01/moving-semantics-cpp.html">Move Semantics in C++</a> - As discussed above, that post overlaps in content with <em>Item 23</em>.</li>
<li><a href="https://www.kuniga.me/blog/2022/06/10/smart-pointers-cpp.html">Smart Pointers in C++</a> - As discussed above, that post overlaps in content with <em>Item 18</em>, <em>Item 19</em> and <em>Item 21</em>.</li>
</ul>
<h2 class="no_toc" id="references">References</h2>
<ul>
<li>[1] Effective Modern C++, Scott Meyers.</li>
</ul>Guilherme KunigamiIn this post I’ll share my notes on the book Effective Modern C++ by Scott Meyers. As with Effective C++, Meyers’ book is organized around items. Each item title describes specific recommendations (e.g. “Prefer nullptr to 0 and NULL”) and then it delves into the rationale, while also explaning details about the C++ language. The post lists each item with a summary and my thoughts when applicable. The goal is that it can be an index to find more details on the book itself. The Modern in the title refers to C++11 and C++14 features. This book is a complement to Effective C++, not an updated edition.Paper Reading - Photon2022-09-27T00:00:00+00:002022-09-27T00:00:00+00:00https://www.kuniga.me/blog/2022/09/27/photon<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>In this post we’ll discuss the paper <em>Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams</em> by Ananthanarayanan et al. [1]. This 2013 paper describes the Photon system developed at Google for associating search queries with ads clicks at low latency.</p>
<p>This system supports millions of events per second, exactly-once semantics and out-of-order events, with P90 latency less than 7s.</p>
<!--more-->
<h2 id="problem-statement">Problem Statement</h2>
<p>When a user performs a search query on Google, it displays some ads alongside the results. When the user clicks on one of these ads, advertisers might want to correlate the click with the original query (e.g. which search terms were used).</p>
<p>Logging the full search query metadata with the ad click is too expensive, so instead the system logs only an ID associated with the query, and when needed it joins with the query logs via this ID.</p>
<p>There are two log streams, one representing the search queries and the other the ads clicks. The query has a corresponding query ID and so does the click. Along with the query ID, the streams also store the hostname, process ID and timestamp of the query event, which are needed for implementing some operations more efficiently.</p>
<h2 id="algorithm">Algorithm</h2>
<p>I find it instructive to try to solve the problem at a small scale first and have this as a high-level picture of what a more complex system is implementing.</p>
<p>We can basically imagine we’re given a list <code class="language-plaintext highlighter-rouge">clicks</code> and a hash table mapping query IDs to queries. Our goal is to find a corresponding <code class="language-plaintext highlighter-rouge">query</code> for each click, join them and write to some output.</p>
<p>Due to the out-of-order nature of events, it is possible that when a <code class="language-plaintext highlighter-rouge">click</code> is processed by the system the corresponding <code class="language-plaintext highlighter-rouge">query</code> hasn’t made it to the logs yet (even though the query event has to happen before the click event). That’s why we have a loop that retries a few times. Implicit in <code class="language-plaintext highlighter-rouge">wait()</code> is an exponential backoff.</p>
<p>Two Photon workers execute the same set of events to increase fault-tolerance, but to avoid joining the same event twice they coordinate via <code class="language-plaintext highlighter-rouge">id_registry</code>, a hashtable-like structure used to determine which <code class="language-plaintext highlighter-rouge">click_id</code>s have already been joined.</p>
<p>It’s possible that they’re both processing the same <code class="language-plaintext highlighter-rouge">click</code> event at the same time while waiting for <code class="language-plaintext highlighter-rouge">query</code> so we need to do the <code class="language-plaintext highlighter-rouge">id_registry</code> look up inside the loop. We’re omitting more granular concurrency primitives in the code below but for simplicity let’s assume it is thread-safe.</p>
<p>If a <code class="language-plaintext highlighter-rouge">query</code> hasn’t been found for a <code class="language-plaintext highlighter-rouge">click</code>, the latter is saved in <code class="language-plaintext highlighter-rouge">unjoined</code> list for later processing.</p>
<figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">for</span> <span class="n">click</span> <span class="ow">in</span> <span class="n">clicks</span><span class="p">:</span>
<span class="n">query_id</span> <span class="o">=</span> <span class="n">click</span><span class="p">.</span><span class="n">get_query_id</span><span class="p">()</span>
<span class="n">click_id</span> <span class="o">=</span> <span class="n">click</span><span class="p">.</span><span class="n">get_id</span><span class="p">()</span>
<span class="k">while</span> <span class="n">max_retries</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="k">if</span> <span class="n">click_id</span> <span class="ow">in</span> <span class="n">id_registry</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">query</span> <span class="o">=</span> <span class="n">queries</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">query_id</span><span class="p">)</span>
<span class="k">if</span> <span class="n">query</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">wait</span><span class="p">()</span>
<span class="n">max_retries</span> <span class="o">-=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">query</span><span class="p">:</span>
<span class="n">event</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">click</span><span class="p">,</span> <span class="n">query</span><span class="p">)</span>
<span class="k">if</span> <span class="n">click_id</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">id_registry</span><span class="p">:</span>
<span class="n">id_registry</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">click_id</span><span class="p">)</span>
<span class="n">write</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">unjoined</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">click</span><span class="p">)</span></code></pre></figure>
<figcaption>Algorithm 1: Pseudo-code for the join</figcaption>
</figure>
<p>The algorithm is relatively straightforward, but to make sure it can be scaled we need a more complex system. Let’s now take a look at the architecture.</p>
<h2 id="architecture">Architecture</h2>
<h3 id="overview">Overview</h3>
<p>The major components of Photon are the <code class="language-plaintext highlighter-rouge">IdRegistry</code>, <code class="language-plaintext highlighter-rouge">Dispatcher</code>, <code class="language-plaintext highlighter-rouge">Joiner</code> and <code class="language-plaintext highlighter-rouge">EventStore</code>, depicted in Figure 1. The dashed components are external to Photon. We’ll go over each of them in detail.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-09-27-photon/architecture.png" alt="." />
<figcaption>Figure 1: Photon System Architecture (source: [1])</figcaption>
</figure>
<h3 id="idregistry">IdRegistry</h3>
<p>The <code class="language-plaintext highlighter-rouge">IdRegistry</code> implements a distributed hash table. It consists of servers replicated in multiple geographic regions, each of which stores a copy of the data as an in-memory key-value store.</p>
<p>One of the replicas is elected master. It’s not explicitly said in the paper but I assume writes can only go to the master and the other replicas are eventually made consistent. Master election and eventual consistency are achieved using the Paxos protocol.</p>
<p><strong>Batching.</strong> To avoid the overhead of running a Paxos protocol on every write, the <code class="language-plaintext highlighter-rouge">IdRegistry</code> batches multiple writes together. One detail to handle is when there are multiple writes to the same <code class="language-plaintext highlighter-rouge">click_id</code> in the batch. All writes but the first must report a failure to the requester while the batch is being constructed.</p>
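As a rough sketch of this batching behavior (illustrative only, not Photon's actual implementation; all names are made up): writes accumulate into a batch, duplicates of an already-seen <code class="language-plaintext highlighter-rouge">click_id</code> fail immediately, and the whole batch is then committed in what in Photon would be a single Paxos round:

```python
class BatchingIdRegistry:
    """Toy model of IdRegistry write batching: one commit per batch,
    with duplicate click_ids rejected while the batch is being built."""

    def __init__(self):
        self.store = {}  # committed: click_id -> value
        self.batch = []  # pending (click_id, value) pairs

    def request_write(self, click_id, value):
        # All writes but the first for a given click_id report failure.
        if click_id in self.store or any(c == click_id for c, _ in self.batch):
            return False
        self.batch.append((click_id, value))
        return True

    def commit_batch(self):
        # In Photon this would be a single Paxos round over all batched writes.
        for click_id, value in self.batch:
            self.store[click_id] = value
        self.batch = []
```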
<p><strong>Sharding.</strong> To increase throughput of the <code class="language-plaintext highlighter-rouge">IdRegistry</code>, Photon uses sharding. Suppose there were $m$ replicas to begin with. Now we scale up each replica to $n$ shards, for a total of $n \cdot m$ servers.</p>
<p>There are $n$ masters to write to, and for each master there are $m-1$ slaves. Each master now handles a subset of the total clicks. More specifically, the $i$-th master handles the clicks such that $\mbox{click_id} \equiv i \pmod n$.</p>
<p>Due to re-scaling, $n$ might change. Suppose a rescaling happens at time $t$ and consider the following sequence of events:</p>
<ul>
<li>Before time $t$: <code class="language-plaintext highlighter-rouge">click_id</code> written</li>
<li>Time $t$: rescaling, $n$ changes</li>
<li>After time $t$: <code class="language-plaintext highlighter-rouge">click_id</code> checked for</li>
</ul>
<p>In this case we could end up joining the event twice, violating the exactly-once guarantee.</p>
<p>To avoid this problem in addition to <code class="language-plaintext highlighter-rouge">click_id</code>, the <code class="language-plaintext highlighter-rouge">IdRegistry</code> also receives the click event time <code class="language-plaintext highlighter-rouge">click_time</code>. If $n$ was changed to $n’$ at time $t$, then in theory the hashing logic could be: if $\mbox{click_time} \ge t$, then the shard is calculated using $n’$, otherwise $n$.</p>
<p>The history of changes of $n$ needs to be stored at each <code class="language-plaintext highlighter-rouge">IdRegistry</code> and it must be consistent. Due to latency in replication and achieving consistency, it could be that by the time an event with $\mbox{click_time} \ge t$ is processed, the <code class="language-plaintext highlighter-rouge">IdRegistry</code> doesn’t have the latest $n$ change, so it would shard the event incorrectly.</p>
<p>To account for that, the rule adds a buffer such that only if $\mbox{click_time} \ge t + \delta_t$ then the shard is calculated using $n’$. The paper [1] gives a different reason for this time buffer - local clock skew - but I failed to see why that matters, since even with clock skew the <code class="language-plaintext highlighter-rouge">click_time</code> is fixed given a fixed <code class="language-plaintext highlighter-rouge">click_id</code>.</p>
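The timestamp-aware sharding rule can be sketched as follows (a toy version; the history values and the buffer constant are illustrative): each server keeps a consistent history of shard-count changes, and an event is sharded with the $n$ that was in effect at its <code class="language-plaintext highlighter-rouge">click_time</code>, where a new $n$ only takes effect after the buffer $\delta_t$:

```python
DELTA_T = 60  # buffer for skew/propagation delay, in seconds (illustrative)

# History of shard-count changes as (effective_time, n), in time order.
# A change made at time t only applies to events with click_time >= t + DELTA_T.
shard_history = [(0, 4), (1000, 8)]

def get_shard(click_id, click_time):
    n = shard_history[0][1]
    for t, new_n in shard_history[1:]:
        if click_time >= t + DELTA_T:
            n = new_n
    # The i-th master handles clicks with click_id mod n == i.
    return click_id % n
```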
<h3 id="dispatcher">Dispatcher</h3>
<p>The dispatcher is responsible for consuming data from the click logs. It consults the <code class="language-plaintext highlighter-rouge">IdRegistry</code> to achieve at-most-once semantics and does retries to achieve at-least-once semantics.</p>
<p>It runs several processes in parallel to read from the logs, each of which keeps track of offset for the reads. The list of offsets is shared among all processes and persisted to disk for recovery. It also persists to disk the click events that must be retried.</p>
<p>This means that upon failures it can resume from the point it was before the crash.</p>
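A minimal sketch of the persisted-offset mechanism (the file format and names are invented for illustration): the dispatcher checkpoints its per-log offsets to disk after processing each event, so a restarted instance resumes where the previous one stopped:

```python
import json
import os

class Dispatcher:
    """Toy dispatcher that persists per-log read offsets for crash recovery."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.offsets = {}
        if os.path.exists(state_path):  # recover state after a crash
            with open(state_path) as f:
                self.offsets = json.load(f)

    def consume(self, log_name, events):
        # Resume from the last checkpointed offset for this log.
        for i in range(self.offsets.get(log_name, 0), len(events)):
            self.process(events[i])
            self.offsets[log_name] = i + 1
            self.checkpoint()

    def process(self, event):
        pass  # placeholder: send to a joiner, track retries, etc.

    def checkpoint(self):
        with open(self.state_path, "w") as f:
            json.dump(self.offsets, f)
```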
<h3 id="joiner">Joiner</h3>
<p>The joiner receives requests from the dispatcher, containing the click event. It extracts the <code class="language-plaintext highlighter-rouge">query_id</code> from it and sends to the <code class="language-plaintext highlighter-rouge">EventStore</code> service to look up the corresponding query.</p>
<p>In case the query is not found, the joiner returns a fail response which will cause the dispatcher to retry. The joiner also does throttling: if there are too many inflight requests to the <code class="language-plaintext highlighter-rouge">EventStore</code> it returns a fail response.</p>
<p>The joiner then calls an application specific function called <code class="language-plaintext highlighter-rouge">adapter</code> which takes the <code class="language-plaintext highlighter-rouge">query</code> and <code class="language-plaintext highlighter-rouge">click</code> events and returns a new event.</p>
<p>Finally the joiner looks up <code class="language-plaintext highlighter-rouge">click_id</code> in the <code class="language-plaintext highlighter-rouge">IdRegistry</code> and if none is found, it sets the <code class="language-plaintext highlighter-rouge">click_id</code> and writes the event to an output log. If the <code class="language-plaintext highlighter-rouge">IdRegistry</code> has the <code class="language-plaintext highlighter-rouge">click_id</code>, it simply ignores the event. In either case it returns a success response.</p>
<p><strong>Failure scenarios.</strong> The operations of sending a write request to <code class="language-plaintext highlighter-rouge">IdRegistry</code>, getting a response and writing the output are not atomic so there are two main failure scenarios.</p>
<p>The <code class="language-plaintext highlighter-rouge">IdRegistry</code> might have successfully written the <code class="language-plaintext highlighter-rouge">click_id</code> but its response was lost and not received by the joiner, so the joiner cannot tell if it succeeded. The joiner can then retry the request, but the <code class="language-plaintext highlighter-rouge">IdRegistry</code> needs to know it’s a retry from the same client whose write has already succeeded.</p>
<p>The joiner can send identifying metadata including hostname, process ID and timestamp, which the <code class="language-plaintext highlighter-rouge">IdRegistry</code>, a hash table, uses as the value for the key <code class="language-plaintext highlighter-rouge">click_id</code>. So if a retry comes where hostname and process ID match an existing entry and the difference between timestamp is small, it will identify it as a retry and accept the write.</p>
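A sketch of this retry detection (the time-window constant is made up): a write is accepted if the key is new, or if the stored metadata matches the same client retrying within a short window:

```python
RETRY_WINDOW_S = 60  # tolerated gap between original write and retry (illustrative)

def try_register(store, click_id, hostname, pid, timestamp):
    """Returns True if the write is accepted (first write, or a retry of it)."""
    if click_id not in store:
        store[click_id] = (hostname, pid, timestamp)
        return True
    h, p, t = store[click_id]
    # Same joiner retrying shortly after: treat as success, not a conflict.
    return h == hostname and p == pid and abs(timestamp - t) <= RETRY_WINDOW_S
```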
<p>Another possible failure is that the joiner crashes after writing to the <code class="language-plaintext highlighter-rouge">IdRegistry</code> but before it writes the output. The paper suggests limiting the number of inflight requests from a given joiner to the <code class="language-plaintext highlighter-rouge">IdRegistry</code> to minimize the blast radius; empirically this happens very infrequently, affecting about 0.0001% of events.</p>
<p><strong>Recovery.</strong> In the second failure case, the system periodically scans the <code class="language-plaintext highlighter-rouge">IdRegistry</code> for entries that don’t have a corresponding write to the output. If it finds one, it simply deletes the entry from the <code class="language-plaintext highlighter-rouge">IdRegistry</code> and re-queues in the dispatcher for re-processing.</p>
<p>The entries that do have a corresponding write in the output are also periodically removed to reduce the size of the <code class="language-plaintext highlighter-rouge">IdRegistry</code>.</p>
<h3 id="eventstore">EventStore</h3>
<p>The <code class="language-plaintext highlighter-rouge">EventStore</code> is responsible for returning the data for a query given a <code class="language-plaintext highlighter-rouge">query_id</code>. It consists of two parts: <code class="language-plaintext highlighter-rouge">CachedEventStore</code> and <code class="language-plaintext highlighter-rouge">LogsEventStore</code>.</p>
<p>The <code class="language-plaintext highlighter-rouge">CachedEventStore</code> is an in-memory distributed key-value store that serves as a cache. It is sharded using consistent hashing, where the hash is based on the <code class="language-plaintext highlighter-rouge">query_id</code>.</p>
<p>It implements an LRU cache and is populated by a process that reads the query logs sequentially. The cache is typically able to hold entries for the past few minutes, which works for most cases since the click and query events are usually within a short time interval of each other.</p>
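For concreteness, the cache behavior can be sketched with a standard LRU (this is generic LRU machinery, not Photon's code; capacity and names are illustrative):

```python
from collections import OrderedDict

class CachedEventStore:
    """Toy LRU cache of recent query events, keyed by query_id."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def put(self, query_id, query):
        # Called by the process reading the query logs sequentially.
        self.cache[query_id] = query
        self.cache.move_to_end(query_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def get(self, query_id):
        if query_id not in self.cache:
            return None  # cache miss: caller falls back to the LogsEventStore
        self.cache.move_to_end(query_id)
        return self.cache[query_id]
```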
<p>One interesting fact is that most of the entries are never read, since most queries don’t have a corresponding ad click and are thus never joined.</p>
<p>The cache hit rate based on measurements is between 75-85% depending on traffic. In case of a cache miss, the search falls back to <code class="language-plaintext highlighter-rouge">LogsEventStore</code>.</p>
<p>The <code class="language-plaintext highlighter-rouge">LogsEventStore</code> is a key-value structure where the key is a composite of process ID, hostname and timestamp. The value is the log filename + offset. A process reads the query logs sequentially and at intervals it writes an entry to the key-value structure. It doesn’t do it for every query to keep the table size contained.</p>
<p>The click event contains the information about the query process ID, hostname and timestamp. It can then send a request and the <code class="language-plaintext highlighter-rouge">LogsEventStore</code> will find the entry matching the process ID and hostname and the closest timestamp.</p>
<p>With the log filename and offset, it can perform a search within the file for the corresponding query ID. Since the entries within a log file are approximately sorted by timestamp, the lookup is efficient.</p>
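The lookup side of this index might look as follows (a sketch with invented names; the actual on-disk format is not described at this level in the paper): find the indexed entry for the same hostname and process ID with the closest timestamp at or before the click's query timestamp, then scan the log file from that offset:

```python
import bisect

class LogsEventStore:
    """Toy sparse index: (hostname, pid) -> sorted [(timestamp, offset)]."""

    def __init__(self):
        self.index = {}

    def add_entry(self, hostname, pid, timestamp, offset):
        # Entries are written by a sequential reader, so they arrive in time order.
        self.index.setdefault((hostname, pid), []).append((timestamp, offset))

    def find_offset(self, hostname, pid, timestamp):
        entries = self.index[(hostname, pid)]
        # Closest indexed entry at or before the requested timestamp
        # (clamped to the first entry); scanning the log starts here.
        i = bisect.bisect_right(entries, (timestamp, float("inf"))) - 1
        return entries[max(i, 0)][1]
```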
<p>It’s possible to trade off the size of the <code class="language-plaintext highlighter-rouge">LogsEventStore</code> lookup table against the amount of scanning needed by controlling the granularity of the timestamps in the keys.</p>
<h2 id="experiments">Experiments</h2>
<p>The paper provides several results to demonstrate the non-functional aspects of the system including:</p>
<ul>
<li>End-to-end latency: P90 is less than 7 seconds.</li>
<li><code class="language-plaintext highlighter-rouge">CachedEventStore</code> hit rate: between 75-85%.</li>
<li>Resilience over one data center failure.</li>
<li>Performance of batching on <code class="language-plaintext highlighter-rouge">IdRegistry</code>: QPS is 6-12 for batched vs 200-350 for unbatched.</li>
<li>Auto-balancing of load upon resharding</li>
<li>Duplication of work: less than 5% of events processed by two processors.</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>Photon is described as a system to solve a very specific problem: joining queries with ad clicks. It can be generalized to other applications as hinted by the <em>adapter</em> function.</p>
<p>It does however implement a very specific type of join which I’ve seen being called a <em>quick join</em>: it assumes an event from the main stream (clicks) is joined exactly once. In some other types of joins an event might be joined zero or multiple times as well.</p>
<p>I was initially thinking that the <code class="language-plaintext highlighter-rouge">IdRegistry</code> could use consistent hashing to avoid issues with re-sharding, but it doesn’t work. The <code class="language-plaintext highlighter-rouge">IdRegistry</code> is not a best-effort system like a cache and thus can’t afford misses, so any re-sharding needs to be “backward compatible”, which makes it a lot more complicated.</p>
<p>I liked the level of technical details described to address uncommon issues like events being dropped due to failures.</p>
<p>I found it interesting that Photon has two workers working on the <em>exact</em> same shards in parallel for fault-tolerance. I don’t recall having seen this design before.</p>
<h2 id="related-posts">Related Posts</h2>
<ul>
<li><a href="https://www.kuniga.me/blog/2022/07/26/review-streaming-systems.html">Review: Streaming Systems</a> - Chapter 9 talks about streaming joins more generally in less detail.</li>
<li><a href="https://www.kuniga.me/blog/2019/04/12/consistent-hashing.html">Consistent Hashing</a> - discusses consistent hashing which is utilized by Photon.</li>
<li><a href="https://www.kuniga.me/blog/2014/04/14/the-paxos-protocol.html">The Paxos Protocol</a> - discusses the Paxos prototol which Photon utilizes indirectly via PaxosDB.</li>
<li><a href="https://www.kuniga.me/blog/2017/04/27/paper-reading-spanner-google's-globally-distributed-database.html">Paper Reading - Spanner</a> - discusses the <code class="language-plaintext highlighter-rouge">TrueTime</code> API which is leveraged by Photon to bound clock drift across the hosts.</li>
</ul>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://research.google/pubs/pub41318/">1</a>] Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams - Ananthanarayanan et al.</li>
</ul>Guilherme KunigamiIn this post we’ll discuss the paper Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams by Ananthanarayanan et al. [1]. This 2013 paper described the Photon system developed at Google used for associating search queries and ads clicks with low-latency. This system supports millions of events per second, exactly-once semantics and out-of-order events, with P90 latency less than 7s.Paper Reading - State Management In Apache Flink2022-08-18T00:00:00+00:002022-08-18T00:00:00+00:00https://www.kuniga.me/blog/2022/08/18/state-management-flink<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>In this post we’ll discuss the paper <em>State Management In Apache Flink: Consistent Stateful Distributed Stream Processing</em> by Carbone et al. [1]. This paper goes over the details of how state is implemented in Flink, an open-source distributed stream processing system. Great emphasis is given on fault-tolerance and reconfiguration mechanisms using snapshots.</p>
<!--more-->
<h2 id="flink-architecture-overview">Flink Architecture Overview</h2>
<p>Flink has a programmatic API for specifying pipelines like FlumeJava [2] (in Java, Python, or Scala). The user writes the business logic using this API, which gets converted into a logical graph.</p>
<p>This logical graph gets optimized on the client (e.g. operator fusion like in FlumeJava [2]) and sent to Flink’s runtime, where it gets converted to a physical graph composed of operators (e.g. source, mapper, reducer, sink). Each operator has one or more tasks that execute the actual work.</p>
<p>The system is composed of a set of machines, each running a <em>task manager</em>, which coordinates the tasks executing work on that host. There’s a central host that orchestrates the entire application, the <em>job manager</em>. Communication between task managers and the job manager happens via RPC.</p>
<p>To achieve high-availability with a centralized job manager, Flink keeps a set of stand-by replicas which can be promoted to leader in case of leader failures. The leader selection is achieved using Zookeeper [3].</p>
<p><em>Figure 1</em> shows a diagram with these components.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-08-18-state-management-flink/architecture.png" alt="Diagram with many boxes. Zookeeper, Job Manager, Client, 2 Task Managers, Physical Tasks, Snapshot Store and Local Snapshots." />
<figcaption>Figure 1: Flink System Architecture (source: [1])</figcaption>
</figure>
<h2 id="state">State</h2>
<h3 id="managed-state">Managed State</h3>
<p>We can divide state into two types based on scope: <strong>key</strong> and <strong>operator</strong>. Key scope state exists per key. For example, if we perform a group by key (say user ID) and then do an aggregation, the state storing the aggregated value is specific to a given key.</p>
<p>Operator scope state is a higher-level scope: it is not specific to any key in particular. One example is the state needed to store the offsets when reading from Kafka. It pertains to the source operator as a whole, and is needed for checkpointing, i.e., upon recovery we want to resume reading from Kafka at a specific point in time.</p>
<h3 id="state-partitioning">State Partitioning</h3>
<p>Keys are grouped into units called <strong>key groups</strong>. The number of key groups is fixed while the number of keys is application dependent.</p>
<p>Each group is atomic with regard to task assignment. This means that not only does a single task handle all keys from the group, but upon reconfiguration (say, adding new machines) the keys are re-assigned to another task as a unit.</p>
<p>This also makes key-state cheaper to access: keys from the same group can be stored sequentially/together so we reduce the lookups needed when the state is stored remotely.</p>
<p>There are more key groups than tasks, so a given task handles multiple key groups. The assignment is done contiguously to optimize read patterns (i.e. reduce seeks). So for example, if key groups are numbered $1$ to $K$, then each task gets the key groups from $\lceil i \frac{K}{T} \rceil$ to $\lfloor (i + 1) \frac{K}{T} \rfloor$ where $T$ is the total number of tasks.</p>
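<p>As a quick illustration, the contiguous assignment can be sketched as follows (a sketch with names of my own; I use a half-open convention with key groups numbered $0$ to $K - 1$, which avoids double-assigning groups at the boundaries):</p>

```python
def key_group_range(i: int, K: int, T: int) -> range:
    """Key groups owned by task i (0-indexed), given K key groups
    and T tasks. Each task gets a contiguous block of about K/T
    groups, so its state can be read mostly sequentially."""
    return range(i * K // T, (i + 1) * K // T)
```

<p>For example, with $K = 128$ and $T = 3$ the tasks own groups $0$–$41$, $42$–$84$ and $85$–$127$: every group belongs to exactly one task.</p>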
<h2 id="snapshots">Snapshots</h2>
<p>A data stream within Flink is conceptually divided into segments based on processing time (see [4] for terminology). More precisely, each element belongs to a set identified by a timestamp, called an <strong>epoch</strong>. This grouping is orthogonal to any application logic, including windowing.</p>
<p>The system interleaves epoch markers with the events so they get propagated through the pipeline. Since the segmentation is based on processing time, streams processed in parallel belong to the same epoch. <em>Figure 2</em> makes this much clearer.</p>
<p>Epochs are useful for the system to determine when to perform snapshots. Snapshots are taken at each task. When a task observes an epoch marker $n$, it persists its current state, adding it to a centralized snapshot storage (a distributed file system). When each task has snapshotted its epoch $n$, the system can complete the centralized snapshot for epoch $n$. If a failure happens, the system can restore the state from the complete snapshot with the highest epoch.</p>
<p>It’s worth noting that a task doesn’t need to wait for another while taking partial snapshots: it’s possible a task has already created a snapshot for, say, epoch $n + 2$, while some tasks are still to snapshot epoch $n$.</p>
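<p>The paper doesn’t prescribe how the job manager tracks completion, but a minimal hypothetical sketch of this bookkeeping could look like the following (class and method names are mine):</p>

```python
class SnapshotTracker:
    """Hypothetical bookkeeping: tracks which tasks have persisted
    their partial snapshot for each epoch and reports the highest
    epoch whose snapshot is complete (usable for recovery)."""

    def __init__(self, num_tasks: int):
        self.num_tasks = num_tasks
        self.acks = {}  # epoch -> count of tasks done with it

    def ack(self, epoch: int) -> None:
        """Called when a task reports its partial snapshot for an epoch."""
        self.acks[epoch] = self.acks.get(epoch, 0) + 1

    def latest_complete(self):
        """Highest epoch for which every task has snapshotted."""
        done = [e for e, n in self.acks.items() if n == self.num_tasks]
        return max(done, default=None)
```

<p>Note that, as discussed above, a task may ack epoch $n + 2$ while others are still working on epoch $n$; recovery uses <code>latest_complete()</code>.</p>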
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-08-18-state-management-flink/snapshot.png" alt="On request" />
<figcaption>Figure 2: Snapshot Example. Different background colors represent different epochs (source: [1])</figcaption>
</figure>
<p>To me, epoch sounds very similar to watermarks [4]. A watermark is a timestamp $w$ that says “I’ve seen all data with timestamps less than $w$”, whereas an epoch is a timestamp $e$ that says “I’ve snapshotted all data with timestamps less than $e$”.</p>
<h3 id="epoch-alignment">Epoch Alignment</h3>
<p>When two or more streams are merged like in $t3$ and $t5$ in <em>Figure 2</em>, tasks need to align the epoch markers of these streams. They can stop processing inputs from the stream that is ahead until the others catch up.</p>
<p>We can assume each stream implements an interface <code class="language-plaintext highlighter-rouge">Stream</code> containing <code class="language-plaintext highlighter-rouge">block()</code> which makes the stream stop reading data from the source and <code class="language-plaintext highlighter-rouge">unblock()</code> which reverts the block. The stream also implements a queue-like interface, with <code class="language-plaintext highlighter-rouge">send(event)</code> which causes some data/marker to be added to the end of the stream and <code class="language-plaintext highlighter-rouge">get()</code> to read an event at the front.</p>
<p><em>Algorithm 1</em> in Python below describes the alignment in each task, assuming <code class="language-plaintext highlighter-rouge">input_streams</code> and <code class="language-plaintext highlighter-rouge">output_streams</code> are of type <code class="language-plaintext highlighter-rouge">List[Stream]</code>:</p>
<figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python">blocked = set()
marker = None
while True:
    for input_stream in input_streams:
        if input_stream in blocked:
            continue  # don't consume from streams already aligned
        event = input_stream.get()
        if isinstance(event, EpochMarker):
            input_stream.block()
            blocked.add(input_stream)
            marker = event
    if blocked == set(input_streams):
        # all inputs aligned: forward the marker and snapshot
        for output_stream in output_streams:
            output_stream.send(marker)
        trigger_snapshot()
        for input_stream in blocked:
            input_stream.unblock()
        blocked = set()</code></pre></figure>
<figcaption>Algorithm 1: Epoch alignment</figcaption>
</figure>
<h3 id="cyclic-graphs">Cyclic Graphs</h3>
<p>Flink supports cyclic graphs for iterative computations, so snapshotting must support this topology.</p>
<p>To implement this, Flink adds two special tasks, <em>IterationHead</em> and <em>IterationTail</em> which are co-located and share memory. For example, given a cycle $A \rightarrow B \rightarrow A$ the special tasks ($t$ for tail and $h$ for head) are inserted somewhere in the cycle like $A \rightarrow B \rightarrow t \rightarrow h \rightarrow A$.</p>
<p>Once the head task $h$ processes the epoch marker $n-1$, it will <em>emit</em> an epoch marker $n$. Once the tail task $t$ receives that epoch marker, it can be sure $h$ has processed all the events prior to the epoch marker $n$.</p>
<p><em>Figure 3</em> illustrates this idea. Notice that upstream tasks will also send their own epoch marker $n$. The bottom-left task will perform alignment for its two input streams. It’s unclear to me why we need two nodes instead of just <em>IterationHead</em>.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-08-18-state-management-flink/cycle.png" alt="On request" />
<figcaption>Figure 3: Snapshot for cycles. a) header task emits marker n. b) after alignment, bottom-left task emits the marker n. c) bottom-right task emits marker n to its two sinks. d) marker n reaches back at the tail, snapshot is taken (source: [1])</figcaption>
</figure>
<h3 id="rollback">Rollback</h3>
<p>The rollback operation consists of resetting the application to a state corresponding to a snapshot. This operation is required in case of failures, topology changes (e.g. re-scaling) or application changes (e.g. the user changed the logic of the program).</p>
<p>It’s worth noting that the snapshot also contains metadata about the tasks’ configuration, including key partitions and offsets for the input sources.</p>
<p>Let’s consider the example of rollback due to topology changes, in particular <em>scaling out</em>, i.e. adding more machines to the job to increase processing capacity, as shown in <em>Figure 4</em>.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-08-18-state-management-flink/rescaling.png" alt="On request" />
<figcaption>Figure 4: Scaling out: increasing the parallelism of the job after checkpointing. Source: [5]</figcaption>
</figure>
<p>The key to support re-assigning state upon restoring from a checkpoint is to write the checkpoint with high granularity. More concretely, when a task saves a snapshot, for example some aggregation by key, it doesn’t do it as a monolith but rather as a collection at the key group level of granularity.</p>
<p>Thus when the system has to re-assign tasks it can look at the unified collection of aggregations across all tasks and redistribute it as it sees fit.</p>
<h2 id="implementation-details">Implementation Details</h2>
<h3 id="state-storage">State Storage</h3>
<p>The snapshots must be persisted to some sort of database, which the paper calls <em>state backend</em>, and divides it into <em>local state backend</em> and <em>external state backend</em>.</p>
<p>In the local state case the data is stored locally, either in memory or out-of-core (can write to disk) using an embedded key-value database such as RocksDB. This enables data locality and doesn’t require coordination across multiple machines.</p>
<p>The external state can be further divided into <em>non-MVCC</em> and <em>MVCC-enabled</em> backends, where MVCC stands for Multi-Version Concurrency Control. For the non-MVCC case Flink uses a Two-Phase commit protocol with job manager as coordinator.</p>
<p>Individually, each task logs events to a local file (a write-ahead log) and when a snapshot is requested, it sends a “yes” vote to the coordinator. Once all tasks have voted, the job manager commits the entire state atomically to the external DB.</p>
<p>MVCC-enabled backends allow committing to different versions, so Flink maps epochs to these versions. This way tasks can commit snapshots without any explicit coordination. When all tasks have committed, the external DB will update its current version.</p>
<p><strong>Local vs external.</strong> While local state backends are simpler to write to, they can be more difficult to read from during rollback, since the system needs to fetch the snapshot distributed across all the machines. The paper suggests external backends are preferred when the state is large.</p>
<h3 id="asynchronous-and-incremental-snapshots">Asynchronous and Incremental Snapshots</h3>
<p>The paper doesn’t provide much detail on this, but [6] does, so we use it to complement the discussion. Let’s focus on the out-of-core, local state backend case, in particular using RocksDB.</p>
<p>RocksDB is a key-value store which keeps a hash table in memory (called a <em>memtable</em>). When the memtable gets too big, it is persisted to disk and no further state updates are made to it; it is then known as an <em>sstable</em>.</p>
<p>Asynchronously, RocksDB merges two or more sstables into one to reduce the number of sstables, in a process known as <em>compaction</em>. If the same key appears in multiple sstables, the one from the most recent sstable takes precedence. RocksDB thus implements a data structure called a <em>Log Structured Merge Tree</em>, or LSM, which we’ve discussed in the past [7].</p>
<p>Once the task running an instance of RocksDB receives a snapshot request, it first tells RocksDB to flush its memtable to disk and this is the only part done synchronously. Then it writes all the sstables since the last snapshot to a distributed file storage. By doing this, Flink is only storing the delta of sstables (i.e. incremental snapshotting). When it comes time to restore the state however, it needs to go over multiple snapshots to combine the data from all sstables.</p>
<p>One risk is that RocksDB will merge an sstable with another that has already been written to the distributed file storage (i.e. snapshotted), which would break the invariants of the incremental snapshots. To prevent this, Flink tells RocksDB not to merge sstables that haven’t been snapshotted yet.</p>
<h3 id="queryable-state">Queryable State</h3>
<p>Flink exposes a read-only view of its internal state to the application. One use case is to reduce the latency in obtaining results, since by querying the local state directly the application can get fresh partial results. Flushing partial results frequently to the sinks would be prohibitive.</p>
<p>Each host running a task has a server that is capable of receiving requests from the application. The request includes the key to be looked up and the server returns the corresponding value. To reach the task containing the right set of keys, the request is first sent to the job manager which has the metadata to do the proper routing.</p>
<h3 id="exactly-once-output">Exactly-Once Output</h3>
<p>Because of recomputation, Flink might end up sending duplicated data to its sinks. To achieve exactly-once semantics it needs to do some extra work depending on the semantics provided by the sinks.</p>
<p><strong>Idempotent sinks.</strong> These can handle duplicated data, so no extra work is needed on Flink’s side. One example of an idempotent sink is a key-value store, in which writing the same key-value pair one or multiple times makes effectively no difference.</p>
<p><strong>Transactional sinks.</strong> In case the underlying sink isn’t idempotent, Flink has to perform some sort of transaction so that it doesn’t output values until a snapshot is taken.</p>
<h2 id="experiments">Experiments</h2>
<p>The authors describe a real-world use case at King.com and experiment with a few parameters to understand how snapshotting affects pipeline performance.</p>
<p>They show that the time to snapshot increases linearly with the state size, but this doesn’t affect pipeline performance since it’s done asynchronously.</p>
<p>The alignment time depends on the parallelism because the more tasks in need of alignment, the higher the expected amount of wait time.</p>
<h2 id="conclusion">Conclusion</h2>
<p>State management is hard. State management in an application that needs to run continuously in a distributed manner is even harder.</p>
<p>Snapshots and rollbacks allow us to abstract some of this complexity by turning continuous processing into a discrete one. The system just needs to function properly until the next snapshot is taken.</p>
<p>The paper is well written but the space limitation makes it infeasible to provide details, which makes some topics harder to understand, such as incremental snapshots and rescaling. Luckily they’ve been discussed in more detail in blog posts [5, 6].</p>
<h2 id="references">References</h2>
<ul>
<li>[<a href="http://www.vldb.org/pvldb/vol10/p1718-carbone.pdf">1</a>] State Management In Apache Flink: Consistent Stateful Distributed Stream Processing - Carbone et al.</li>
<li>[<a href="https://www.kuniga.me/blog/2022/05/18/flumejava.html">2</a>] NP-Incompleteness: Paper Reading - FlumeJava</li>
<li>[<a href="https://www.kuniga.me/blog/2015/08/07/notes-on-zookeeper.html">3</a>] NP-Incompleteness: Notes on Zookeeper</li>
<li>[<a href="https://www.kuniga.me/blog/2022/07/26/review-streaming-systems.html">4</a>] NP-Incompleteness: Review: Streaming Systems</li>
<li>[<a href="https://flink.apache.org/features/2017/07/04/flink-rescalable-state.html">5</a>] A Deep Dive into Rescalable State in Apache Flink</li>
<li>[<a href="https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html">6</a>] Managing Large State in Apache Flink: An Intro to Incremental Checkpointing</li>
<li>[<a href="https://www.kuniga.me/blog/2018/07/20/log-structured-merge-trees.html">7</a>] NP-Incompleteness: Log Structured Merge Trees</li>
</ul>Guilherme KunigamiIn this post we’ll discuss the paper State Management In Apache Flink: Consistent Stateful Distributed Stream Processing by Carbone et al. [1]. This paper goes over the details of how state is implemented in Flink, an open-source distributed stream processing system. Great emphasis is given on fault-tolerance and reconfiguration mechanisms using snapshots.Buffon’s Needle2022-08-16T00:00:00+00:002022-08-16T00:00:00+00:00https://www.kuniga.me/blog/2022/08/16/buffons-needle<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>In his book <em>How Not To Be Wrong</em>, Jordan Ellenberg discusses the Buffon’s Needle as follows. Suppose we’re given a needle of length $\ell$ and we drop it in a hardwood floor, consisting of vertical slats of width $\ell$. What is the chance it will cross slats?</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-08-16-buffons-needle/simulation.png" alt="See caption." />
<figcaption>Figure 1: simulation of 1,000 needles dropped at random on the floor. The ones crossing slats are colored red. (source: <a href="https://observablehq.com/d/be84f22795d14b41">Observable</a>)</figcaption>
</figure>
<p>In this post we’ll explore a solution to this problem and provide an algorithm to simulate it.</p>
<!--more-->
<h2 id="mathematical-formulation">Mathematical Formulation</h2>
<p>To formalize the problem, we’ll assume the needle is a line segment of length $\ell$ and the hardwood floor consists of an infinite plane with parallel lines $\ell$ units of distance apart. We want to compute the probability $p$ of the line segment intersecting any line.</p>
<p>An observation is that the segment can cross at most one line. In theory there’s a possibility the needle falls perfectly horizontal, but the probability of that happening is zero.</p>
<h2 id="simulation">Simulation</h2>
<p>We can simulate a random segment drop as follows: we choose a point in the plane at random and let that be the center $c = (c_x, c_y)$ of a unit circle, where $0 \le c_x \le 1$ (the value of $c_y$ does not matter). We then pick a random point in the circumference $b = (b_x, b_y)$ of such circle.</p>
<p>To determine whether it crosses a line we just need to check whether the endpoints of the segment are on different slats. Note that only the $x$ values matter for this and we just need to verify whether $\lfloor \frac{b_x}{\ell} \rfloor = \lfloor \frac{c_x}{\ell} \rfloor$.</p>
<p>We just saw in the post <a href="https://www.kuniga.me/blog/2022/08/01/random-points-in-circumference.html">Random Points in Circumference</a> how to compute points in the circumference without explicit use of angles. Given random variables $X, Y$ uniformly distributed on $[-1, 1]$ and satisfying $X^2 + Y^2 \le 1$, a point in the circumference is given by:</p>
\[X' = \frac{X^2 - Y^2}{X^2 + Y^2}\\
\\
Y' = \frac{2XY}{X^2 + Y^2}\]
<p>We then get $c_x$ and $c_y$ from uniformly distributed random variables $C_X$ and $C_Y$. $b_x$ can be obtained via:</p>
\[\frac{X^2 - Y^2}{X^2 + Y^2} + C_X\]
<p>And $b_y$ via:</p>
\[\frac{2XY}{X^2 + Y^2} + C_Y\]
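<p>Putting these pieces together, here is a sketch of the full simulation (assuming $\ell = 1$; function names are my own):</p>

```python
import math
import random

def needle_crosses() -> bool:
    """Drop one needle of length 1 on slats of width 1 and report
    whether it crosses a slat boundary."""
    # Rejection-sample (x, y) in the unit disk, avoiding a tiny
    # denominator in the divisions below.
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        d = x * x + y * y
        if 1e-9 < d <= 1:
            break
    cx = random.random()  # needle center; only the x coordinate matters
    # Other endpoint via the doubled-angle trick (no sin/cos).
    bx = (x * x - y * y) / d + cx
    # Crossing happens iff the endpoints land on different slats.
    return math.floor(bx) != math.floor(cx)

def estimate_crossing_probability(n: int = 100_000) -> float:
    return sum(needle_crosses() for _ in range(n)) / n
```

<p>The estimate should approach $2/\pi \approx 0.6366$, the probability derived in the next section.</p>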
<h3 id="computing-the-probability">Computing the Probability</h3>
<p>We can compute the probability of the segment crossing a line geometrically. Recall from <a href="https://www.kuniga.me/blog/2022/08/01/random-points-in-circumference.html">Random Points in Circumference</a> that we compute the probability of random variables $X, Y$ uniformly distributed on $[-1, 1]$ satisfying $X^2 + Y^2 \le 1$ as $\pi/4$, by the ratio of the area of a unit circle over that of a square with side 2.</p>
<p>First we observe that we don’t need to consider $c_y$ in this analysis and we can assume $0 \le c_x \le 1$, by noting it doesn’t matter in which slat the needle lands but only its relative position from the left side of the slat. So we will use random variables $X, Y$ uniformly distributed on $[-1, 1]$ and $C_X$ uniformly distributed on $[0, 1]$.</p>
<p>Since we’re using 3 random variables, we need a ratio of volumes to compute the probability. The denominator of the ratio is the volume of the samples under consideration, more precisely $X^2 + Y^2 \le 1$ and $0 \le C_X \le 1$, which is a cylinder with volume $\pi$.</p>
<p>The numerator of the ratio is more complicated. Let’s look at one slice of the cylinder for $C_X = z$, which is a circle. Consider a point $(x, y)$ in this circle, which can be written in polar coordinates $(r, \theta)$. By definition $b_x = \cos \theta + z$.</p>
<p>Let’s consider two cases: the segment $\overline{cb}$ is leaning forward, in which case $0 \le \theta \le \frac{\pi}{2}$ or $\frac{3\pi}{2} \le \theta \le 2\pi$ or leaning backward, where $\frac{\pi}{2} \le \theta \le \frac{3\pi}{2}$. Now we note there’s symmetry on the $y$-axis so it suffices to look at angles $0 \le \theta \le \pi$.</p>
<p>If it’s leaning forward, let $\theta_f$ be the angle such that $\cos \theta_f + z = 1$. If we lean the segment more forward (i.e. towards a horizontal position), moving the angle $\theta_f$ towards 0, the crossing will still happen.</p>
<p>So the set of samples in which a crossing happens when leaning forward is a sector from angle $0$ to $\theta_f = \cos^{-1} (1 - z)$. The area of a sector of the unit circle with angle $\alpha$ is $\frac{\alpha}{2}$, so the area of this sector is</p>
\[\frac{\cos^{-1} (1 - z)}{2}\]
<p>but accounting for the $y$-axis symmetry, we multiply it by 2:</p>
\[(1) \quad \cos^{-1} (1 - z)\]
<p>A similar argument gives us the area of the sector when leaning backwards:</p>
\[(2) \quad \cos^{-1} (z)\]
<p>The sector corresponding to samples “leaning forward” that crosses the line is shown in green in <em>Figure 2</em>, while the one corresponding to “leaning backward” is shown in red.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-08-16-buffons-needle/circle.png" alt="See caption." />
<figcaption>Figure 2: Cross-section of the cylinder for a fixed z. The colored areas represent samples in which a line crossing happens.</figcaption>
</figure>
<p>If we wish to compute the volume of the points corresponding to a crossing, we thus need to integrate the area of these sectors over $z$. We note that (1) and (2) are the same if we flip the direction of integration from $z = 0 \rightarrow 1$ to $z = 1 \rightarrow 0$, so for the purpose of volume calculation we can do:</p>
\[2 \int_{z=0}^{1} \cos^{-1} (z) dz\]
<p>Which is</p>
\[2 \left( z \cos^{-1}(z) - \sqrt{1 - z^2} \right) \Big\rvert_{z = 0}^{1} = 2\]
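<p>As a sanity check, the antiderivative of $\cos^{-1}(z)$ used above can be verified by differentiation:</p>
\[\frac{d}{dz}\left( z \cos^{-1}(z) - \sqrt{1 - z^2} \right) = \cos^{-1}(z) - \frac{z}{\sqrt{1 - z^2}} + \frac{z}{\sqrt{1 - z^2}} = \cos^{-1}(z)\]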
<p>So the volume of the sectors is 2 and thus the probability of a segment crossing a line is $\frac{2}{\pi}$.</p>
<h2 id="buffons-noodle">Buffon’s Noodle</h2>
<p>There’s a very elegant solution that doesn’t require solving an explicit integral and it’s attributed to Joseph-Émile Barbier. It relies on the linearity of expectation, that is $E[X_1 + X_2] = E[X_1] + E[X_2]$, which holds even if $X_1$ and $X_2$ are not independent.</p>
<p>Let’s generalize a bit and suppose we want to compute the expected number $E_S$ of crossings given a segment of length $\ell$, not necessarily of length 1 as above.</p>
<p>Suppose we split the segment into two equal parts. Since they’re identical, independently they have the same probability of crossing a line and hence the same expected value, say $e’$.</p>
<p>We can thus write the expected value $E_S$ as $E_S = 2 e’$. If we keep dividing into ever smaller parts, say $N$ segments of length $\delta_\ell$ (where $\ell = \delta_\ell N$) and expected value $\epsilon$, then:</p>
\[E_S = \sum_{i = 1}^{N} \epsilon = N \epsilon = \ell \frac{\epsilon}{\delta_\ell}\]
<p>Let $x$ be a point in the segment and $f(x)$ some scalar function of $x$. We can pretend $\epsilon$ is the difference between $f$ evaluated at $x$ and a point in the neighborhood $x + \delta_\ell$, that is: $\epsilon = f(x + \delta_\ell) - f(x)$. If we take the limit:</p>
\[\lim_{\delta_\ell \rightarrow 0} \frac{f(x + \delta_\ell) - f(x)}{\delta_\ell}\]
<p>We end up with the derivative $\frac{df(x)}{d\ell}$ and</p>
\[E_S = \ell \frac{df(x)}{d\ell}\]
<p>Since sub-segments of the same length have the same expected value, the derivative is the same no matter the $x$, which formalizes our intuition that $E_S$ is proportional to $\ell$:</p>
\[E_S = k \ell\]
<p>So we just need to find the constant $k$.</p>
<p>The big leap is that this process works for any differentiable curve, not necessarily a straight line, and that’s why this problem is called Buffon’s <em>noodle</em>. One such curve is the circle with diameter 1. Its circumference is $\pi$, so its expected number of crossings, say $E_C$, is given by $E_C = k \pi$. However, such a circle always crosses lines exactly twice, so $E_C = 2$.</p>
<p>This allows us to compute $k = \frac{2}{\pi}$. Since in our original problem $\ell = 1$, $E_S$ is also $\frac{2}{\pi}$, as we demonstrated previously.</p>
<h2 id="conclusion">Conclusion</h2>
<p>In this post we assumed the needle length $\ell$ and the slat width $w$ are the same. However, much of the same argument applies to the case where $\ell \le w$, but not if $\ell \gt w$. The key difference is that when $\ell \le w$, a segment can cross slats at most once, so the expected value is equal to the probability.</p>
<p>For $\ell \gt w$ we can still determine the <em>expected value</em> which is what the Buffon’s Noodle problem solves.</p>
<p>The Buffon’s needle problem led me to ponder about generating points on the circumference and write a preliminary post [4].</p>
<p>I also found it nice that the recent insights led me to come up with a proof of the probability of the Buffon’s needle via volume ratios, which is also described under <em>Using elementary calculus</em> in Wikipedia [2].</p>
<h2 id="references">References</h2>
<ul>
<li>[1] How Not To Be Wrong, Jordan Ellenberg</li>
<li>[<a href="https://en.wikipedia.org/wiki/Buffon%27s_needle_problem">2</a>] Wikipedia: Buffon’s needle problem</li>
<li>[<a href="https://en.wikipedia.org/wiki/Buffon%27s_noodle">3</a>] Wikipedia: Buffon’s noodle problem</li>
<li>[<a href="https://www.kuniga.me/blog/2022/08/01/random-points-in-circumference.html">4</a>] NP-Incompleteness: Random Points in Circumference</li>
</ul>Guilherme KunigamiIn his book How Not To Be Wrong, Jordan Ellenberg discusses the Buffon’s Needle as follows. Suppose we’re given a needle of length $\ell$ and we drop it in a hardwood floor, consisting of vertical slats of width $\ell$. What is the chance it will cross slats? Figure 1: simulation of 1,000 needles dropped at random on the floor. The ones crossing slats are colored red. (source: Observable) In this post we’ll explore a solution to this problem and provide an algorithm to simulate it.Random Points in Circumference2022-08-01T00:00:00+00:002022-08-01T00:00:00+00:00https://www.kuniga.me/blog/2022/08/01/random-points-in-circumference<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>I recently wanted to generate points in the circumference of a unit circle. After some digging, I found a very elegant algorithm from John von Neumann and then a related method for generating normally distributed samples.</p>
<p>In this post we’ll describe these methods starting with a somewhat related problem which provides some techniques utilized by the other methods.</p>
<!--more-->
<h2 id="pi-from-random-number-generator">$\pi$ From Random Number Generator</h2>
<p>First, let’s explore a different problem: how can we estimate $\pi$ using only a random number generator? The idea is to define a point with independent random variables as coordinates $(X, Y)$, uniformly sampled from $[-1, 1]$.</p>
<p>Suppose we sampled values $x$ and $y$. Then we compute the fraction of samples such that $x^2 + y^2 \le 1$. We claim that this ratio will be approximately $\frac{\pi}{4}$!</p>
<p>The idea is that when $x^2 + y^2 \le 1$ they’re points inside the unit circle (centered in 0). Whereas $(x, y)$ are coordinates of a square of side 2. Since the samples are uniformly distributed, we’d expect that the proportion of points inside the circle over the total will be the ratio of the area of the circle, $\pi r^2$ over that of the square, $\ell^2$. Since $r = 1$ and $\ell = 2$, we get $\frac{\pi}{4}$, which is approximately <code class="language-plaintext highlighter-rouge">0.78</code>.</p>
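<p>A minimal sketch of this estimator (the function name is mine):</p>

```python
import random

def estimate_pi(n: int = 1_000_000) -> float:
    """Monte Carlo estimate of pi: the fraction of points of the
    square [-1, 1] x [-1, 1] landing inside the unit circle tends
    to pi/4."""
    inside = sum(
        1
        for _ in range(n)
        if random.uniform(-1, 1) ** 2 + random.uniform(-1, 1) ** 2 <= 1
    )
    return 4 * inside / n
```

<p>With a large enough $n$, <code>estimate_pi()</code> returns a value close to $3.14$.</p>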
<p>To gain better intuition, we can plot the points. We color the ones satisfying $x^2 + y^2 \le 1$ red, otherwise blue. <em>Figure 1</em> shows an example with 10,000 samples.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-08-01-random-points-in-circumference/n_10k.png" alt="See caption." />
<figcaption>Figure 1: plot of 10,000 points with random coordinates each sampled from [-1, 1]. Points inside the unit circle are colored red (source: <a href="https://observablehq.com/d/56c5ead2ae49c97b">Observable</a>)</figcaption>
</figure>
<p>I learned about this trick many years ago from <a href="https://www.ricbit.com/2008/06/um-cientista-em-minha-vida.html">Brain Dump</a> (in Portuguese).</p>
<h2 id="random-points-in-circumference">Random Points in Circumference</h2>
<p>Now let’s focus on our main problem. One naive idea, using the method we just saw above, is to generate sample points and discard all those <em>not</em> satisfying $x^2 + y^2 = 1$. This would be incredibly inefficient; in fact, the probability of a continuous sample satisfying the equality exactly is 0, so in theory we’d discard every point.</p>
<p>Another approach is to generate a random number $\Theta$ in the interval $[0, 2 \pi[$. We can then compute $x = \cos \Theta$ and $y = \sin \Theta$, which will be on the circumference. It’s possible, however, to avoid computing $\sin$ and $\cos$ altogether, with a method proposed by von Neumann [1].</p>
<p>First we generate a random point inside the unit circle. We just saw how to do this above: define random variables $X$ and $Y$, uniformly sampled from $[-1, 1]$, and discard samples where $X^2 + Y^2 > 1$.</p>
<p>Recall that we can represent a point in cartesian coordinates $(X, Y)$ with polar coordinates $(R, \Theta)$, where $X = R \cos \Theta$ and $Y = R \sin \Theta$ and $R = \sqrt{X^2 + Y^2}$.</p>
<p>If $X$ and $Y$ are random points in the unit circle, $\Theta$ is uniformly distributed over the interval $[0, 2 \pi[$. Intuitively, there’s no reason to assume a given angle is more likely if we pick a point at random inside a circle. Notice this is not true if we sample inside a square!</p>
<p>If we wish to generate coordinates $(X’, Y’)$ on the circumference of the unit circle, we can compute $X’ = \cos \Theta$, which can be obtained as</p>
\[X' = \cos \Theta = \frac{X}{R} = \frac{X}{\sqrt{X^2 + Y^2}}\]
<p>Similarly for $Y’ = \sin \Theta$:</p>
\[Y' = \sin \Theta = \frac{Y}{R} = \frac{Y}{\sqrt{X^2 + Y^2}}\]
<p>Computing the square root is not ideal, so the trick is to use $2\Theta$, which is uniformly distributed over $[0, 4 \pi[$. Since trigonometric functions have period $2\pi$ (e.g. $\sin(x) = \sin(x + 2 \pi)$ and $\cos(x) = \cos(x + 2 \pi)$), the angle $2\Theta$ effectively has the same distribution as $\Theta$!</p>
<p>Setting $X’ = \cos(2\Theta)$ and using the identity $\cos(2\Theta) = \cos^2 \Theta - \sin^2 \Theta$:</p>
\[X' = \frac{X^2 - Y^2}{R^2} = \frac{X^2 - Y^2}{X^2 + Y^2}\]
<p>Similarly, setting $Y’ = \sin(2\Theta)$ and using $\sin(2\Theta) = 2 \sin \Theta \cos \Theta$:</p>
\[Y' = \frac{2XY}{R^2} = \frac{2XY}{X^2 + Y^2}\]
<p>Which gets rid of the square root! We have to be careful with small values of $X^2 + Y^2$ in the denominator, but we can discard points when $X^2 + Y^2 \le \epsilon$ without affecting the distribution of $\Theta$.</p>
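<p>Putting the pieces together, here is a minimal sketch of von Neumann’s method (the names and the $\epsilon$ value are my choices, not from the paper):</p>

```python
import random

def random_point_on_circle(rng: random.Random, eps: float = 1e-9):
    """von Neumann's trick: a uniform point on the unit circumference,
    computed without sin/cos or square roots."""
    while True:
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        s = x * x + y * y
        # reject points outside the circle or too close to the origin
        if eps < s <= 1:
            # the angle of (x', y') is 2*Theta, also uniform on [0, 2*pi[
            return (x * x - y * y) / s, (2 * x * y) / s

rng = random.Random(0)
x, y = random_point_on_circle(rng)
print(x * x + y * y)  # 1, up to floating point error
```

<p>The identity $(x^2 - y^2)^2 + (2xy)^2 = (x^2 + y^2)^2$ guarantees the returned point lies exactly on the circumference.</p>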
<h3 id="experimentation">Experimentation</h3>
<p>We can generate multiple points using this method and plot them in the plane to obtain <em>Figure 2</em>.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-08-01-random-points-in-circumference/circ.png" alt="See caption." />
<figcaption>Figure 2: Plotting 400 random points in the circumference of a unit circle, point at the origin added for reference (source: <a href="https://observablehq.com/d/56c5ead2ae49c97b">Observable</a>)</figcaption>
</figure>
<h3 id="complexity">Complexity</h3>
<p>The probability of a point falling inside the circle is $\frac{\pi}{4}$, so the number of tries until we find a valid point can be modeled as a geometric distribution with success rate $p = \frac{\pi}{4}$, with expected value $\frac{1}{p} = \frac{4}{\pi} \approx 1.27$, which is pretty efficient.</p>
<p>Amazingly, it’s possible to use this technique to generate points with a normal distribution, as we’ll see next!</p>
<h2 id="generating-normal-distribution">Generating Normal Distribution</h2>
<p>If a random variable $X$ has a normal distribution ($\sigma = 1$, $\mu = 0$), its cumulative distribution function (CDF) is given by:</p>
\[(1) \quad P(X \le x) = F_X(x) = \sqrt{\frac{1}{2\pi}} \int_{-\infty}^{x} e^{-u^2/2} du\]
<p>Now, consider a random point $(X, Y)$ inside the unit circle. We define</p>
\[(2) \quad N_1 = X \sqrt{\frac{-2 \ln S}{S}}\]
\[(3) \quad N_2 = Y \sqrt{\frac{-2 \ln S}{S}}\]
<p>Where $S = X^2 + Y^2$. We’ll show that $N_1$ and $N_2$ are normally distributed independent variables.</p>
<p>As before, we represent $(X, Y)$ in polar coordinates $(R, \Theta)$, where $X = R \cos \Theta$ and $Y = R \sin \Theta$, $R = \sqrt{X^2 + Y^2}$ and thus $S = R^2$.</p>
<p>We can write (2) as</p>
\[N_1 = \frac{X}{R} \sqrt{-2 \ln S} = \cos \Theta \sqrt{-2 \ln S}\]
<p>and (3) as:</p>
\[N_2 = \frac{Y}{R} \sqrt{-2 \ln S} = \sin \Theta \sqrt{-2 \ln S}\]
<p>The pair $(N_1, N_2)$ forms a cartesian coordinate so they too can be written in polar form: $N_1 = R’ \cos\Theta’$ and $N_2 = R’ \sin\Theta’$, for $R’ > 0$ and $\Theta’ \in [0, 2 \pi[$.</p>
<p>Every cartesian coordinate can be uniquely represented in polar form as $\alpha \cos(\phi)$ and $\alpha \sin(\phi)$, if $\alpha > 0$ and $\phi \in [0, 2 \pi[$. So if we ignore samples with $S = 0$ (so that $\sqrt{-2 \ln S}$ is well defined and positive) and normalize $\Theta$ to be in $[0, 2 \pi[$, we can conclude that $R’ = \sqrt{-2 \ln S}$ and $\Theta’ = \Theta$.</p>
<p>Let’s compute the CDF for $R’$, i.e. the probability that $R’ \le r$. Since $R’ = \sqrt{-2 \ln S}$, the condition $R’ \le r$ is equivalent to $-2 \ln S \le r^2$, or $\ln S \ge -r^2/2$. Applying $\exp()$ to both sides yields $S \ge e^{-r^2/2}$.</p>
<p>We claim that $S$ is uniformly distributed in $[0, 1]$ (see <em>Appendix</em> for a proof). Thus $P(S \le e^{-r^2/2}) = e^{-r^2/2}$ and since $P(S \ge e^{-r^2/2}) = 1 - P(S \le e^{-r^2/2})$ we have $P(R’ \le r) = 1 - e^{-r^2/2}$.</p>
<p>The probability that $R’$ is in the interval $[r, r + dr]$ can be obtained from the derivative $\frac{dP(R’ \le r)}{dr}$, giving $re^{-r^2/2} dr$.</p>
<p>$\Theta’$ is uniformly distributed in the range $[0, 2 \pi[$, so its CDF is $F_{\Theta’}(\theta) = \frac{\theta}{2 \pi}$ and the probability that $\Theta’$ is in the interval $[\theta, \theta + d\theta]$ is $\frac{dF_{\Theta’}(\theta)}{d\theta} d\theta = (1/2\pi)d\theta$.</p>
<p>Because $R’$ and $\Theta’$ are independent variables, we can compute their joint probability more easily, since:</p>
\[P(R' = r, \Theta' = \theta) = P(R' = r) P(\Theta' = \theta) = re^{-r^2/2} dr (1/2\pi)d\theta\]
<p>Thus, to compute the CDF over an interval:</p>
\[P(R' \le r', \Theta' \le \theta') = \int_{0}^{r'} re^{-r^2/2} dr \int_{0}^{\theta'} (1/2\pi) d\theta = \frac{1}{2\pi} \int_{0}^{r'} \int_{0}^{\theta'} re^{-r^2/2} dr d\theta\]
<p>A further generalization is to combine both integrals into a single one that integrates over pairs $(r, \theta)$ satisfying a predicate:</p>
\[\frac{1}{2\pi} \int_{0}^{r'} \int_{0}^{\theta'} re^{-r^2/2} dr d\theta = \frac{1}{2\pi} \int_{(r, \theta): r \le r', \theta \le \theta'} re^{-r^2/2} dr d\theta\]
<p>This predicate-based form is convenient for what we’ll do next. We now want to compute the joint probability $P(N_1 \le x_1, N_2 \le x_2)$, which can be expressed in polar form:</p>
\[P(N_1 \le x_1, N_2 \le x_2) = P(R' \cos\Theta' \le x_1, R' \sin\Theta' \le x_2)\]
<p>The predicate-based integral over $(r, \theta)$ is useful here because, while $R’$ and $\Theta’$ are independent, the conditions $R’ \cos\Theta’ \le x_1$ and $R’ \sin\Theta’ \le x_2$ bind them together, so we can do:</p>
\[P(R' \cos\Theta' \le x_1, R' \sin\Theta' \le x_2) = \frac{1}{2\pi} \int_{(r, \theta): r \cos \theta \le x_1, r \sin \theta \le x_2} re^{-r^2/2} dr d\theta\]
<p>We can switch from polar coordinates to cartesian. The change of variables satisfies $r \, dr \, d\theta = dx \, dy$, which absorbs the factor $r$:</p>
\[= \frac{1}{2\pi} \int_{(x, y): x \le x_1, y \le x_2} e^{-(x^2 + y^2)/2} dx dy\]
<p>In this form we can switch back to the interval-based integrals:</p>
\[= \frac{1}{2\pi} \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} e^{-(x^2 + y^2)/2} dx dy\]
<p>Which can be split into a product of integrals:</p>
\[= \frac{1}{2\pi} \left(\int_{-\infty}^{x_1} e^{-x^2/2}dx\right) \left(\int_{-\infty}^{x_2} e^{-y^2/2}dy\right)\]
<p>Moving the $1 / 2\pi$ in equal parts to each factor yields:</p>
\[P(N_1 \le x_1, N_2 \le x_2) = \left(\sqrt{\frac{1}{2\pi}} \int_{-\infty}^{x_1} e^{-x^2/2}dx\right) \left(\sqrt{\frac{1}{2\pi}} \int_{-\infty}^{x_2} e^{-y^2/2}dy\right)\]
<p>Each factor now is the CDF of a normal distribution as in (1)! And since the joint CDF factors into a product, the distributions are independent:</p>
\[P(N_1 \le x_1, N_2 \le x_2) = P(N_1 \le x_1) P(N_2 \le x_2)\]
<p>So if we want to generate a normally distributed random value we can generate a sample $(x, y)$ in the unit circle and compute a number with either (2) or (3).</p>
<p>This method was devised by Box, Marsaglia and Muller [2] and is known as the <strong>Polar Method</strong>.</p>
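<p>A minimal sketch of the polar method, applying equations (2) and (3) to an accepted sample (the function name is mine):</p>

```python
import math
import random

def polar_method(rng: random.Random):
    """Two independent standard normal samples from one accepted point
    inside the unit circle, per equations (2) and (3)."""
    while True:
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        s = x * x + y * y
        # reject the origin and the boundary so ln(s) is finite and negative
        if 0 < s < 1:
            factor = math.sqrt(-2 * math.log(s) / s)
            return x * factor, y * factor

rng = random.Random(42)
samples = [polar_method(rng)[0] for _ in range(50_000)]
mean = sum(samples) / len(samples)
var = sum((v - mean) ** 2 for v in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # mean near 0, variance near 1
```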
<h3 id="experimentation-1">Experimentation</h3>
<p>We can generate multiple points using (2) or (3) and build a histogram as in <em>Figure 3</em>. Note the bell-shaped curve which is what we’d expect from sampling from a normal distribution.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-08-01-random-points-in-circumference/histogram.png" alt="See caption." />
<figcaption>Figure 3: Points sampled using (2) and plotted in a histogram. The bell shape is indicative of a normal distribution (source: <a href="https://observablehq.com/d/56c5ead2ae49c97b">Observable</a>)</figcaption>
</figure>
<p>We can also generate points $(x, y)$ with $x$ sampled from (2) and $y$ from (3) and plot them in a scatterplot to visualize their correlation as in <em>Figure 4</em>. Notice that the circle shape suggests the distributions used to generate the samples are not correlated.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-08-01-random-points-in-circumference/scatter.png" alt="See caption." />
<figcaption>Figure 4: Samples generated with (2) plotted against samples generated with (3). The circular shape is indicative of lack of correlation between the distributions (2) and (3). (source: <a href="https://observablehq.com/d/56c5ead2ae49c97b">Observable</a>)</figcaption>
</figure>
<h2 id="conclusion">Conclusion</h2>
<p>The process of investigating the question of generating points on the circumference led me to very interesting findings. I first found an article at <a href="https://mathworld.wolfram.com/CirclePointPicking.html">MathWorld</a> which provides the formula for the generator but not the theory behind it. It refers to a paper from von Neumann which luckily was <a href="https://mcnp.lanl.gov/pdf_files/nbs_vonneumann.pdf">available online</a> but somewhat hard to understand.</p>
<p>In another search attempt I ran into a Q&A (I forgot the link) where the OP mentioned Knuth’s <a href="https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming">TAOCP</a>, and looking it up led me to the polar method for normally distributed samples, which in turn made it easier to understand von Neumann’s method.</p>
<h2 id="appendix">Appendix</h2>
<p><strong>Lemma 1.</strong> $S$ is uniformly distributed in the range $0 \le r \le 1$.</p>
<p>First we compute the CDF $P(R \le r)$ for $r \in [0, 1]$. The probability that we pick a point falling within a circle of radius $r$ is the ratio of the area of that circle to the area of the unit circle, that is,</p>
\[(4) \quad P(R \le r) = \frac{\pi r^2}{\pi} = r^2\]
<p>We now compute $P(S \le r)$ for $r \in [0, 1]$. Since $S = R^2$ we have $P(S \le r) = P(R^2 \le r) = P(-\sqrt{r} \le R \le \sqrt{r})$.</p>
<p>Since $R$ represents the radius, $R \ge 0$ and thus $P(R \le 0) = 0$, so $P(R \le -\sqrt{r}) = 0$ and $P(-\sqrt{r} \le R \le \sqrt{r}) = P(R \le \sqrt{r})$. From (4) we get</p>
\[P(S \le r) = P(R \le \sqrt{r}) = r\]
<p>Which is the CDF of the uniform distribution. <em>QED</em>.</p>
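<p>We can also check the lemma empirically with a quick simulation (a rough sketch; the sample size and names are arbitrary):</p>

```python
import random

def sample_s(rng: random.Random) -> float:
    """S = X^2 + Y^2 for a point accepted inside the unit circle."""
    while True:
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        s = x * x + y * y
        if s <= 1:
            return s

rng = random.Random(7)
samples = [sample_s(rng) for _ in range(50_000)]
# if S is uniform on [0, 1], the fraction with S <= r should be close to r
for r in (0.25, 0.5, 0.75):
    frac = sum(s <= r for s in samples) / len(samples)
    print(r, round(frac, 2))
```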
<h2 id="references">References</h2>
<ul>
<li>[1] Various Techniques Used in Connection With Random Digits, J. von Neumann.</li>
<li>[2] The Art of Computer Programming: Volume 2, 3.4.1, D. Knuth. (p. 122)</li>
</ul>Guilherme KunigamiI recently wanted to generate points in the circumference of a unit circle. After some digging, I found a very elegant algorithm from John von Neumann and then a related method for generating normally distributed samples. In this post we’ll describe these methods starting with a somewhat related problem which provides some techniques utilized by the other methods.Review: Streaming Systems2022-07-26T00:00:00+00:002022-07-26T00:00:00+00:00https://www.kuniga.me/blog/2022/07/26/review-streaming-systems<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<figure class="image_float_left">
<img src="https://www.kuniga.me/resources/blog/2022-07-26-review-streaming-systems/strsys.jpeg" alt="Streaming Systems book cover" />
</figure>
<p>In this post we will review the book <em>Streaming Systems</em> by Tyler Akidau, Slava Chernyak and Reuven Lax [1]. The book focuses on distributed streaming processing systems, reflecting the authors’ experience of building DataFlow at Google. Akidau is also one of the founders of <a href="https://beam.apache.org/">Apache Beam</a>.</p>
<p>We’ll go over some detail on each chapter. The notes might be missing context because I took them while I had the book in front of me, but I tried to fill some of it in as I noticed them. If you’re just interested in a summary, jump to the <em>Conclusion</em>.</p>
<!--more-->
<h2 id="organization-of-the-book">Organization of the book</h2>
<p>The book is divided into two parts, for a total of 10 chapters. In Part I, the book covers the Beam model, which can be thought of as a spec for how to implement stream processing systems; it specifies high-level concepts such as watermarks, windowing and exactly-once semantics. In Part II the authors provide the streams and tables view, which brings the concepts of batch and streaming closer together.</p>
<p>Most of the book is written by Akidau, with specific chapters written by Chernyak (Chapter 3) and Lax (Chapter 5).</p>
<h2 id="selected-notes">Selected Notes</h2>
<p>I’m going to go over each chapter, providing a brief summary interspersed with my random notes.</p>
<h3 id="chapter-1---streaming-101">Chapter 1 - Streaming 101</h3>
<p>In this chapter the author provides motivations for stream processing, introduces terminology and does some initial comparison with batch systems. This chapter is <a href="https://www.oreilly.com/radar/the-world-beyond-batch-streaming-101/">freely available online</a>.</p>
<p><strong>Cardinality</strong>: <em>Bounded</em> or <em>Unbounded data</em>. Bounded data have a determined beginning and end (i.e. boundary). <strong>Constitution</strong>: <em>Table</em> or <em>Stream</em>. Tables are a snapshot of the data at a specific point in time. Streams are an element-by-element view of the data over time. Chapter 6 dives deeper into these. Constitution and cardinality are independent concepts, so we can have bounded streams and unbounded tables, but stream processing is mostly associated with unbounded streams. An <strong>event</strong> in a stream is analogous to a row in a table.</p>
<p><strong>Event vs. Processing Time.</strong> Event time is the timestamp corresponding to when the event was created (e.g. logged), while processing time is the timestamp when the event was processed by the system. <em>Out-of-order data</em>: one challenge with streaming is that events can come out-of-order (with respect to event time).</p>
<p><strong>Completeness.</strong> With unbounded data, events can be arbitrarily late (e.g. an event logged locally, offline, will only be processed when the user goes back online). This makes it theoretically impossible to know when we are done processing data.</p>
<h3 id="chapter-2---the-what-where-when-and-how-of-data-processing">Chapter 2 - The What, Where, When and How of Data Processing</h3>
<p>This chapter introduces specific terminology and features from the Beam model, including transformations, windows, triggers and accumulation. The Beam model concept is discussed in Chapter 10. This chapter is <a href="https://www.oreilly.com/radar/the-world-beyond-batch-streaming-102">freely available online</a>.</p>
<p>Transformations can be element-wise, like a mapper, or change the data cardinality via aggregations such as group-by or windowing.</p>
<p>A <strong>watermark</strong> is a timestamp that provides a boundary for unbounded data. It can be read as “I’ve already processed all events with timestamp less than the watermark”. This is important when events can be arbitrarily late because it provides a different criterion for moving on instead of waiting for all events to arrive. More details are covered in Chapter 3. <em>Figure 1</em> shows the watermark as a dashed/dotted line.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-07-26-review-streaming-systems/time_graph.png" alt="See caption" />
<figcaption>Figure 1: Events plotted in a graph of event time vs processing time. It also includes the line depicting the value of the heuristic watermark. Notice how the watermark moved on without processing the event with value 9. (source: [2])</figcaption>
</figure>
<p><strong>Triggers</strong> determine when to provide partial results when doing aggregations. For example, if we’re counting how many events match a given condition, we can set up a trigger that fires every 5 minutes, sending the current count downstream. Triggers can also be configured to fire when watermarks are updated.</p>
<p><strong>Late data.</strong> Because we can move on without waiting for all data to arrive, it’s possible that some data arrives late. Late data is any event whose timestamp is less than the watermark.</p>
<p><strong>Accumulation</strong> determines how to proceed with partial updates. Suppose we provide partial sums every 5 minutes: what should we do when the trigger fires a second time? There are a few options:</p>
<ul>
<li>Send the current partial sum</li>
<li>Send a retraction of the previous partial sum and then a new one</li>
<li>Send the delta</li>
</ul>
<p>The choice of accumulation depends on the downstream consumer. If it’s writing to a key-value store, sending the current partial sum works well, but if it’s doing further aggregation it needs the retraction or the delta.</p>
<p>I found the “What, Where, When and How” view/analogy very confusing. Perhaps I didn’t internalize the right mental model, but I found it unnecessary.</p>
<h3 id="chapter-3---watermarks">Chapter 3 - Watermarks</h3>
<p>This chapter dives deeper into watermarks.</p>
<p>A perfect watermark is one that guarantees no event will be processed late. More precisely, let $t_{wm}$ be the current watermark. A perfect watermark says that for every event $e$ processed in the future, its timestamp $t_{e}$ will be greater than $t_{wm}$.</p>
<p>Perfect watermarks are impossible to implement in the general case because we cannot tell whether some event is stuck on an offline phone, for example. Even if we could, waiting for an extremely late event could hold up the pipeline in case the trigger is configured to only fire when a watermark updates.</p>
<p>Heuristic watermarks allow trading off completeness (allowing some events to arrive late) for latency (reducing time waiting for laggards).</p>
<p>In multi-stage pipelines we need to keep track of watermarks at each stage because events can be processed in different orders (e.g. parallel mapper) or can be delayed due to system issues (crash, overload). In addition, some operations such as windowing naturally add a watermark delay. So for example if stage 1 performs a tumbling window of size 5 minutes, the next stage’s watermark is expected to be 5 minutes behind.</p>
<p>It’s possible to keep watermarks for processing time in addition to event time ones. This can be useful to detect if any stage is holding up the pipeline.</p>
<h3 id="chapter-4---advanced-time-windowing">Chapter 4 - Advanced Time Windowing</h3>
<p>This chapter dives deeper into windows.</p>
<p>Every aggregation in stream processing is associated with a time window. For example, it doesn’t make sense to count the number of events in an unbounded stream without specifying a time range (i.e. window).</p>
<p>Windows can be based on processing time, such as those associated with time-based triggers (e.g. fire every 5 minutes). They can also be based on event time, by leveraging watermarks. One example of event-time windowing is the <strong>tumbling window</strong>. <em>Figure 2</em> shows an example.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2022-07-26-review-streaming-systems/windows.png" alt="See caption" />
<figcaption>Figure 2: Window aggregations. Each column represents a window of 2 minutes length. The number on the column is the aggregated value output on trigger. Notice the first window triggers twice, the second time for the late data 9 (source: [2])</figcaption>
</figure>
<p>The tumbling window partitions the event time axis into fixed intervals. Events with timestamps inside the interval are considered part of the window. It’s possible to have multiple parallel windows, for example when we perform an aggregation following a group by key. In this case the tumbling windows can be aligned across all keys or they can be shifted to avoid burst updates when triggers fire.</p>
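<p>As a toy illustration of event-time tumbling windows (this is my own sketch of window assignment, not how Beam implements it):</p>

```python
from collections import defaultdict

WINDOW = 2  # window length in minutes

def tumbling_sums(events):
    """Assign (event_time, value) pairs to fixed 2-minute windows and
    sum the values per window, keyed by each window's start time."""
    sums = defaultdict(int)
    for t, v in events:
        window_start = (t // WINDOW) * WINDOW
        sums[window_start] += v
    return dict(sums)

events = [(0, 5), (1, 9), (2, 8), (3, 3), (4, 1)]
print(tumbling_sums(events))  # {0: 14, 2: 11, 4: 1}
```

<p>In a real streaming system the windows would also carry triggers and watermarks; here late events simply land in the window their event time belongs to.</p>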
<p>Another type of window is data-dependent ones, for example <strong>session windows</strong>. Session windows are used to model user sessions. Events with the same key (e.g. user id) that happen within a duration from each other (say 5 minutes) are to be grouped in the same window.</p>
<p>Since the window boundaries are not known in advance, they must be merged on the fly. When windows are merged we likely need to perform retractions. Suppose we have a trigger that fires on every event processed. Suppose we currently have 2 session windows with range [10:00 - 10:05] and [10:10 - 10:30] and that the threshold duration for merging is 5 minutes.</p>
<p>Now a new event comes with event time 10:06. It will cause the windows to be merged into one with range [10:00 - 10:30]. However, the previous two windows had triggered outputs before so they need to retract those.</p>
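<p>A toy sketch of the merge logic (the data structures and names are mine, not Beam’s API):</p>

```python
GAP = 5  # session gap in minutes

def add_event(windows, t):
    """Insert an event at time t (in minutes) into a list of (start, end)
    session windows, merging any windows that come within GAP of it."""
    start, end = t, t
    kept = []
    for s, e in windows:
        # the window overlaps the event's gap-extended range: merge it
        if s - GAP <= end and start <= e + GAP:
            start, end = min(start, s), max(end, e)
        else:
            kept.append((s, e))
    kept.append((start, end))
    return sorted(kept)

# windows [10:00-10:05] and [10:10-10:30], in minutes past 10:00
windows = [(0, 5), (10, 30)]
print(add_event(windows, 6))  # the event at 10:06 merges them: [(0, 30)]
```

<p>A real implementation would also have to retract the outputs previously triggered by the two merged windows, as described above.</p>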
<h3 id="chapter-5---exactly-once-and-side-effects">Chapter 5 - Exactly-Once And Side Effects</h3>
<p><strong>Exactly-once</strong> semantics is a guarantee that the consumer of the stream will see each event exactly once. This is in contrast with <strong>at-least-once</strong> semantics where each event is guaranteed to be included in the output but they could be included twice. And <strong>at-most-once</strong> where each event is guaranteed to not be duplicated but they can be dropped.</p>
<p>Exactly-once semantics is non-trivial to achieve in fault-tolerant systems because it has to retry computation while guaranteeing events don’t get processed twice.</p>
<p>Even if the system provides an exactly-once guarantee, it doesn’t guarantee that the user function will be called exactly once (the guarantee is over the output), so if it’s non-deterministic or has side effects, that can cause problems.</p>
<p>To achieve an exactly-once guarantee, the input of the system also needs to provide some guarantees. Either it has to have exactly-once guarantees itself or it must be able to identify events with a unique ID, so that the system can use it to detect duplicates.</p>
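<p>The duplicate-detection idea can be sketched as follows (a toy in-memory version; real systems persist the seen-ID set, often with an expiry window):</p>

```python
class DedupingConsumer:
    """Toy consumer that upgrades at-least-once delivery to exactly-once
    processing by remembering event IDs it has already seen."""

    def __init__(self):
        self.seen = set()
        self.total = 0

    def process(self, event_id, value):
        if event_id in self.seen:
            return  # duplicate delivery (e.g. an upstream retry): ignore
        self.seen.add(event_id)
        self.total += value

c = DedupingConsumer()
for event_id, value in [(1, 10), (2, 5), (1, 10)]:  # event 1 delivered twice
    c.process(event_id, value)
print(c.total)  # 15, not 25
```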
<h3 id="chapter-6---streams-and-tables">Chapter 6 - Streams and Tables</h3>
<p>In this chapter Akidau provides the theory of streams and tables. The idea is that both batch and stream processing use both streams and tables internally but in different ways.</p>
<p>I really like the physics analogy of a table being data at rest, while a stream is data in motion. Some transforms cause data in motion to come to a rest, for example aggregations, because they accumulate before proceeding to the next stage. Conversely, a trigger puts data at rest into motion, because it sends events downstream when its condition is met.</p>
<p>I think an interesting analogy for aggregation of streams could be that of a dam. It accumulates water and lets some of it flow downstream. I don’t recall seeing this in any materials I read so far, though there’s the <a href="https://en.wikipedia.org/wiki/Leaky_bucket">leaky bucket</a> algorithm for rate limiting, which has the spirit of a dam, albeit at a smaller scale.</p>
<h3 id="chapter-7---the-practicalities-of-persistent-state">Chapter 7 - The Practicalities of Persistent State</h3>
<p>This chapter discusses checkpoints, which provides a way to achieve at-least-once or exactly-once semantics. The idea is to write computed data to a persistent state periodically so it can be recovered in cases of failures.</p>
<p>Persistence is often applied when aggregations such as group by happens. This is because data needs to leave the machine it’s currently in to be sent to the next stage (i.e. shuffling, like in MapReduce).</p>
<p>Depending on what kind of aggregation we perform, we need to store more or less data. At one end of the spectrum is a simple list concatenation, which essentially involves storing every single event to a persistent state. On the other side is an aggregation into a scalar, much cheaper to store, but it restricts which aggregations can be used (must be associative and commutative).</p>
<p>Beam provides a custom API that allows reading from and writing to shared state inside user functions. Akidau provides a case study on tracking conversion attribution, which involves traversing a tree.</p>
<h3 id="chapter-8---streaming-sql">Chapter 8 - Streaming SQL</h3>
<p>In this chapter the author builds upon the unified view of streams and tables and proposes a unified SQL dialect that supports both batch and stream processing.</p>
<p>It introduces the concept of Time-Varying Relation (TVR) which is basically a series of snapshots of a table over time, each time it is modified. A table represents one of these snapshots at a given time, while a stream is the delta of changes, i.e. different views of the same thing.</p>
<p>With that model in mind, a hypothetical SQL dialect would allow users to specify which view to get via a modifier. For example, for a table view:</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">SELECT</span> <span class="k">TABLE</span>
<span class="n">name</span><span class="p">,</span>
<span class="k">SUM</span><span class="p">(</span><span class="n">score</span><span class="p">)</span> <span class="k">as</span> <span class="n">total</span><span class="p">,</span>
<span class="k">MAX</span><span class="p">(</span><span class="nb">time</span><span class="p">)</span> <span class="k">as</span> <span class="nb">time</span>
<span class="k">FROM</span> <span class="n">user_scores</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">name</span><span class="p">;</span></code></pre></figure>
<p>Possible output:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">| name | total | time |
| ----- | ----- | ----- |
| Julie | 8 | 12:03 |
| Frank | 3 | 12:03 |</code></pre></figure>
<p>For a stream view, we would have almost the same syntax but with different default semantics. One interesting bit is that because stream processing can do retractions (see <em>Chapter 2</em> and <em>Chapter 4</em>), a system-level column exists indicating whether a row is a retraction:</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">SELECT</span> <span class="n">STREAM</span>
<span class="n">name</span><span class="p">,</span>
<span class="k">SUM</span><span class="p">(</span><span class="n">score</span><span class="p">)</span> <span class="k">as</span> <span class="n">total</span><span class="p">,</span>
<span class="k">MAX</span><span class="p">(</span><span class="nb">time</span><span class="p">)</span> <span class="k">as</span> <span class="nb">time</span><span class="p">,</span>
<span class="n">Sys</span><span class="p">.</span><span class="n">Undo</span> <span class="k">as</span> <span class="n">undo</span>
<span class="k">FROM</span> <span class="n">user_scores</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">name</span><span class="p">;</span></code></pre></figure>
<p>Possible output:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">| name | total | time | undo |
| ----- | ----- | ----- | ---- |
| Julie | 7 | 12:01 | |
| Frank | 3 | 12:03 | |
| Julie | 7 | 12:03 | true |
| Julie | 8 | 12:03 | |
..... [12:01, 12:03] .........</code></pre></figure>
<p>The last line indicates the stream is not over, but that the rows shown are over a specific time interval.</p>
<h3 id="chapter-9---streaming-joins">Chapter 9 - Streaming Joins</h3>
<p>This chapter builds on the SQL syntax to introduce streaming joins, that is, joining two streams based on a predicate. Akidau describes many variants of joins, like <code class="language-plaintext highlighter-rouge">FULL OUTER</code>, <code class="language-plaintext highlighter-rouge">INNER</code>, <code class="language-plaintext highlighter-rouge">LEFT</code>, etc., and shows that all variants can be seen as a special case of <code class="language-plaintext highlighter-rouge">FULL OUTER</code>.</p>
<p>From the perspective of the stream-table theory, join is a grouping operation which turns streams into a table (puts data at rest).</p>
<p>A common pattern is to join on two keys being equal and to define a time range (window) for the joins to happen, but in theory stream joins can be “unwindowed”.</p>
<p>It has the same challenges as stream aggregations, namely dealing with out-of-order data. To work around that we also need to leverage watermarks and retractions.</p>
<h3 id="chapter-10---the-evolution-of-large-scale-data-processing">Chapter 10 - The Evolution of Large-Scale Data Processing</h3>
<p>This chapter reviews some distributed systems that inspired the current state-of-the-art for stream processing. It mentions Map-Reduce, Hadoop, FlumeJava, Storm, Spark Streaming, Millwheel and Kafka.</p>
<p>It culminates with Data Flow (at Google), Flink (open-source), and the Beam model (sort of a high-level spec which Data Flow and Flink try to adhere to).</p>
<h2 id="conclusion">Conclusion</h2>
<p>I found the visualizations and diagrams one of the highlights of the book. There were a lot of aha moments when looking at the figures after trying to wrap my head around some concept. There are many reviews in <a href="https://www.goodreads.com/book/show/43734674-streaming-systems">Goodreads</a> complaining the visualizations are subpar because they were made to be animated and don’t fit well in print. I personally didn’t find them an issue.</p>
<p>The streams and tables theory and the surrounding analogies with concepts in physics were elucidating.</p>
<p>I liked the fact the authors were upfront about which person wrote each chapter. I recall having a bad experience reading <a href="http://book.realworldhaskell.org/">Real World Haskell</a> because the writing between chapters varied widely. Having known who wrote what would help to set expectations, perhaps even to decide whether to skip chapters.</p>
<p>As mentioned in my Chapter 2 notes, I found the “What, Where, When and How” view/analogy very confusing.</p>
<h2 id="related-posts">Related Posts</h2>
<ul>
<li><a href="https://www.kuniga.me/blog/2022/05/03/review-designing-data-intensive-applications.html">Review: Designing Data Intensive Applications</a> - Martin Kleppmann’s book discusses practical distributed systems more broadly but it does have a section on stream processing. He also provides a view of stream and tables, including the calculus analogy: stream being the derivative of tables and tables the integral of streams.</li>
</ul>
<h2 id="references">References</h2>
<ul>
<li>[1] <em>Streaming Systems</em> by Tyler Akidau, Slava Chernyak and Reuven Lax</li>
<li>[2] The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing - by Akidau et al.</li>
</ul>Guilherme KunigamiBaum-Welch Algorithm: Python Implementation2022-07-23T00:00:00+00:002022-07-23T00:00:00+00:00https://www.kuniga.me/blog/2022/07/23/baum-welch-algorithm-in-python<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>In this post we provide an implementation of the Baum-Welch algorithm in Python. We discussed the theory in a previous post: <a href="https://www.kuniga.me/blog/2022/07/19/baum-welch-algorithm.html">Baum-Welch Algorithm: Theory</a>.</p>
<!--more-->
<h2 id="theory-recap">Theory Recap</h2>
<p>Let’s first summarize the algorithm at a high level. We first generate an initial value $\theta^{(1)}$ for $\theta$. Then we repeatedly obtain $\theta^{(n+1)}$ from $\theta^{(n)}$ until $P_\theta(Y = O)$ stops improving.</p>
<p>To obtain $\theta^{(n+1)}$ from $\theta^{(n)}$ we first compute these intermediate variables:</p>
\[\begin{equation}
\begin{split}
\alpha_i(1) & = \pi_i b_{i, o_{1}} & \\
\alpha_i(t + 1) & = b_{i, o_{t+1}} \sum_{j \in S} \alpha_j(t) a_{ji} \qquad & t = 1, \cdots, T-1\\
\\
\beta_i(T) & = 1 & \\
\beta_i(t) &= \sum_{j \in S} a_{ij} \beta_j(t + 1) b_{j, o_{t+1}} \qquad & t = 1, \cdots, T-1\\
\\
\gamma_i(t) &= \frac{\alpha_i(t) \beta_i(t)}{\sum_{j = 1}^{N} \alpha_j(t) \beta_j(t)} \qquad & t = 1, \cdots, T\\
\\
\xi_{ij}(t) &= \frac{\alpha_i(t) a_{ij} b_{j, o_{t+1}} \beta_j(t+1)}{\sum_{i' = 1}^{N} \sum_{j' = 1}^{N} \alpha_{i'}(t) a_{i'j'} b_{j', o_{t+1}} \beta_{j'}(t+1)} \qquad & t = 1, \cdots, T-1\\
\end{split}
\end{equation}\]
<p>It’s worth noting that in practice the denominators of $\gamma_i(t)$ and $\xi_{ij}(t)$ are all the same, $P_\theta(Y = O)$, and hence we don’t have to compute them repeatedly.</p>
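<p>We can check this property numerically. The sketch below uses a small arbitrary HMM (made up for this check, not one of the examples later in the post) and verifies that $\sum_{j} \alpha_j(t) \beta_j(t)$ is the same for every $t$:</p>

```python
import numpy as np

# Arbitrary 2-state HMM, made up just for this sanity check
A = np.array([[0.6, 0.4], [0.3, 0.7]])   # transition probabilities a_ij
B = np.array([[0.8, 0.2], [0.1, 0.9]])   # observation probabilities b_ik
pi = np.array([0.5, 0.5])                # initial probabilities
obs = [0, 1, 1, 0, 1]                    # observed sequence o_1..o_T
T, N = len(obs), len(pi)

# forward variables: alpha_i(t)
alpha = np.zeros((N, T))
alpha[:, 0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[:, t] = B[:, obs[t]] * (alpha[:, t - 1] @ A)

# backward variables: beta_i(t)
beta = np.zeros((N, T))
beta[:, T - 1] = 1
for t in range(T - 2, -1, -1):
    beta[:, t] = A @ (beta[:, t + 1] * B[:, obs[t + 1]])

# sum_j alpha_j(t) * beta_j(t) equals P(Y = O) for every t
per_t = (alpha * beta).sum(axis=0)
print(per_t)
assert np.allclose(per_t, per_t[0])
```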
<p>We can now update $\theta = (A, B, \pi)$ from $\gamma$ and $\xi$:</p>
\[\begin{equation}
\begin{split}
a_{ij} &= \frac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_{i}(t)}\\
\\
b_{ik} &= \frac{\sum_{t=1}^{T} 1[y_t = v_k] \gamma_{i}(t)}{\sum_{t=1}^{T} \gamma_{i}(t)}\\
\\
\pi_i &= \gamma_{i}(1)
\end{split}
\end{equation}\]
<p>Where $1[y_t = v_k]$ is the indicator function:</p>
\[\begin{equation}
1[y_t = v_k] =\left\{
\begin{array}{@{}ll@{}}
1, & \text{if}\ y_t = v_k \\
0, & \text{otherwise}
\end{array}\right.
\end{equation}\]
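<p>In NumPy the indicator $1[y_t = v_k]$ is just a boolean comparison, so the numerator of the $b_{ik}$ update needs no explicit conditional. A small sketch with made-up values for <code class="language-plaintext highlighter-rouge">gamma</code> and <code class="language-plaintext highlighter-rouge">obs</code>:</p>

```python
import numpy as np

gamma = np.array([[0.9, 0.2, 0.7],   # gamma_i(t), shape (N, T); made-up values
                  [0.1, 0.8, 0.3]])
obs = np.array([0, 1, 0])            # observed symbols y_t

k = 0
# (obs == k) is the indicator 1[y_t = v_k] as a boolean mask,
# broadcast over the rows of gamma
num = (gamma * (obs == k)).sum(axis=1)  # sum of gamma_i(t) over t where y_t == k
den = gamma.sum(axis=1)
b_k = num / den                         # column k of the updated B
print(b_k)
```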
<h2 id="python-implementation">Python Implementation</h2>
<p>Implementing the Baum-Welch algorithm is relatively straightforward. It does involve many sums and indices, though, which makes it harder to debug.</p>
<p>The model is just a bag of parameters including the observations, represented by a <code class="language-plaintext highlighter-rouge">dataclass</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">numpy.typing</span> <span class="kn">import</span> <span class="n">ArrayLike</span>
<span class="o">@</span><span class="n">dataclass</span>
<span class="k">class</span> <span class="nc">HMM</span><span class="p">:</span>
<span class="n">hidden_sts</span><span class="p">:</span> <span class="n">ArrayLike</span>
<span class="n">visible_sts</span><span class="p">:</span> <span class="n">ArrayLike</span>
<span class="n">trans_prob</span><span class="p">:</span> <span class="n">ArrayLike</span>
<span class="n">obs_prob</span><span class="p">:</span> <span class="n">ArrayLike</span>
<span class="n">ini_prob</span><span class="p">:</span> <span class="n">ArrayLike</span>
<span class="n">obs</span><span class="p">:</span> <span class="n">ArrayLike</span></code></pre></figure>
<p>The algorithm is implemented by the <code class="language-plaintext highlighter-rouge">BaumWelch</code> class, which is stateless but groups all methods for computing the auxiliary variables. The <code class="language-plaintext highlighter-rouge">run()</code> method keeps running the <code class="language-plaintext highlighter-rouge">iterate()</code> method until the likelihood stops improving:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">class</span> <span class="nc">BaumWelch</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">):</span>
<span class="n">current_likelihood</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_likelihood</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
<span class="n">new_model</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">iterate</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
<span class="n">new_likelihood</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_likelihood</span><span class="p">(</span><span class="n">new_model</span><span class="p">)</span>
<span class="k">if</span> <span class="n">new_likelihood</span> <span class="o"><</span> <span class="n">current_likelihood</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">new_model</span>
<span class="n">current_likelihood</span> <span class="o">=</span> <span class="n">new_likelihood</span></code></pre></figure>
<p>The function <code class="language-plaintext highlighter-rouge">iterate()</code> computes the auxiliary variables and then builds a new model. Note that $P_\theta(Y = O)$ is both the likelihood and the denominator of all $\gamma$ and $\xi$, so we can compute it once and reuse it.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">iterate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">alpha</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">calc_alpha</span><span class="p">()</span>
<span class="n">beta</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">calc_beta</span><span class="p">()</span>
<span class="n">likelihood</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_likelihood</span><span class="p">()</span>
<span class="n">gamma</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">calc_gamma</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">)</span> <span class="o">/</span> <span class="n">likelihood</span>
<span class="n">xi</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">calc_xi</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">)</span> <span class="o">/</span> <span class="n">likelihood</span>
<span class="k">return</span> <span class="n">HMM</span><span class="p">(</span>
<span class="n">hidden_sts</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">model</span><span class="p">.</span><span class="n">hidden_sts</span><span class="p">,</span>
<span class="n">visible_sts</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">model</span><span class="p">.</span><span class="n">visible_sts</span><span class="p">,</span>
<span class="n">trans_prob</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">calc_trans_prob</span><span class="p">(</span><span class="n">gamma</span><span class="p">,</span> <span class="n">xi</span><span class="p">),</span>
<span class="n">obs_prob</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">calc_obs_prob</span><span class="p">(</span><span class="n">gamma</span><span class="p">,</span> <span class="n">xi</span><span class="p">),</span>
<span class="n">ini_prob</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">calc_ini_prob</span><span class="p">(</span><span class="n">gamma</span><span class="p">),</span>
<span class="p">)</span></code></pre></figure>
<p>Computing $\alpha$, $\beta$, $\gamma$ and $\xi$ is almost a direct implementation of their definitions above, and we leverage NumPy’s API for conciseness.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">calc_alpha</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">):</span>
<span class="n">alpha</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">N</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">T</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">states</span><span class="p">():</span>
<span class="n">alpha</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">ini_prob</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*</span> <span class="n">model</span><span class="p">.</span><span class="n">obs_prob</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">obs</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">T</span><span class="p">):</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">states</span><span class="p">():</span>
<span class="n">alpha</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">t</span><span class="p">]</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">alpha</span><span class="p">[:,</span> <span class="n">t</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">model</span><span class="p">.</span><span class="n">trans_prob</span><span class="p">[:,</span> <span class="n">i</span><span class="p">])</span> <span class="o">*</span> \
<span class="n">model</span><span class="p">.</span><span class="n">obs_prob</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">obs</span><span class="p">[</span><span class="n">t</span><span class="p">]]</span>
<span class="k">return</span> <span class="n">alpha</span>
<span class="c1"># backward step
</span><span class="k">def</span> <span class="nf">calc_beta</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">):</span>
<span class="n">beta</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">N</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">T</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">states</span><span class="p">():</span>
<span class="n">beta</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">T</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">T</span><span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span> <span class="c1"># T-2 to 0
</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">states</span><span class="p">():</span>
<span class="n">beta</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">t</span><span class="p">]</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span>
<span class="n">model</span><span class="p">.</span><span class="n">trans_prob</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:]</span> <span class="o">*</span>
<span class="n">beta</span><span class="p">[:,</span> <span class="n">t</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">*</span>
<span class="n">model</span><span class="p">.</span><span class="n">obs_prob</span><span class="p">[:,</span> <span class="n">model</span><span class="p">.</span><span class="n">obs</span><span class="p">[</span><span class="n">t</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]]</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">beta</span>
<span class="c1"># un-normalized
</span><span class="k">def</span> <span class="nf">calc_gamma</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">):</span>
<span class="n">gamma</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">alpha</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">states</span><span class="p">():</span>
<span class="n">gamma</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">]</span> <span class="o">=</span> <span class="n">alpha</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">]</span> <span class="o">*</span> <span class="n">beta</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">]</span>
<span class="k">return</span> <span class="n">gamma</span>
<span class="c1"># un-normalized
</span><span class="k">def</span> <span class="nf">calc_xi</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">):</span>
<span class="n">xi</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">N</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">N</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">T</span><span class="p">))</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">T</span> <span class="o">-</span> <span class="mi">1</span><span class="p">):</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">states</span><span class="p">():</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">states</span><span class="p">():</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">trans_prob</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">obs_prob</span><span class="p">[</span><span class="n">j</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">obs</span><span class="p">[</span><span class="n">t</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]]</span>
<span class="n">xi</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">t</span><span class="p">]</span> <span class="o">=</span> <span class="n">alpha</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">t</span><span class="p">]</span> <span class="o">*</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span> <span class="o">*</span> <span class="n">beta</span><span class="p">[</span><span class="n">j</span><span class="p">,</span> <span class="n">t</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
<span class="k">return</span> <span class="n">xi</span></code></pre></figure>
<p>Updating $A$, $B$ and $\pi$ is also a fairly direct implementation of the equations above:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">calc_trans_prob</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">gamma</span><span class="p">,</span> <span class="n">xi</span><span class="p">):</span>
<span class="n">trans_prob</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">model</span><span class="p">.</span><span class="n">trans_prob</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">states</span><span class="p">():</span>
<span class="n">den</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">gamma</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">:</span><span class="n">model</span><span class="p">.</span><span class="n">T</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">states</span><span class="p">():</span>
<span class="n">trans_prob</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">xi</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="mi">0</span><span class="p">:</span><span class="n">model</span><span class="p">.</span><span class="n">T</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span> <span class="o">/</span> <span class="n">den</span>
<span class="k">return</span> <span class="n">trans_prob</span>
<span class="k">def</span> <span class="nf">calc_obs_prob</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">gamma</span><span class="p">,</span> <span class="n">xi</span><span class="p">):</span>
<span class="n">obs_prob</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">model</span><span class="p">.</span><span class="n">obs_prob</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">states</span><span class="p">():</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">observations</span><span class="p">():</span>
<span class="n">num</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span>
<span class="n">gamma</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">t</span><span class="p">]</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">T</span><span class="p">)</span> <span class="k">if</span> <span class="n">model</span><span class="p">.</span><span class="n">obs</span><span class="p">[</span><span class="n">t</span><span class="p">]</span> <span class="o">==</span> <span class="n">k</span>
<span class="p">)</span>
<span class="n">obs_prob</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">num</span> <span class="o">/</span> <span class="nb">sum</span><span class="p">(</span><span class="n">gamma</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:])</span>
<span class="k">return</span> <span class="n">obs_prob</span>
<span class="k">def</span> <span class="nf">calc_ini_prob</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">gamma</span><span class="p">):</span>
<span class="k">return</span> <span class="n">gamma</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span></code></pre></figure>
<p>The full code is on <a href="https://github.com/kunigami/kunigami.github.io/blob/master/blog/code/2022-07-23-baum-welch-algorithm-in-python/baum_welch.py">Github</a> and is less than 200 lines.</p>
<h2 id="experiments">Experiments</h2>
<p>Here we describe running the implementation against some small examples.</p>
<h3 id="wikipedias-example">Wikipedia’s Example</h3>
<p>Wikipedia [3] has this simple example:</p>
<blockquote>
<p>Suppose we have a chicken from which we collect eggs at noon every day. Now whether or not the chicken has laid eggs for collection depends on some unknown factors that are hidden. We can however (for simplicity) assume that the chicken is always in one of two states that influence whether the chicken lays eggs, and that this state only depends on the state on the previous day. Now we don’t know the state at the initial starting point, we don’t know the transition probabilities between the two states and we don’t know the probability that the chicken lays an egg given a particular state</p>
</blockquote>
<p>It also proposes an initial value for the parameter $\theta$, which corresponds to the following code:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">model</span> <span class="o">=</span> <span class="n">HMM</span><span class="p">(</span>
<span class="n">hidden_sts</span><span class="o">=</span><span class="p">[</span><span class="s">"state 1"</span><span class="p">,</span> <span class="s">"state 2"</span><span class="p">],</span>
<span class="n">visible_sts</span><span class="o">=</span><span class="p">[</span><span class="s">"no eggs"</span><span class="p">,</span> <span class="s">"eggs"</span><span class="p">],</span>
<span class="n">trans_prob</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
<span class="p">[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">],</span>
<span class="p">[</span><span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">],</span>
<span class="p">]),</span>
<span class="n">obs_prob</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
<span class="p">[</span><span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">],</span>
<span class="p">[</span><span class="mf">0.8</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">],</span>
<span class="p">]),</span>
<span class="n">ini_prob</span><span class="o">=</span><span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">],</span>
<span class="n">obs</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
<span class="p">)</span></code></pre></figure>
<p>After running the Baum-Welch algorithm we obtain the updated param $\theta$:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">Transition Probability:
[
[0.5004038 0.4995962 ]
[0.14308799 0.85691201]
]
Observation Probability:
[
[0.01, 0.99],
[1.00, 0.00],
]
Initial Probability:
[0.00 1.00]</code></pre></figure>
<p>This model essentially tells us that we’re certain to start in hidden <em>state 2</em>, with a ~15% probability of switching to <em>state 1</em>. Once in <em>state 1</em>, we have a 50/50 chance of staying vs. changing state.</p>
<p>The observation matrix tells us that the observation given the state is deterministic. In <em>state 1</em> we’ll emit “eggs”, while in <em>state 2</em> we’ll emit “no eggs”.</p>
<p>Intuitively this seems consistent with the observations: there are 10 observation events, 80% of which are “no eggs” and 20% “eggs”. The transition probabilities say we’re more likely to be in <em>state 2</em> at any given time, in which case we’ll observe only “no eggs”.</p>
<p>The computed likelihood is 1.41%, which feels low, but unless the model captures the underlying phenomenon perfectly, more observations will tend to reduce the likelihood.</p>
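<p>To see why, note that the likelihood is a product of per-step probabilities, so it decays roughly exponentially with the number of observations. For instance, even a model that assigned probability 0.65 to every single observation would score only about 1.3% over ten observations:</p>

```python
# The likelihood of a sequence is a product of per-step probabilities,
# so it shrinks exponentially with the sequence length T.
p_per_step = 0.65
T = 10
likelihood = p_per_step ** T
print(f"{likelihood:.4f}")  # prints 0.0135, i.e. about 1.3%
```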
<h3 id="toy-example">Toy Example</h3>
<p>I tried to create a case that can be perfectly modeled by an HMM. It has the same domain as the Wikipedia example, but the observations can be generated by a model in which we <em>always</em> transition between states and the observation is a deterministic function of the hidden state.</p>
<p>We start with our initial $\theta$ slightly biased towards this ideal model (a 40/60 split):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">model</span> <span class="o">=</span> <span class="n">HMM</span><span class="p">(</span>
<span class="n">hidden_sts</span><span class="o">=</span><span class="p">[</span><span class="s">"state1"</span><span class="p">,</span> <span class="s">"state2"</span><span class="p">],</span>
<span class="n">visible_sts</span><span class="o">=</span><span class="p">[</span><span class="s">"yes"</span><span class="p">,</span> <span class="s">"no"</span><span class="p">],</span>
<span class="n">trans_prob</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
<span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">],</span>
<span class="p">[</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="p">]),</span>
<span class="n">obs_prob</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
<span class="p">[</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">],</span>
<span class="p">]),</span>
<span class="n">ini_prob</span><span class="o">=</span><span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
<span class="n">obs</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span>
<span class="p">)</span></code></pre></figure>
<p>The algorithm converges to what we would expect and the computed likelihood is 100%, which is a good sanity check.</p>
<h3 id="choice-of-initial-value">Choice of Initial Value</h3>
<p>From a few runs with the examples above, I found the algorithm very sensitive to the initial choice of $\theta$. One of my first attempts was to make no assumptions and start all probability matrices with a uniform distribution but the algorithm couldn’t make progress.</p>
<p>It might be necessary to try multiple initial choices, perhaps each encoding some bias towards different dimensions to avoid having to rely too much on a single choice.</p>
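<p>A common way of doing this is <em>random restarts</em>: run the algorithm from several random initial parameters and keep the run with the best likelihood. A generic sketch (these helper names are mine, not part of the code above; <code class="language-plaintext highlighter-rouge">fit</code> stands in for something like running <code class="language-plaintext highlighter-rouge">BaumWelch</code> and returning the final model and its likelihood):</p>

```python
import numpy as np

def random_stochastic_matrix(rng, n_rows, n_cols):
    """Random matrix whose rows sum to 1, usable as an initial A or B."""
    m = rng.random((n_rows, n_cols))
    return m / m.sum(axis=1, keepdims=True)

def best_of_restarts(fit, make_initial, n_restarts, seed=0):
    """Run `fit` from several random initial models; keep the best likelihood."""
    rng = np.random.default_rng(seed)
    best_model, best_likelihood = None, -np.inf
    for _ in range(n_restarts):
        model, likelihood = fit(make_initial(rng))
        if likelihood > best_likelihood:
            best_model, best_likelihood = model, likelihood
    return best_model, best_likelihood
```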
<h2 id="conclusion">Conclusion</h2>
<p>The main motivation for studying the Baum-Welch algorithm is to apply it to speech recognition. My next step is to understand how to adapt it for that use case. In [2] Rabiner suggests either handling observation vectors with continuous values (as opposed to the discrete ones we’ve studied so far) or using <em>Vector Quantization</em> (VQ).</p>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://www.kuniga.me/blog/2022/07/19/baum-welch-algorithm.html">1</a>] NP-Incompleteness: Baum-Welch Algorithm: Theory</li>
<li>[2] A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition - L. Rabiner</li>
<li>[<a href="https://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm">3</a>] Wikipedia: Baum–Welch algorithm</li>
</ul>Guilherme KunigamiIn this post we provide an implementation of the Baum-Welch algorithm in Python. We discussed the theory in a previous post: Baum-Welch Algorithm: Theory.