<!-- NP-Incompleteness, Kunigami's Technical Blog, by Guilherme Kunigami -->
<!-- https://www.kuniga.me/blog/2021/04/03/source-filter-model -->
<h1>Source-Filter Model (2021-04-03)</h1>
<figure class="image_float_left">
<img src="https://www.kuniga.me/resources/blog/2021-04-03-source-filter-model/fant.png" alt="Gunnar Fant's thumbnail" />
</figure>
<p>Gunnar Fant was a Swedish researcher in speech science. He received his MSc in Electrical Engineering from KTH and worked at Ericsson and MIT. In 1960 he published the source-filter model of speech production, which became widely used [1].</p>
<p>In this post we’ll study the source-filter model as a simplified representation of human speech.</p>
<!--more-->
<h2 id="sound-waves">Sound waves</h2>
<p>First, let’s revisit the mechanics of sound. Sound is what our brain perceives when air molecules vibrate inside our ear. More precisely, vibrating air molecules cause the eardrum and the bones in the middle ear to vibrate as well, and these vibrations are in turn converted to electric signals by hair cells in the cochlea [2].</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-04-03-source-filter-model/ear-anatomy.png" alt="Ear's anatomy" />
<figcaption>Figure 1: Ear's anatomy. Did you know that the external part of the ear is called <i>pinna</i>? <a href="https://www.osmosis.org/learn/Anatomy_and_physiology_of_the_ear">Source</a></figcaption>
</figure>
<p>In this sense, sound is just our brain’s interpretation of these vibrating molecules, just as color and pain are brain interpretations. However, the term sound is often used more generally to refer to the vibration of the molecules themselves, which can exist without a brain to interpret it. This is the definition we’ll work with henceforth.</p>
<p>We often hear (pun intended) the term <em>sound wave</em>, which conjures images of wavy lines. Vibrating strings are often used as a physical analogy for waves. While useful, these analogies can make understanding the mechanism of sound a bit harder. According to Dan Russell [6]:</p>
<blockquote>
<p>Students are generally introduced to the concept of standing waves through a discussion of transverse standing waves on a string. (…) However, sound waves are longitudinal waves and the particle motion associated with a standing sound wave in a pipe is directed along the length of the pipe.</p>
</blockquote>
<p>Let’s take a closer look at what sound waves represent in reality.</p>
<h3 id="longitudinal-waves">Longitudinal Waves</h3>
<p>Instead of a vibrating string like that of a guitar, I think a better analogy is the Slinky, the coil-like toy that can be easily compressed and decompressed.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-04-03-source-filter-model/slinky.jpeg" alt="Slink" />
<figcaption>Figure 2: A Slinky.</figcaption>
</figure>
<p>If we fix one end of a Slinky and compress the other end slightly, the compression travels along the coil. Waves like these are known as <em>longitudinal waves</em> [3], and they are closer to how sound works.</p>
<p>When we beat a drum, its surface vibrates and in the process displaces small amounts of air, alternating between pushing a bunch of molecules together, which restricts their motion (higher pressure, lower displacement) - and then retreating, leaving more space for the molecules to spread (lower pressure, higher displacement).</p>
<p>The wavy chart we often see representing sound is a graphical representation of either the pressure or the displacement over time at a specific point. Alternatively, the chart might depict the pressure / displacement profile along some axis at a given instant in time.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-04-03-source-filter-model/sound-graph.png" alt="Graph" />
<figcaption>Figure 3: Depiction of the air molecules in a tube and their corresponding graphs showing the displacement and pressure at an instant of time. <a href="https://www.acs.psu.edu/drussell/Demos/StandingWaves/StandingWaves.html">Source: Dan Russell</a></figcaption>
</figure>
<p>This <a href="http://physics.bu.edu/~duffy/semester1/c20_disp_pressure.html">simulation</a> makes it easier to visualize the relationship between these charts.</p>
<p>It’s more convenient to work with the graphical form when modeling sounds because it’s much simpler, so we’ll primarily use it from now on, but knowing what’s actually happening behind the scenes helps build better intuition.</p>
<h3 id="amplitude-frequency-and-wavelength">Amplitude, Frequency and Wavelength</h3>
<p>Suppose we have a periodic source of sound, such that the pressure vs. time graphical representation at a given point is a perfect sine wave. The <em>amplitude</em> is the height of the crest. We can define the pressure at rest to be 0, so a positive amplitude represents increased pressure (<em>compression</em>), and a negative one decreased pressure (<em>rarefaction</em>).</p>
<p>The <em>frequency</em> is how many cycles happen in a period of time (the standard unit is Hertz, representing the number of cycles in a second).</p>
<p>The <em>wavelength</em> is the length of a complete cycle. In sound terms, it could be the physical distance between two adjacent regions of the highest pressure.</p>
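<p>These three quantities can be made concrete with a small Python sketch of a pure tone’s pressure over time (the 440 Hz frequency and unit amplitude are arbitrary illustrative values):</p>

```python
import math

def pressure_at(t, amplitude=1.0, frequency=440.0):
    """Pressure deviation from rest (0) at time t (seconds) for a pure tone.

    Positive values correspond to compression, negative to rarefaction.
    """
    return amplitude * math.sin(2 * math.pi * frequency * t)

# Pressure is at rest at t = 0, and at maximum compression a quarter-cycle later.
print(pressure_at(0.0))                        # 0.0
print(round(pressure_at(1 / (4 * 440.0)), 6))  # 1.0
```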
<h3 id="the-speed-of-sound">The Speed of Sound</h3>
<p>Going back to the drum analogy, suppose we have a listener located a few meters away from us. When we beat the drum once, it takes some time before the sound reaches their ears. The distance between listener and source divided by the time it takes for the sound to be heard defines the speed of sound.</p>
<p>If we visualize the pressure profile along an axis it would look like a pulse:</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-04-03-source-filter-model/pulse.gif" alt="Animated pulse propagation" />
</figure>
<p>Suppose we wanted them to hear the beat sooner.</p>
<p>Maybe we can hit the drum harder? This would cause more displacement of the air molecules and hence higher pressure during compression. In graphical terms, it would increase the <em>amplitude</em>, but wouldn’t make the sound travel faster.</p>
<p>Maybe we can hit the drum faster? This would “narrow” the size of the pulse but would not make it travel faster.</p>
<p>We learn in school about the formula:</p>
\[(1) \qquad c = f \lambda\]
<p>Where $c$ is the speed of sound, $f$ is the frequency and $\lambda$ is the wavelength. According to the model behind this formula, if temperature, pressure and medium are constant, $c$ is fixed. For example, if we increase the frequency of the sound it will cause the wavelength to decrease accordingly.</p>
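<p>As a quick sanity check of (1), here’s how frequency maps to wavelength at a typical speed of sound in air (the 343 m/s figure is the standard value for dry air at about 20°C, not from the post):</p>

```python
SPEED_OF_SOUND = 343.0  # m/s, dry air at about 20°C

def wavelength(frequency_hz):
    """Wavelength in meters, from c = f * lambda."""
    return SPEED_OF_SOUND / frequency_hz

# Doubling the frequency halves the wavelength; c stays fixed.
print(round(wavelength(440.0), 3))  # 0.78
```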
<p>I find it a bit counter-intuitive that there’s no way to make the sound travel faster. If we were to throw a ball to our listener, throwing it with greater force would increase the ball’s average speed.</p>
<p>I don’t understand the physics deeply enough, but the formula above arises from a model that makes some assumptions, namely that the vibration at the source is small enough to only cause “localized changes” in the air particles [4].</p>
<p>If we really wanted to use force, we could cause an explosion at the source and this would push the <em>actual</em> air particles at the source towards the listener’s ears instead of the transitive vibrations. This would break the “sound as longitudinal waves” model and (1) would not apply.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-04-03-source-filter-model/explosion.jpeg" alt="Cloud from a nuclear explosion" />
</figure>
<p>Since we’re mostly interested in human speech, we can work with this sound model, but it’s worth introspecting on what assumptions and simplifications are taking place.</p>
<h3 id="interference">Interference</h3>
<p>When two sound waves interfere, their effects add up. The interference is <em>constructive</em> if the amplitudes have the same sign and <em>destructive</em> otherwise. Destructive interference is used by noise-cancelling headphones.</p>
<p>Most sound waves are not perfect sine waves as often depicted, but have a more complex shape. However, if the sound is periodic (or mostly so), it can be decomposed into multiple sine waves (via the <a href="https://en.wikipedia.org/wiki/Fourier_transform">Fourier transform</a>) which, when added together, approximate the original signal.</p>
<p>The representation of a periodic signal by its frequencies is very convenient because it requires much less information to be stored. An analogy would be the vector format for images (SVG) which can represent/approximate some classes of images by very few parameters.</p>
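<p>The decomposition into sine waves can be sketched with a naive discrete Fourier transform (a toy $O(n^2)$ version, unlike the fast transforms real libraries use):</p>

```python
import cmath
import math

def dft_magnitudes(signal):
    """Naive discrete Fourier transform; magnitude of each frequency bin."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n)]

# A signal built from two sine waves: 3 and 7 cycles per window.
n = 64
signal = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 7 * t / n)
          for t in range(n)]

# The energy concentrates exactly at bins 3 and 7 (ignoring mirror bins).
peaks = [k for k, m in enumerate(dft_magnitudes(signal)[:n // 2]) if m > 0.1]
print(peaks)  # [3, 7]
```

The two non-zero bins are exactly the parameters needed to reconstruct the signal, which is what makes this representation so compact.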
<h2 id="sound-in-a-tube">Sound in a Tube</h2>
<p>Studying sound propagation in a tube is interesting because it allows us to treat it as one-dimensional, which simplifies things.</p>
<h3 id="reflection">Reflection</h3>
<p>Similar to light, sound can be reflected if it encounters a medium with different characteristics. This can be experienced when shouting inside a cave - the sound waves bounce off the walls and travel back to our ears. This phenomenon is also used by sonars for detecting objects under water.</p>
<p>If we send waves from one end of the tube to the other, reflection happens as well. The way it reflects depends on whether the other end of the tube is closed or open.</p>
<p>If it’s closed, it’s similar to sound hitting a wall. The displacement of air molecules will tend to zero at the wall since it cannot push the solid’s molecules. Due to conservation of energy, the molecules will bounce back once they hit the wall, creating the reflected wave.</p>
<p>If it’s open, I honestly don’t have a good intuition and haven’t found a good explanation online. In [5], however, it’s demonstrated empirically that such reflection happens. The authors also make the following comment:</p>
<blockquote>
<p>It’s difficult for students to conceptualize the nature of reflections from closed and open pipes—especially open pipes. “How can the sound reflect off something that isn’t there?”</p>
</blockquote>
<p>It should be possible to explain this behavior mathematically via fluid dynamics by assuming the conservation of energy [7] but I haven’t dug into that.</p>
<p>Regardless of the exact mechanism that causes reflection in an open tube, let’s consider the pressure at such a point. Because the outside of the tube is so vast, the air molecules inside cannot exert any pressure changes on it, so if we look at the pressure profile along the tube, it has to be 0 (that is, the base pressure) at the open end.</p>
<p>Summarizing: at an open end the pressure is 0, while at a closed end the displacement is 0.</p>
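<p>These boundary conditions can be seen in action with a toy finite-difference simulation of the 1-D wave equation. The sketch below tracks <em>displacement</em> (whose reflections have the opposite sign from pressure’s); the grid size and pulse width are arbitrary choices, and the wave speed is 1 cell per time step:</p>

```python
import math

def simulate(n=100, steps=100, closed_end=True):
    """Leapfrog scheme for the 1-D wave equation (displacement u) in a tube.

    closed_end=True pins the displacement to 0 at both ends (closed tube);
    closed_end=False uses zero-gradient ends (a stand-in for open ends).
    """
    # Initial condition: a narrow displacement pulse at rest in the middle.
    f = [math.exp(-(((i - n // 2) / 5.0) ** 2)) for i in range(n + 1)]
    prev = f[:]
    # First step for zero initial velocity: the pulse splits into two halves.
    curr = [(f[i - 1] + f[i + 1]) / 2 if 0 < i < n else f[i]
            for i in range(n + 1)]
    for _ in range(steps - 1):
        nxt = [0.0] * (n + 1)
        for i in range(1, n):
            nxt[i] = curr[i - 1] + curr[i + 1] - prev[i]
        if closed_end:
            nxt[0] = nxt[n] = 0.0              # displacement 0 at the walls
        else:
            nxt[0] = 2 * curr[1] - prev[0]     # zero displacement gradient
            nxt[n] = 2 * curr[n - 1] - prev[n]
        prev, curr = curr, nxt
    return curr

# After both halves bounce off the ends and recombine in the middle,
# the pulse comes back inverted for closed ends and upright for open ends.
print(simulate(closed_end=True)[50] < -0.9)   # True
print(simulate(closed_end=False)[50] > 0.9)   # True
```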
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-04-03-source-filter-model/relection-tube.png" alt="Two images of an osciloscope measuring sound pulses" />
<figcaption>Figure 5: Measuring the pressure caused by a sound pulse. In the image on the left, it was done with a closed end, and the reflected pressure pulse has the same shape as the original pulse. In the image on the right, it was done with an open end, and the reflected pulse is inverted. <a href="https://sciencedemonstrations.fas.harvard.edu/presentations/sound-reflections-pipes">Source</a>.</figcaption>
</figure>
<h3 id="resonant-frequency">Resonant Frequency</h3>
<p>Reflection is an important component to consider because the reflected wave interferes with the original wave. Depending on the “alignment” between these waves, they can result in different effective waves.</p>
<p>If the alignment is right, the resulting wave will have maximum amplitude. The alignment can be controlled by changing the frequency, which in turn affects the wavelength. Frequencies for which the alignment yields maximum amplitude are called <em>resonant frequencies</em>.</p>
<p>A tube can act as a frequency filter. If we send in sound waves composed of multiple component frequencies, the ones closer to the resonant frequencies get amplified, effectively filtering out the other frequencies. This is the <em>filter</em> component of the <em>source-filter</em> model we’ll talk about at the end.</p>
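<p>For an ideal tube, the resonant frequencies have simple closed forms from introductory acoustics. A sketch (the 0.17 m length is a commonly cited rough figure for an adult vocal tract, not a value from this post):</p>

```python
SPEED_OF_SOUND = 343.0  # m/s, dry air at about 20°C

def resonant_frequencies(tube_length_m, count=3, closed_at_one_end=True):
    """First few resonant frequencies (Hz) of an ideal tube.

    Closed-open tube: f_n = (2n - 1) * c / (4L)  (odd harmonics only).
    Open-open tube:   f_n = n * c / (2L).
    """
    if closed_at_one_end:
        return [(2 * n - 1) * SPEED_OF_SOUND / (4 * tube_length_m)
                for n in range(1, count + 1)]
    return [n * SPEED_OF_SOUND / (2 * tube_length_m) for n in range(1, count + 1)]

# A tube of ~0.17 m closed at one end (like the vocal tract at the glottis):
print([round(f) for f in resonant_frequencies(0.17)])  # [504, 1513, 2522]
```

These values are close to the classic ~500 / 1500 / 2500 Hz resonances quoted for a neutral vocal tract.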
<h3 id="example-the-clarinet">Example: The Clarinet</h3>
<p>A real-world example that is very close to the tube model is the clarinet (check this interesting <a href="https://www.youtube.com/watch?v=nENXs6n_ITI">introduction</a> from Philharmonia Orchestra).</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-04-03-source-filter-model/clarinet.jpg" alt="Clarinet" />
</figure>
<p>It is a musical instrument that can be modeled as a tube, open at one end and closed at the other.</p>
<p>The vibration is created at one end of the tube by blowing air through a wooden piece called the <em>reed</em>.</p>
<p>The holes along its body let air escape, changing the resonant frequency and allowing the musician to produce different pitches.</p>
<h2 id="speech-representation">Speech Representation</h2>
<p>First, let’s introduce some terminology from phonetics.</p>
<h3 id="phonetics">Phonetics</h3>
<p>The <strong>phoneme</strong> is the unit of speech [9], and a word consists of multiple phonemes. Phonemes can be represented in written form using the IPA (International Phonetic Alphabet), a notation based on the Latin alphabet. They’re written between <code class="language-plaintext highlighter-rouge">/</code>, for example <code class="language-plaintext highlighter-rouge">/fəʊ̯n/</code> represents the pronunciation of “phone”.</p>
<p>Multiple letters can map to the same phoneme and the same letter can map to different phonemes in different words. As an example of the latter, in <code class="language-plaintext highlighter-rouge">car</code>, the <code class="language-plaintext highlighter-rouge">a</code> has the phoneme <code class="language-plaintext highlighter-rouge">/ɑ/</code>, while in <code class="language-plaintext highlighter-rouge">bat</code> it is <code class="language-plaintext highlighter-rouge">/æ/</code>.</p>
<h3 id="the-human-vocal-tract">The Human Vocal Tract</h3>
<p>Humans produce <em>voiced sounds</em> by exhaling air from the lungs; the air passes through the vocal cords in the larynx, causing them to vibrate, in much the same way the reed vibrates in a clarinet.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-04-03-source-filter-model/vocal-tract.jpeg" alt="The human vocal tract" />
<figcaption>Figure 6: The human vocal tract. <a href="https://www.voicescienceworks.org/vocal-tract.html">Source</a>.</figcaption>
</figure>
<p>We then have ways to transform that source signal by changing the resonant frequencies of the vocal tract via the <em>articulators</em> [9]: the lips, the tongue, the muscles in the pharynx and the soft palate (which controls the airflow to the nasal cavity). Pure voiced sounds correspond to the pronunciation of vowels.</p>
<p>Humans can also produce <em>noise sounds</em>, where the vibration does not originate at the vocal cords; these correspond to the pronunciation of consonants. They can be categorized into different types depending on how they’re produced.</p>
<p><strong>Plosives</strong> are noise sounds created from a sudden release of air, like when pronouncing <code class="language-plaintext highlighter-rouge">/b/</code> or <code class="language-plaintext highlighter-rouge">/p/</code> (mnemonic: explosive). Plosives can be further categorized into <em>bilabial</em> (lips), <em>alveolar</em> (tongue) and <em>velar</em>. Each of these can be combined with voiced sounds. Here’s a full list [11]:</p>
<div class="center_children">
<table>
<thead>
<tr>
<th></th>
<th>bilabial</th>
<th>alveolar</th>
<th>velar</th>
</tr>
</thead>
<tbody>
<tr>
<td>voiceless</td>
<td>/p/</td>
<td>/t/</td>
<td>/k/ <br /> (<b>c</b>at)</td>
</tr>
<tr>
<td>voiced</td>
<td>/b/</td>
<td>/d/</td>
<td>/g/ <br /> (<b>g</b>oat)</td>
</tr>
</tbody>
</table>
</div>
<p><strong>Fricatives</strong> are noise sounds produced by blowing air through almost closed lips or teeth to pronounce phonemes like <code class="language-plaintext highlighter-rouge">/f/</code> and <code class="language-plaintext highlighter-rouge">/s/</code> (mnemonic: friction). Fricatives can be further categorized into <em>labiodental</em> (lips + teeth), <em>dental</em>, <em>alveolar</em>, <em>palato-alveolar</em> and <em>glottal</em>.</p>
<p>It’s also possible to have a combination of voiced and noise sounds, known as <em>voiced fricatives</em>. Here’s a full list [11]:</p>
<div class="center_children">
<table>
<thead>
<tr>
<th></th>
<th>labiodental</th>
<th>dental</th>
<th>alveolar</th>
<th>palato-alveolar</th>
<th>glottal</th>
</tr>
</thead>
<tbody>
<tr>
<td>voiceless</td>
<td>/f/</td>
<td>/θ/ <br /> (<b>th</b>ick)</td>
<td>/s/</td>
<td>/ʃ/ <br /> (<b>sh</b>ow)</td>
<td>/h/ <br /> (<b>h</b>at)</td>
</tr>
<tr>
<td>voiced</td>
<td>/v/</td>
<td>/ð/ <br /> (<b>th</b>ere)</td>
<td>/z/</td>
<td>/ʒ/ <br /> (plea<b>s</b>ure)</td>
<td></td>
</tr>
</tbody>
</table>
</div>
<p><strong>Affricates</strong> are plosives followed by fricatives.</p>
<div class="center_children">
<table>
<thead>
<tr>
<th></th>
<th>palato-alveolar</th>
</tr>
</thead>
<tbody>
<tr>
<td>voiceless</td>
<td>/ʧ/ <br /> (<b>ch</b>ange)</td>
</tr>
<tr>
<td>voiced</td>
<td>/dʒ/ <br /> (<b>j</b>ob)</td>
</tr>
</tbody>
</table>
</div>
<p><strong>Nasal</strong> sounds are created when air escapes through the nose, causing the vibration to happen there. There are 3 types of nasal sounds:</p>
<div class="center_children">
<table>
<thead>
<tr>
<th>bilabial</th>
<th>alveolar</th>
<th>velar</th>
</tr>
</thead>
<tbody>
<tr>
<td>/m/</td>
<td>/n/</td>
<td>/ŋ/ <br /> (si<b>ng</b>)</td>
</tr>
</tbody>
</table>
</div>
<p><strong>Approximants</strong> are noise sounds that in some ways sound like vowels. A <em>lateral approximant</em> is one where the air escapes through the sides of the tongue, such as <code class="language-plaintext highlighter-rouge">/l/</code> (like). The phonemes <code class="language-plaintext highlighter-rouge">/r/</code> (right), <code class="language-plaintext highlighter-rouge">/j/</code> (yes) and <code class="language-plaintext highlighter-rouge">/w/</code> (wet) are other examples.</p>
<h3 id="the-source-filter-model">The Source-Filter Model</h3>
<p>Having seen all this theory, using the tube model to represent speech seems natural. The vibrating vocal cords act as the source, and the controls on the vocal tract change the resonance and act as filters.</p>
<p>The video below makes it more palpable.</p>
<div class="center_children">
<iframe width="560" height="315" src="https://www.youtube.com/embed/wR41CRbIjV4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
</div>
<p>It shows a generator of a periodic signal with frequency 130 Hz acting as the source, to which different casts modeling the sounds <code class="language-plaintext highlighter-rouge">/a/</code> and <code class="language-plaintext highlighter-rouge">/o/</code> are connected.</p>
<p>The simple version of the model consists of just two parameters: the source frequency and the resonance frequency of the tube / filter.</p>
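<p>A crude sketch of these two parameters: an impulse train at 130 Hz (as in the video) as the source, run through a two-pole resonator standing in for the tube. The 700 Hz resonance, sample rate and pole radius are illustrative assumptions, not values from the post:</p>

```python
import math

def impulse_train(f0, sample_rate, n_samples):
    """Glottal source approximation: one unit impulse per pitch period."""
    period = int(sample_rate / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(signal, center_hz, sample_rate, r=0.97):
    """Two-pole filter that amplifies frequencies near center_hz (the 'tube')."""
    theta = 2 * math.pi * center_hz / sample_rate
    a1, a2 = 2 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2  # each impulse triggers a decaying ringing
        out.append(y)
        y1, y2 = y, y1
    return out

# One second of a 130 Hz source shaped by a single 700 Hz resonance,
# a rough stand-in for the vocal tract filter of an /a/-like vowel.
sr = 8000
source = impulse_train(130.0, sr, sr)
speech = resonator(source, 700.0, sr)
```

Changing the resonator’s center frequency while keeping the source fixed is exactly the knob the articulators turn in the vocal tract.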
<p>To model speech, we use one or more tubes for each phoneme. Voiced noises require at least two tubes, one for the voiced sound and another for the noise sound. The pronunciation of a sentence is thus modeled by multiple tubes over time.</p>
<p>The computational problem associated with a source-filter model is to determine these parameters from data.</p>
<p>This model can be used, for example, to efficiently encode voice. In [8] (see <em>Analysis/Resynthesis</em>), Hyung-Suk Kim achieves 15x compression of speech while mostly preserving the original pitch.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I love interdisciplinary studies like these. Here we covered subjects from math, physics, biology and linguistics!</p>
<p>Even though I had learned about sound waves in high school, I realized I still had a lot to learn while writing this post. I think there are primarily 3 reasons:</p>
<ul>
<li>The internet - there’s so much content freely available on the internet, including videos and animations. Back then I only had books and teachers as resources.</li>
<li>Intellectual curiosity - being passively taught a subject in high school is one thing; genuinely wanting to understand things much more deeply is another.</li>
<li>Knowledge - since high school I’ve learned subjects like calculus, numerical methods, fluid dynamics (sort of) and programming. Some are directly related; others help with analogies or different ways of thinking.</li>
</ul>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://en.wikipedia.org/wiki/Gunnar_Fant">1</a>] Wikipedia - Gunnar Fant</li>
<li>[<a href="https://hearinghealthfoundation.org/how-hearing-works">2</a>] Hearing Health Foundation - How Hearing Works</li>
<li>[<a href="https://www.physicsclassroom.com/class/sound/Lesson-1/Sound-as-a-Longitudinal-Wave">3</a>] The Physics Classroom - Sound as a Longitudinal Wave</li>
<li>[<a href="https://www.feynmanlectures.caltech.edu/I_47.html">4</a>] The Feynman Lectures on Physics: Sound. The wave equation</li>
<li>[<a href="https://sciencedemonstrations.fas.harvard.edu/presentations/sound-reflections-pipes">5</a>] Sound Reflections in Pipes</li>
<li>[<a href="https://www.acs.psu.edu/drussell/Demos/StandingWaves/StandingWaves.html">6</a>] Acoustics and Vibration Animations: Standing Sound Waves (Longitudinal Standing Waves)</li>
<li>[<a href="https://www.win.tue.nl/~sjoerdr/papers/boek.pdf">7</a>] An Introduction to Acoustics - S. W. Rienstra and A. Hirschberg</li>
<li>[<a href="https://ccrma.stanford.edu/~hskim08/lpc/">8</a>] Linear Predictive Coding is All-Pole Resonance Modeling - H. Kim</li>
<li>[<a href="https://en.wikipedia.org/wiki/Phonetics">9</a>] Wikipedia - Phonetics</li>
<li>[<a href="https://ocw.mit.edu/courses/linguistics-and-philosophy">10</a>] Linguistic Phonetics - The Source-Filter Model of Speech Production</li>
<li>[<a href="https://www.ugr.es/~ftsaez/fonetica/consonants.pdf">11</a>] English Phonetics and Phonology - F. Trujillo</li>
</ul>
<!-- https://www.kuniga.me/blog/2021/03/06/boyer-moore-vote-algorithm -->
<h1>Boyer–Moore Majority Vote Algorithm (2021-03-06)</h1>
<figure class="image_float_left">
<img src="https://www.kuniga.me/resources/blog/2021-03-06-boyer-moore-vote-algorithm/boyd-moore.jpg" alt="Robert S. Boyer and J Strother Moore thumbnail" />
</figure>
<p>Robert Stephen Boyer and J Strother Moore are <em>Professor Emeritus</em> at The University of Texas at Austin. They’re known for their <a href="https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm">Boyer–Moore string-search algorithm</a>.</p>
<p>In this post we’ll present the Boyer–Moore Majority Vote Algorithm, which can find the majority element (i.e. one which appears more than 50% of the time) in a stream of data, if one exists.</p>
<!--more-->
<h2 id="boyermoore-majority-vote-algorithm">Boyer–Moore Majority Vote Algorithm</h2>
<p>Suppose we’re given a stream (i.e. an unbounded list) of integers and we want to find the element that occurs more often than all the other elements combined, if it exists.</p>
<p>Another way to pose this problem: given an array of $N$ elements, find the element that occurs more than $\lfloor N/2 \rfloor$ times, using constant additional memory.</p>
<p>The algorithm is very simple: we keep 2 local variables, $n$ and $c$. Variable $n$ holds the current candidate for the majority and $c$ is a counter that reflects how frequent $n$ is, though it’s not the exact frequency of $n$.</p>
<p>We start with $c = 0$ and process the elements $e_i$ of the array one at a time. If $c$ is 0, we set $n = e_i$ and $c = 1$. Otherwise, if $e_i$ equals $n$, we increment $c$, otherwise we decrement $c$.</p>
<p>At the end, if there’s a majority element in the array, it’s guaranteed to be stored in $n$. If there is no majority, the value in $n$ is meaningless.</p>
<p>Thus, this algorithm cannot be used to decide whether a majority element exists, but it will find it if one does. In the array case it’s possible to verify the result by doing an extra pass over the array and checking whether the frequency of $n$ exceeds $\lfloor N/2 \rfloor$, which can still be done with $O(1)$ memory.</p>
<p>Here’s the code in Python, assuming <code class="language-plaintext highlighter-rouge">arr</code> has a majority element:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">major</span><span class="p">(</span><span class="n">arr</span><span class="p">):</span>
<span class="n">c</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">arr</span><span class="p">:</span>
<span class="k">if</span> <span class="n">c</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">e</span>
<span class="n">c</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">e</span> <span class="o">==</span> <span class="n">n</span> <span class="k">else</span> <span class="o">-</span><span class="mi">1</span>
<span class="k">return</span> <span class="n">n</span></code></pre></figure>
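<p>A variant with the extra verification pass mentioned above, returning <code class="language-plaintext highlighter-rouge">None</code> when no majority element exists (the name <code class="language-plaintext highlighter-rouge">major_or_none</code> is ours, for illustration):</p>

```python
def major_or_none(arr):
    """Boyer-Moore vote plus a verification pass; None if no majority."""
    c = 0
    n = None
    for e in arr:
        if c == 0:
            n = e
        c += 1 if e == n else -1
    # Second pass: confirm n really occurs more than floor(len(arr) / 2) times.
    if arr and sum(1 for e in arr if e == n) > len(arr) // 2:
        return n
    return None

print(major_or_none([1, 2, 1, 1, 3, 1]))  # 1
print(major_or_none([1, 2, 3]))           # None
```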
<h2 id="a-soccer-analogy">A Soccer Analogy</h2>
<p>The interesting part of the algorithm is understanding why it works. Let’s build an intuition by considering a simple case where there are only 2 types of values in the array, say $a$ and $b$.</p>
<p>Let’s build an analogy with soccer. We have Arsenal vs. Barcelona. The input array or stream represents the goals scored over time. An entry $a$ means that Arsenal scored, and $b$ that Barcelona scored. We now interpret $n$ as representing the team that is ahead on the score and $c$ the absolute difference between the scores. If $c$ is 0, it’s a draw and $n$ can be assumed to be undetermined.</p>
<p>It’s easy to see that the algorithm preserves these properties at every step. It’s also easy to see that the team that scored the most will be ahead at the end, and thus $n$ will contain the winner.</p>
<p>Let’s change the rules a bit now. Instead of keeping two scores, one for Arsenal and one for Barcelona, we’ll only have one, but we’ll track which team is ahead. Now suppose a team scores a goal: if they’re already ahead, we increment the score; if they’re behind, we decrement the score; if it’s a tie, the score is set to 1 and they’re now ahead. Note that this is basically what the algorithm does.</p>
<p>This alternative version is less informative because while we know who won and by how much, we don’t know the absolute score of each team. However, these rules allow us to extend our soccer game to 3 teams! Let’s add Chelsea to the game. The rules are exactly the same: the team that is ahead with a positive score at the end is the winner.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-03-06-boyer-moore-vote-algorithm/chelsearselona.png" alt="Logo of Arsenal, Barcelona and Chelsea" />
</figure>
<p>We now claim that any team that scored more goals than the other 2 combined will necessarily be the winner, although the converse is not necessarily true. To get an intuition, suppose that Arsenal scored more goals than Barcelona and Chelsea combined. Assume that somehow Barcelona and Chelsea colluded and became a single team, meaning that if Barcelona is ahead and Chelsea scores, the score is incremented instead of decremented.</p>
<p>This essentially reduces the game to the 2-team version, in which Arsenal would still win. In the 3-team version, Arsenal could not do worse: Barcelona and Chelsea still decrement Arsenal’s score when it’s in the lead, but now they also play against each other, so Barcelona’s goals decrement Chelsea’s score when Chelsea is in the lead, which only helps Arsenal.</p>
<p>It is easy to extend this to an arbitrary number of teams. As long as one team scores more goals than the rest combined, it will be declared the winner.</p>
<h2 id="lfloor-n3-rfloor-majority">$\lfloor N/3 \rfloor$ majority</h2>
<p>In our 3-team soccer match, if no team scored more than the other two combined, the final result may not reflect the team that scored the most. An example is $aaabbcc$, which would have Chelsea as the winner even though Arsenal scored the most.</p>
<p>Suppose then that instead of finding an element that occurs more than $\lfloor N/2 \rfloor$ times, we want to find one that occurs more than $\lfloor N/3 \rfloor$ times.</p>
<p>We can keep 2 score-leader pairs instead of one [2]. When we process an element $e$ we have 3 scenarios:</p>
<ol>
<li>If any of the pairs has score 0, $e$ becomes the lead of that pair with score 1.</li>
<li>If $e$ is already in either one of the pairs, we increment that one.</li>
<li>If both pairs are taken, then $e$ decreases the score of <em>both</em> of them.</li>
</ol>
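<p>The three scenarios above can be coded directly; a sketch with a final verification pass (the function name is ours, for illustration):</p>

```python
def candidates_n3(arr):
    """Two score-leader pairs: any element occurring more than
    floor(len(arr) / 3) times is guaranteed to end up in a pair."""
    (n1, c1), (n2, c2) = (None, 0), (None, 0)
    for e in arr:
        if e == n1:
            c1 += 1                 # scenario 2: e already leads a pair
        elif e == n2:
            c2 += 1
        elif c1 == 0:
            n1, c1 = e, 1           # scenario 1: take over an empty pair
        elif c2 == 0:
            n2, c2 = e, 1
        else:
            c1 -= 1                 # scenario 3: decrement *both* pairs
            c2 -= 1
    # Verification pass: keep only true > floor(N/3) elements.
    return [n for n in (n1, n2)
            if n is not None and sum(1 for e in arr if e == n) > len(arr) // 3]

print(candidates_n3(list("aaabbc")))   # ['a']
print(candidates_n3(list("cbbaaa")))   # ['a']
```

Note that the two orderings of the same multiset give the same answer, matching the order-independence argument below.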
<p>We claim that if an element occurs more than each of the other 2, it will be in one of the pairs. For example, imagine the array $aaabbc$. We’ll have after each step:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">a: (a, 1), (-, 0)
a: (a, 2), (-, 0)
a: (a, 3), (-, 0)
b: (a, 3), (b, 1)
b: (a, 3), (b, 2)
c: (a, 2), (b, 1)</code></pre></figure>
<p>$a$ occurs more than $b$ and $c$, so it is in one of the pairs at the end.</p>
<p>We can ask ourselves whether the order matters. Let’s try $cbbaaa$:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">c: (c, 1), (-, 0)
b: (c, 1), (b, 1)
b: (c, 1), (b, 2)
a: (_, 0), (b, 1)
a: (a, 1), (b, 1)
a: (a, 2), (b, 1)</code></pre></figure>
<p>We’ll call the event in which an element $x$ causes another element $y$’s counter to drop to 0 an <em>ousting</em>. So in the second example, $a$ ousted $c$ in line 4.</p>
<p>It looks like the result doesn’t change regardless of the order of the elements. Let’s try to prove that. The idea is to define a canonical order in which all occurrences of the most frequent element come first as a group, then the second most frequent, then the third. So the canonical order for $cbbaaa$ or $abcaba$ is $aaabbc$.</p>
<p>The first observation is that while $a$ and $b$ are in the lead, the order clearly doesn’t matter since they’re just incrementing their respective counters. For example, $aaabb$ and $ababa$ will both lead to $(a, 3), (b, 2)$. Furthermore, if an element that is not in the lead, say $c$, doesn’t oust anyone, its position is also irrelevant. For example, $aaaabbcb$ and $aaaabbbc$ will both lead to $(a, 3), (b, 2)$ because $c$ didn’t oust anyone.</p>
<p>So ousting is the event of interest for us. Suppose the first ousting event is when $c$ ousts $b$ at position $i$. For this to happen, the number of $c$’s seen so far has to equal the number of $b$’s. The score right after looks like $(a, k_a), (-, 0)$. Now there are three possibilities for the element at position $i + 1$:</p>
<ul>
<li>$a$ - which is the element still standing - it will just increment their counter. In this case it’s clear that we could have moved $a$ before the $c$’s.</li>
<li>$b$ - this means $b$ will take its spot right away. Before $c$’s ousting we had $(a, k_a + 1), (b, 1)$ and then $(a, k_a), (-, 0)$ and now $(a, k_a), (b, 1)$. If we swapped $b$ at $i+1$ and $c$ at $i$, we would have had $(a, k_a + 1), (b, 2)$, and then $(a, k_a), (b, 1)$ which is the same result.</li>
<li>$c$ - $c$ will take $b$’s spot that is $(a, k_a), (c, 1)$. This also means at this point there were more $c$’s than $b$’s, or more precisely $k_c = k_b + 1$. If the $c$ had shown up before $b$ in the array, we’d also have $(a, k_a), (c, 1)$ and have avoided the ousting.</li>
</ul>
<p>The above analysis shows that we could re-order the elements to avoid this ousting event at position $i$. Reasoning inductively, we can avoid oustings until the last element (where there’s no next element to swap with), at which point an ousting can happen if $k_c = k_b$ or $k_c = k_a$.</p>
<p>This shows we can assume a canonical order for any input without changes to the result. If $a$ occurs more than each of the other 2, then it’s easy to see it will still be standing at the end with the highest score.</p>
<p><strong>More players.</strong> We can generalize this for more teams using a reasoning similar to the one for the 2-player game. If we had $k$ teams and assumed the non-majority teams were colluding, this would reduce to the 3-player version as long as $a$ holds the majority, that is, appears more than $\lfloor N/3 \rfloor$ times. A formal proof is needed here though.</p>
<p>For example, suppose we have 4 teams with scores $aaaaabbbcccdd$. We could form 3 coalitions: $aaaaa$, $bbbd$ and $cccd$ in which $a$ would still win. Note that $d$ had to be split, otherwise $a$ would lose majority.</p>
<p><strong>$\lfloor N/k \rfloor$ majority.</strong> We can extend the ideas above to find an element with the $k$-th majority. We define an array of $k - 1$ leader-score pairs and return the leader with the highest score.</p>
<p>This can be done in $O(N)$ if we use a hash map to track the scores. Checking if $e$ is in the lead can be done in $O(1)$ and so is incrementing. Decrementing requires a pass over the entire hash map but the number of times we decrement is bounded by the number of increments we do, so it’s still $O(N)$ overall.</p>
<h2 id="misra-griess-k-reduced-bag">Misra-Gries’s k-reduced bag</h2>
<p>In [3] Misra and Gries introduce a simpler algorithm for finding the $\lfloor N/k \rfloor$ majority, one whose correctness is also easier to prove. They introduce a k-reduced bag, which results from repeatedly removing $k$ distinct elements from an array until it’s no longer possible.</p>
<p>They provide an example of <code class="language-plaintext highlighter-rouge">[1, 1, 2, 3, 3]</code> and $k=2$ where we can remove <code class="language-plaintext highlighter-rouge">[1, 2]</code>, then <code class="language-plaintext highlighter-rouge">[1, 3]</code> to end with <code class="language-plaintext highlighter-rouge">[1]</code>. A k-reduced bag is not unique.</p>
<p>If the array has size $N$, we can remove $k$ distinct elements from it at most $\lfloor N/k \rfloor$ times, since after that there will be less than $k$ elements left.</p>
<p>That means that if an element appears more than $\lfloor N/k \rfloor$ times, it’s sure to be in the k-reduced bag, since each reduction removes at most one of its occurrences and there are at most $\lfloor N/k \rfloor$ reductions.</p>
<p>The other observation is that there are at most $k$ distinct elements in the k-reduced bag, by definition.</p>
<p>If we can represent the k-reduced bag as a hashmap of element-frequency we can use $O(k)$ memory.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">major</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">k</span><span class="p">):</span>
<span class="n">bag</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">arr</span><span class="p">:</span>
<span class="k">if</span> <span class="n">x</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">bag</span><span class="p">:</span>
<span class="n">bag</span><span class="p">[</span><span class="n">x</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">bag</span><span class="p">[</span><span class="n">x</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">bag</span><span class="p">)</span> <span class="o">==</span> <span class="n">k</span><span class="p">:</span>
<span class="c1"># found k distinct elements.
</span> <span class="c1"># remove 1 of each
</span> <span class="n">ys</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">bag</span><span class="p">.</span><span class="n">keys</span><span class="p">())</span>
<span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="n">ys</span><span class="p">:</span>
<span class="n">bag</span><span class="p">[</span><span class="n">y</span><span class="p">]</span> <span class="o">-=</span> <span class="mi">1</span>
<span class="c1"># make sure hash map is O(k)
</span> <span class="k">if</span> <span class="n">bag</span><span class="p">[</span><span class="n">y</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">del</span> <span class="n">bag</span><span class="p">[</span><span class="n">y</span><span class="p">]</span>
<span class="c1"># returns element with the highest value
</span> <span class="k">return</span> <span class="nb">max</span><span class="p">(</span><span class="n">bag</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span><span class="n">bag</span><span class="p">[</span><span class="n">x</span><span class="p">])</span></code></pre></figure>
<p>If we assume that adding, accessing and removing from <code class="language-plaintext highlighter-rouge">bag</code> can be done in $O(1)$, this algorithm is $O(N)$. The key is to observe that the number of times we execute the nested loop is bounded by $N$ because on each iteration we subtract from <code class="language-plaintext highlighter-rouge">bag</code> and we only add one per iteration of the outer loop.</p>
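<p>One caveat worth making explicit: an element surviving in the k-reduced bag is only a <em>candidate</em>, since the bag can be non-empty even when no element occurs more than $\lfloor N/k \rfloor$ times, and the residual counts in the bag are not true frequencies. A second counting pass settles it. Below is a self-contained sketch combining the reduction with such a verification pass (the wrapper and its name are mine, not from [3]):</p>

```python
from collections import Counter

def majority_with_check(arr, k):
    # Build the k-reduced bag, as above.
    bag = {}
    for x in arr:
        bag[x] = bag.get(x, 0) + 1
        if len(bag) == k:
            for y in list(bag):
                bag[y] -= 1
                if bag[y] == 0:
                    del bag[y]
    # Verification pass: re-count the surviving candidates,
    # since residual bag counts are not true frequencies.
    counts = Counter(x for x in arr if x in bag)
    n = len(arr)
    return [x for x, c in counts.items() if c > n // k]

# majority_with_check([1, 1, 1, 2, 3], 2) -> [1]
# majority_with_check([1, 1, 2, 3, 3], 2) -> []   (no N/2 majority)
```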
<h2 id="conclusion">Conclusion</h2>
<p>I learned about the $\lfloor N/2 \rfloor$ and $\lfloor N/3 \rfloor$ majority problems recently in programming puzzle sites. In particular, [2] led me to the <em>Boyer–Moore Majority Vote Algorithm</em> and showed a solution to the $\lfloor N/3 \rfloor$ version.</p>
<p>They don’t provide a proof of correctness of the solution but it seemed relatively obvious. However, once I tried to write a proof down I realized how tricky it is. I tried to make some analogies but the result was a sketch of a proof at best.</p>
<p>I tried reading Boyer–Moore’s original paper [4] but they only cover the $\lfloor N/2 \rfloor$ case. The paper points to Misra and Gries’ paper [3] though which was a really nice finding.</p>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_majority_vote_algorithm">1</a>] Wikipedia: Boyer–Moore majority vote algorithm</li>
<li>[<a href="https://www.geeksforgeeks.org/n3-repeated-number-array-o1-space/">2</a>] GeeksforGeeks: N/3 repeated number in an array with O(1) space</li>
<li>[3] Finding Repeated Elements - J. Misra and David Gries</li>
<li>[4] MJRTY - A Fast Majority Vote Algorithm - R. Boyer and J Moore</li>
</ul>Guilherme KunigamiRobert Stephen Boyer and J Strother Moore are Professor Emeritus at The University of Texas at Austin. They’re known for their Boyer–Moore string-search algorithm. In this post we’ll present the Boyer–Moore Majority Vote Algorithm which can find the major element (i.e. one which appears more than 50% of time) in a stream of data if one exists.Levinson Recursion2021-02-20T00:00:00+00:002021-02-20T00:00:00+00:00https://www.kuniga.me/blog/2021/02/20/levinson-recursion<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<figure class="image_float_left">
<a href="https://en.wikipedia.org/wiki/File:10-08ViterbiBIG.jpg">
<img src="https://www.kuniga.me/resources/blog/2021-02-20-levinson-recursion/profile.png" alt="Norman Levinson thumbnail" />
</a>
</figure>
<p>Norman Levinson was an American mathematician, the son of Russian Jewish immigrants, and grew up poor. He eventually enrolled at MIT and majored in Electrical Engineering but switched to Mathematics during his PhD in no small part due to Norbert Wiener.</p>
<p>Levinson spent two years at the Institute for Advanced Study at Princeton where he was supervised by von Neumann. During the Great Depression, Levinson applied to a position at MIT and was initially refused, likely due to anti-semitic discrimination. The famous British mathematician G. H. Hardy intervened and is reported to have said to the university’s provost, Vannevar Bush:</p>
<blockquote>
<p>Tell me, Mr Bush, do you think you’re running an engineering school or a theological seminary? Is this the Massachusetts Institute of Theology? If it isn’t, why not hire Levinson.</p>
</blockquote>
<p>Levinson got the job in 1937 but almost lost it during the McCarthy era due to an initial association with the American Communist Party. Levinson passed away in 1975 [1].</p>
<p>In this post we’ll discuss the Levinson Recursion algorithm, also known as Levinson–Durbin recursion, which can be used to solve the equation $A x = y$ more efficiently if $A$ obeys some specific properties.</p>
<!--more-->
<h2 id="toeplitz-matrix">Toeplitz Matrix</h2>
<p>A Toeplitz matrix or diagonal-constant matrix is a matrix where all the elements in the top-left to bottom-right diagonals are the same.</p>
<p>In other words, let $T$ be a matrix with elements $t_{ij}$. $T$ is Toeplitz iff $t_{ij} = t_{i-1, j - 1}$ for all $i > 1$ and $j > 1$.</p>
<p>A Toeplitz matrix is not necessarily square. An example:</p>
\[\begin{bmatrix}
a & b & c & d & e \\
f & a & b & c & d \\
g & f & a & b & c \\
h & g & f & a & b \\
\end{bmatrix}\]
<p>For this post, we’ll assume $T$ is square and we’ll represent its indices generically as:</p>
\[(1) \quad
\begin{bmatrix}
t_0 & t_{-1} & t_{-2} & \dots & t_{-n + 1} \\
t_1 & t_0 & t_{-1} & \dots & t_{-n + 2} \\
t_2 & t_1 & t_0 & \dots & t_{-n + 3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
t_{n-1} & t_{n-2} & t_{n-3} & \dots & t_0 \\
\end{bmatrix}\]
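<p>Since a Toeplitz matrix is fully determined by its first column $(t_0, t_1, \dots, t_{n-1})$ and first row $(t_0, t_{-1}, \dots, t_{-n+1})$, we can build the dense representation in a few lines. This is just a convenience sketch for experimenting (the helper and its name are mine, not part of the algorithm):</p>

```python
import numpy as np

def make_toeplitz(col, row):
    # col = (t_0, t_1, ..., t_{n-1}), row = (t_0, t_{-1}, ..., t_{-n+1});
    # their first entries must agree.
    n = len(col)
    assert col[0] == row[0]
    # t[n - 1 + k] holds t_k for k = -(n-1), ..., n-1
    t = np.concatenate([row[::-1], col[1:]])
    # entry (i, j) only depends on the difference i - j
    return np.array([[t[n - 1 + i - j] for j in range(n)]
                     for i in range(n)])

# make_toeplitz([1, 2, 3], [1, 4, 5]) ->
# [[1 4 5]
#  [2 1 4]
#  [3 2 1]]
```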
<h3 id="submatrices-along-the-diagonal">Submatrices along the diagonal</h3>
<p>Let $T$ be a $n \times n$ matrix and consider an $m \times m$ submatrix $T^m$ of $T$, with $m < n$, aligned with $T$ on their first element. More precisely, $t_{ij}^{m} = t_{ij}$ for $1 \le i, j \le m$.</p>
<p>It’s easy to see that if $T$ is a Toeplitz matrix, so is $T^m$.</p>
<p>Furthermore, suppose we have another submatrix $U^m$ but with the first element of $U^m$ aligned on $t_{2,2}$, that is $u_{ij}^{m} = t_{i + 1, j + 1}$ for $1 \le i, j \le m$. If $T$ is a Toeplitz matrix, then $U^m = T^m$.</p>
<p>In fact, all submatrices of a given size $m$ are the same if their first elements are aligned on the diagonal of $T$. We can get an intuition from the following example:</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-02-20-levinson-recursion/toeplitz.png" alt="Toeplitz matrix with equal submatrices" />
<figcaption>Figure 1: A Toeplitz matrix and 3 submatrices marked in red. Because they have the same size and are all aligned along the same diagonal, they are equal.</figcaption>
</figure>
<p>This property will be important later, so let’s keep it in mind.</p>
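<p>We can verify this property numerically: every $m \times m$ submatrix whose top-left corner lies on the main diagonal is the same matrix. A small self-contained check (variable names are mine):</p>

```python
import numpy as np

n, m = 5, 3
# Entry (i, j) only depends on i - j, so T is Toeplitz.
t = np.arange(-(n - 1), n)  # stand-ins for t_{-n+1}, ..., t_{n-1}
T = np.array([[t[n - 1 + i - j] for j in range(n)] for i in range(n)])

# All m x m submatrices anchored on the main diagonal are equal.
subs = [T[k : k + m, k : k + m] for k in range(n - m + 1)]
assert all(np.array_equal(subs[0], s) for s in subs)
```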
<h2 id="levinson-recursion">Levinson Recursion</h2>
<p>Given a $n \times n$ Toeplitz matrix and a vector $y$, the Levinson recursion can solve equation $Tx = y$ for $x$ in $O(n^2)$ by using intermediate vectors called <em>forward</em> and <em>backward</em> vectors.</p>
<h3 id="forward-and-backward-vectors">Forward and Backward vectors</h3>
<p>Let $e_i$ be the column vector with all zeroes, except in row $i$ which is 1. We define $f^i$ such that</p>
\[T^i f^i = e_1\]
<p>and $b^i$ such that</p>
\[T^i b^i = e_i\]
<p>We can compute $f^i$ and $b^i$ recursively, that is, we assume we know $f^k$ and $b^k$ for $k < i$ and show how to obtain $f^i$ and $b^i$.</p>
<p>We start by appending a 0 to $f^{i - 1}$, which we’ll call $\hat f^i$. What happens if we multiply $T^i$ by it?</p>
\[T^i \hat f^i = T^i
\begin{bmatrix}
f^{i - 1} \\
0 \\
\end{bmatrix}\]
<p>To compute this, we can write $T^i$ in terms of $T^{i-1}$:</p>
\[T^i
\begin{bmatrix}
f^{i - 1} \\
0 \\
\end{bmatrix} = \begin{bmatrix}
T^{i-1} & \begin{matrix} t_{-i + 1} \\ t_{-i + 2} \\ \vdots \end{matrix} \\
\begin{matrix} t_{i-1} & t_{i-2} & \dots \end{matrix} & t_{0} \\
\end{bmatrix} \begin{bmatrix}
f^{i - 1} \\
0 \\
\end{bmatrix}\]
<p>We’ll end up with a $i \times 1$ column vector, where the first $i - 1$ entries are from</p>
\[T^{i-1} f^{i-1} + \begin{bmatrix} t_{-i+1} & t_{-i + 2} & \dots & t_{-1}\end{bmatrix}^T 0 = e_1\]
<p>The last entry, a scalar, will come from</p>
\[(\sum_{j=1}^{i-1} t_{i-j} f^{i-1}_j) + t_0 \cdot 0 = \sum_{j=1}^{i-1} t_{i-j} f^{i-1}_j\]
<p>Which we’ll denote as $\epsilon_f^i$, thus</p>
\[T^i
\begin{bmatrix}
f^{i - 1} \\
0 \\
\end{bmatrix} = \begin{bmatrix}
1 \\
\vdots \\
0 \\
\epsilon_f^i
\end{bmatrix}\]
<p>So we almost got $e_1$! The only extraneous bit is $\epsilon_f^i$. The idea now is to try to eliminate that. Before that, we can use the same idea to get an analogous result for $\hat b^i$, that is:</p>
\[T^i \hat b^i = T^i
\begin{bmatrix}
0 \\
b^{i - 1} \\
\end{bmatrix} =
\begin{bmatrix}
t_0 & \begin{matrix} \dots & t_{-i + 2} & t_{-i + 1} \end{matrix} \\
\begin{matrix} \vdots \\ t_{i-2} \\ t_{i-1} \end{matrix} & T^{i-1} \\
\end{bmatrix} \begin{bmatrix}
0 \\
b^{i - 1} \\
\end{bmatrix}
=
\begin{bmatrix}
\epsilon_b^i \\
0 \\
\vdots \\
1 \\
\end{bmatrix}\]
<p>Note how we’re using the property discussed in <em>Submatrices along the diagonal</em> so that</p>
\[T^i =
\begin{bmatrix}
T^{i-1} & \begin{matrix} t_{-i + 1} \\ t_{-i + 2} \\ \vdots \end{matrix} \\
\begin{matrix} t_{i-1} & t_{i-2} & \dots \end{matrix} & t_{0} \\
\end{bmatrix}
=
\begin{bmatrix}
t_0 & \begin{matrix} \dots & t_{-i + 2} & t_{-i + 1} \end{matrix} \\
\begin{matrix} \vdots \\ t_{i-2} \\ t_{i-1} \end{matrix} & T^{i-1} \\
\end{bmatrix}\]
<p>Without this property this recurrence wouldn’t work. Analogously to $\epsilon_f^i$, we have:</p>
\[\epsilon_b^i = \sum_{j=1}^{i-1} t_{-j} b^{i-1}_j\]
<p>We will now show that a linear combination of $\hat f^i$ and $\hat b^i$ can yield the desired results, that is</p>
\[\begin{aligned}
(2) \quad f^i = \alpha_f^i \hat f^i + \beta_f^i \hat b^i \\
(3) \quad b^i = \alpha_b^i \hat f^i + \beta_b^i \hat b^i \\
\end{aligned}\]
<p>for scalars \(\alpha_f^i, \beta_f^i, \alpha_b^i, \beta_b^i\). Since $T^i f^i = e_1$:</p>
\[T^i f^i = T^i (\alpha_f^i \hat f^i + \beta_f^i \hat b^i) = \alpha_f^i \begin{bmatrix}
1 \\
\vdots \\
0 \\
\epsilon_f^i
\end{bmatrix} + \beta_f^i \begin{bmatrix}
\epsilon_b^i \\
0 \\
\vdots \\
1 \\
\end{bmatrix} = \begin{bmatrix}
1 \\
0 \\
\vdots \\
0 \\
\end{bmatrix}\]
<p>This gives us a set of equations, but only the first and last entries are non-trivial:</p>
\[\begin{aligned}
(4) \quad & \alpha_f^i + \beta_f^i \epsilon_b^i &= 1 \\
(5) \quad & \alpha_f^i \epsilon_f^i + \beta_f^i &= 0
\end{aligned}\]
<p>Since $T^i b^i = e_i$:</p>
\[T^i b^i = T^i (\alpha_b^i \hat f^i + \beta_b^i \hat b^i) = \alpha_b^i \begin{bmatrix}
1 \\
\vdots \\
0 \\
\epsilon_f^i
\end{bmatrix} + \beta_b^i \begin{bmatrix}
\epsilon_b^i \\
0 \\
\vdots \\
1 \\
\end{bmatrix} = \begin{bmatrix}
0 \\
0 \\
\vdots \\
1 \\
\end{bmatrix}\]
\[\begin{aligned}
(6) \quad & \alpha_b^i + \beta_b^i \epsilon_b^i &= 0 \\
(7) \quad & \alpha_b^i \epsilon_f^i + \beta_b^i &= 1
\end{aligned}\]
<p>Noting that $\epsilon_f^i$ and $\epsilon_b^i$ are constants, we have 4 unknowns and 4 equations (4-7), so we can obtain:</p>
\[\begin{aligned}
\alpha_f^i &= \frac{1}{1 - \epsilon_b^i \epsilon_f^i} \\
\beta_f^i &= \frac{-\epsilon_f^i}{1 - \epsilon_b^i \epsilon_f^i} \\
\alpha_b^i &= \frac{-\epsilon_b^i}{1 - \epsilon_b^i \epsilon_f^i} \\
\beta_b^i &= \frac{1}{1 - \epsilon_b^i \epsilon_f^i} \\
\end{aligned}\]
<p>Using (2) and (3) we can obtain $f^i$ and $b^i$.</p>
<p>It remains to define the base case. We need to find $f^1$ such that $T^1 f^1 = [t_0] f^1 = [1]$, hence $f^1 = [\frac{1}{t_0}]$. The same reasoning gives $b^1 = [\frac{1}{t_0}]$.</p>
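<p>To make the recursion concrete, here is a small numerical check of the update rules. It is a direct transcription of (2) and (3) with the solved coefficients, assuming $1 - \epsilon_b^i \epsilon_f^i \neq 0$ (which holds when all leading submatrices are nonsingular):</p>

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # a tiny Toeplitz matrix
n = len(T)
f = np.array([1.0 / T[0, 0]])  # f^1
b = np.array([1.0 / T[0, 0]])  # b^1
for i in range(1, n):
    f2 = np.append(f, 0.0)     # \hat f^i: f^{i-1} with a 0 appended
    b2 = np.insert(b, 0, 0.0)  # \hat b^i: b^{i-1} with a 0 prepended
    eps_f = np.dot(T[i, : i + 1], f2)
    eps_b = np.dot(T[0, : i + 1], b2)
    denom = 1.0 - eps_b * eps_f
    f = (f2 - eps_f * b2) / denom  # alpha_f * \hat f^i + beta_f * \hat b^i
    b = (b2 - eps_b * f2) / denom  # alpha_b * \hat f^i + beta_b * \hat b^i

# T f = e_1 and T b = e_n hold:
# T @ f -> [1. 0.],  T @ b -> [0. 1.]
```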
<h3 id="using-the-backward-vector">Using the backward vector</h3>
<p>We can solve $T x = y$ recursively, that is if we know how to solve $T^k x^k = y^k$ for $k < i$ we can solve for $T^i x^i = y^i$.</p>
<p>Let’s define $\hat x^i$ as the column vector obtained by appending 0 to $x^{i-1}$. Then</p>
\[T^i \hat x^i = T^i
\begin{bmatrix}
x^{i - 1} \\
0 \\
\end{bmatrix} = \begin{bmatrix}
T^{i-1} & \begin{matrix} t_{-i + 1} \\ t_{-i + 2} \\ \vdots \end{matrix} \\
\begin{matrix} t_{i-1} & t_{i-2} & \dots \end{matrix} & t_{0} \\
\end{bmatrix} \begin{bmatrix}
x^{i - 1}_1 \\
x^{i - 1}_2 \\
\vdots \\
0 \\
\end{bmatrix} =
\begin{bmatrix}
T^{i - 1} x^{i - 1} \\
\sum_{j = 1}^{i - 1} t_{i - j} x^{i - 1}_j
\end{bmatrix}\]
<p>Let $\epsilon^i_x = \sum_{j = 1}^{i - 1} t_{i - j} x^{i - 1}_j$. Then</p>
\[T^i \hat x^i = \begin{bmatrix}
y_1 \\
y_2 \\
\vdots \\
y_{i - 1} \\
\epsilon^i_x
\end{bmatrix}\]
<p>If we can replace that $\epsilon^i_x$ with $y_i$ we’d be able to obtain $x^i$! Suppose we can modify $\hat x^i$ by adding $\delta^i$ to obtain $x^i$:</p>
\[T^i x^i = T^i (\hat x^i + \delta^i) = \begin{bmatrix}
y_1 \\
y_2 \\
\vdots \\
y_{i - 1} \\
\epsilon^i_x
\end{bmatrix} + T^i \delta^i = \begin{bmatrix}
y_1 \\
y_2 \\
\vdots \\
y_{i - 1} \\
y_i
\end{bmatrix}\]
<p>So</p>
\[T^i \delta^i = \begin{bmatrix}
0 \\
0 \\
\vdots \\
y_i - \epsilon^i_x
\end{bmatrix} = (y_i - \epsilon^i_x) e_i\]
<p>Since $T^i b^i = e_i$, we can choose $\delta^i = (y_i - \epsilon^i_x) b^i$. Wrapping up,</p>
<p>\((8) \quad x^i = \hat x^i + (y_i - \epsilon^i_x) b^i = \begin{bmatrix} x^{i-1} \\ 0 \end{bmatrix} + (y_i - \epsilon^i_x) b^i\).</p>
<p>It remains to work out the base case, $T^1 x^1 = [t_0] x^1 = [y_1]$, so $x^1 = [y_1 / t_0]$.</p>
<p>The solution we’re seeking is $x = x^n$.</p>
<h2 id="time-complexity">Time Complexity</h2>
<p>In the forward and backward vectors step, we need $O(n)$ operations to compute $\epsilon_f^i$ and $\epsilon_b^i$ at each of the $n$ iterations, for a total of $O(n^2)$.</p>
<p>During the final step, we need $O(n)$ to compute $\epsilon_x^i$ and to evaluate $(8)$ at each of the $n$ iterations, for a total of $O(n^2)$.</p>
<p>So overall the algorithm has $O(n^2)$ complexity, compared to $O(n^3)$ from a Gaussian elimination for general matrices.</p>
<h2 id="python-implementation">Python Implementation</h2>
<p>We now provide an implementation of these ideas using Python.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="k">def</span> <span class="nf">levinson</span><span class="p">(</span><span class="n">mat</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
<span class="n">n</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">mat</span><span class="p">)</span>
<span class="n">t_0</span> <span class="o">=</span> <span class="n">mat</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="c1"># forward vector f^i
</span> <span class="n">f</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mf">1.0</span><span class="o">/</span><span class="n">t_0</span><span class="p">])</span>
<span class="c1"># backward vector b^i
</span> <span class="n">b</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mf">1.0</span><span class="o">/</span><span class="n">t_0</span><span class="p">])</span>
<span class="c1"># partial solution x^i
</span> <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">y</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">/</span><span class="n">t_0</span><span class="p">])</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
<span class="n">last_row</span> <span class="o">=</span> <span class="n">mat</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="mi">0</span> <span class="p">:</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">f2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">eps_f</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">last_row</span><span class="p">,</span> <span class="n">f2</span><span class="p">)</span>
<span class="n">first_row</span> <span class="o">=</span> <span class="n">mat</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span> <span class="p">:</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">b2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">eps_b</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">first_row</span><span class="p">,</span> <span class="n">b2</span><span class="p">)</span>
<span class="c1"># Common denominator to all alphas and betas
</span> <span class="n">denom</span> <span class="o">=</span> <span class="mf">1.</span> <span class="o">-</span> <span class="n">eps_f</span> <span class="o">*</span> <span class="n">eps_b</span>
<span class="c1"># Compute f^i from b^(i-1) and f^(i-1)
</span> <span class="n">alpha_f</span> <span class="o">=</span> <span class="mf">1.</span> <span class="o">/</span> <span class="n">denom</span>
<span class="n">beta_f</span> <span class="o">=</span> <span class="o">-</span><span class="n">eps_f</span> <span class="o">/</span> <span class="n">denom</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">alpha_f</span> <span class="o">*</span> <span class="n">f2</span> <span class="o">+</span> <span class="n">beta_f</span> <span class="o">*</span> <span class="n">b2</span>
<span class="c1"># Compute b^i from b^(i-1) and f^(i-1)
</span> <span class="n">alpha_b</span> <span class="o">=</span> <span class="o">-</span><span class="n">eps_b</span> <span class="o">/</span> <span class="n">denom</span>
<span class="n">beta_b</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">denom</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">alpha_b</span> <span class="o">*</span> <span class="n">f2</span> <span class="o">+</span> <span class="n">beta_b</span> <span class="o">*</span> <span class="n">b2</span>
<span class="c1"># Compute x^i from b^i
</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">eps_x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">last_row</span><span class="p">,</span> <span class="n">x2</span><span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">x2</span> <span class="o">+</span> <span class="p">(</span><span class="n">y</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">-</span> <span class="n">eps_x</span><span class="p">)</span> <span class="o">*</span> <span class="n">b</span>
<span class="k">return</span> <span class="n">x</span></code></pre></figure>
<p>The full code is on <a href="https://github.com/kunigami/kunigami.github.io/blob/master/blog/code/2021-02-20-levinson-recursion/levinson.py">Github</a>.</p>
<p>Let’s make some observations on the code. First, we use <code class="language-plaintext highlighter-rouge">numpy</code>, especially its <code class="language-plaintext highlighter-rouge">dot</code> function, which performs a dot product between two vectors of the same size. We also leverage the overloaded operators of <code class="language-plaintext highlighter-rouge">numpy.array</code>s, which behave more like linear algebra objects.</p>
<p>For example, adding two <code class="language-plaintext highlighter-rouge">numpy.array</code>s does element-wise sum:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span> <span class="o">+</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">])</span> <span class="c1"># np.array([5, 6])</span></code></pre></figure>
<p>while adding Python lists concatenates them. Another example is multiplying by a scalar:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span> <span class="o">*</span> <span class="mi">10</span> <span class="c1"># np.array([10, 20])</span></code></pre></figure>
<p>Note that while we describe the computation of $b^i$ and $f^i$ vs. $x^i$ separately, in code we compute them at the same time because $x^i$ needs $b^i$.</p>
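<p>As an end-to-end sanity check, we can compare the result against <code class="language-plaintext highlighter-rouge">numpy</code>’s general solver on a randomly generated, diagonally dominant Toeplitz system. The function below is a condensed copy of the snippet above, repeated only so that this example runs standalone:</p>

```python
import numpy as np

def levinson(mat, y):
    # Condensed copy of the implementation above.
    n = len(mat)
    t_0 = mat[0][0]
    f = np.array([1.0 / t_0])
    b = np.array([1.0 / t_0])
    x = np.array([y[0] / t_0])
    for i in range(1, n):
        f2, b2 = np.append(f, 0.0), np.insert(b, 0, 0.0)
        eps_f = np.dot(mat[i, : i + 1], f2)
        eps_b = np.dot(mat[0, : i + 1], b2)
        denom = 1.0 - eps_f * eps_b
        f, b = (f2 - eps_f * b2) / denom, (b2 - eps_b * f2) / denom
        x2 = np.append(x, 0.0)
        eps_x = np.dot(mat[i, : i + 1], x2)
        x = x2 + (y[i] - eps_x) * b
    return x

rng = np.random.default_rng(42)
n = 6
t = rng.uniform(0.0, 1.0, size=2 * n - 1)
t[n - 1] += n  # boost t_0 so the matrix is diagonally dominant
mat = np.array([[t[n - 1 + i - j] for j in range(n)] for i in range(n)])
y = rng.uniform(size=n)
assert np.allclose(levinson(mat, y), np.linalg.solve(mat, y))
```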
<h2 id="conclusion">Conclusion</h2>
<p>I ran into this algorithm while studying signal processing, in particular <a href="https://en.wikipedia.org/wiki/Linear_predictive_coding">Linear Predictive Coding</a> which I plan to write about later.</p>
<p>The idea is pretty clever and I was wondering how could one have thought of it. One possible way was to assume some induction property and try to jump from a solution of $k - 1$ to $k$. This is a similar line of thinking we use when trying to see if a problem can be solved by dynamic programming for example.</p>
<p>By playing around with extending the solution, we could have tried appending a 0 and observing it almost looks like what we want, except that we need to “override” the last element. From there, we can reduce to a simpler problem which is to solve $Tx = e_i$, which is exactly the backward vector equation.</p>
<p>For solving $Tb = e_i$ recursively, maybe we would realize that it’s simpler to prepend instead of append a 0, since for $e_i$ and $i > 1$, the first position will always be zero, which is easier to work with. Now we have that $\epsilon_b$ we need to eliminate, which is in the first position, so we get to $Tf = e_1$, that is, the forward vector.</p>
<p>We then have a “cyclic dependency” since to solve $Tf = e_1$ we would need $Tb = e_i$. But maybe we can resolve that by using them at the same time? Because we’re working with $e_i$ and $e_1$ we might think of an <a href="https://en.wikipedia.org/wiki/Orthogonal_basis">orthogonal basis</a> spanning a vector space via linear combination equations like we did with equations (4-7).</p>
<p>This exercise of trying to find how an algorithm came to be was helpful to me. We often only see the final polished result of an idea but the intermediate steps are often more intuitive and more likely to be reused for solving future problems.</p>
<h2 id="related-posts">Related Posts</h2>
<p><a href="https://www.kuniga.me/blog/2020/11/21/quantum-fourier-transform.html">Quantum Fourier Transform</a> - The Levinson algorithm uses a technique of “manipulating” a solution with surgical precision by leveraging the $e_i$ vector, which when multiplied by a scalar can be added to the solution to modify only a single element. This reminded me of the Quantum Fourier Transform circuit where
we use the $CR_2(\psi, j_n)$ gate to “inject” a bit.</p>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://mathshistory.st-andrews.ac.uk/Biographies/Levinson/">1</a>] MacTutor - Norman Levinson</li>
<li>[<a href="https://en.wikipedia.org/wiki/Levinson_recursion">2</a>] Wikipedia - Levinson Recursion</li>
</ul>Guilherme KunigamiLinux Filesystems Overview2021-02-08T00:00:00+00:002021-02-08T00:00:00+00:00https://www.kuniga.me/blog/2021/02/08/linux-filesystems-overview<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>In this post we’ll cover a diversity of topics concerning the Linux filesystem. We’ll start with a high-level overview of a filesystem.</p>
<p>We then cover specific concepts such as types of filesystems, files and permissions. We continue by exploring some common layouts of the Linux directory hierarchy.</p>
<p>This post expects some basic familiarity with Linux.</p>
<!--more-->
<h2 id="the-virtual-filesystem">The Virtual Filesystem</h2>
<p>The Virtual Filesystem, aka VFS, is an abstraction layer used by the kernel to represent a given mounted filesystem.</p>
<h3 id="the-vfs-entities">The VFS Entities</h3>
<p>There are 4 classes of objects the VFS keeps in memory.</p>
<p><strong>Superblock.</strong> In-memory object with the metadata of a specific mounted filesystem.</p>
<p><strong>Inode.</strong> In-memory object with the metadata for a specific file/directory (examples: permissions, physical location). It does not include the filename or path; those live in the dentry (see below).</p>
<p><strong>Dentry.</strong> Short for directory entry. In-memory object that associates a file/directory with a path. According to [2]: dentry objects are constructed on the fly by the VFS.</p>
<p><strong>File.</strong> In-memory object representing an opened file, associated with a process (there can be multiple file objects for the same file if several processes have it open).</p>
<p>All these objects provide an abstraction over specific implementations of the underlying filesystem (see Types of Filesystems) and are transparent to the kernel [2].</p>
<h3 id="example-diagram">Example Diagram</h3>
<p>Consider a file named log.txt located in <code class="language-plaintext highlighter-rouge">/var/tmp/</code>. The kernel will keep the following objects in memory:</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-02-08-linux-filesystems-overview/diagram.png" alt="Diagram depicting the different entities of VFS" />
</figure>
<p>Things to notice:</p>
<ul>
<li>Each “piece” of the path has a corresponding dentry object, including the root <code class="language-plaintext highlighter-rouge">/</code> and the file itself.</li>
<li>The file object is not the source of truth for the file, but rather a connection (edge) between a process and the actual file.</li>
<li>The inode has a 1:1 relationship with the file and has all the metadata about it.</li>
</ul>
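<p>We can inspect some of this inode-level metadata from user space. Below is a minimal sketch in Python using <code class="language-plaintext highlighter-rouge">os.stat()</code>; the temporary file is purely illustrative:</p>

```python
import os
import tempfile

# Create a throwaway file to inspect (illustrative only)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name

info = os.stat(path)
print(info.st_ino)        # inode number
print(info.st_nlink)      # number of hard links to this inode
print(info.st_size)       # file size in bytes (5 here)
print(oct(info.st_mode))  # file type and permission bits

os.unlink(path)
```

<p>Note that the filename itself does not appear in any <code class="language-plaintext highlighter-rouge">st_*</code> field: as described above, names belong to dentries, not inodes.</p>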
<h2 id="files">Files</h2>
<h3 id="the-anatomy-of-ls--l">The anatomy of <code class="language-plaintext highlighter-rouge">ls -l</code></h3>
<p>We can get a lot of information about a file by typing <code class="language-plaintext highlighter-rouge">ls -l</code> (<code class="language-plaintext highlighter-rouge">-l</code> stands for “long”).</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$ls -l
-rw-rw-r-- 1 kunigami kunigami 43315 Feb 6 09:48 tmp.txt</code></pre></figure>
<p>The first character on the first column (<code class="language-plaintext highlighter-rouge">-</code>) represents the <strong>file type</strong> which we’ll see in the <em>File Types</em> section.</p>
<p>The rest of the first column (<code class="language-plaintext highlighter-rouge">rw-rw-r--</code>) displays the <strong>permission information</strong> which we’ll see in the <em>File Permissions</em> section.</p>
<p>The next block has the number of hard links pointing to that file’s inode. Note that it doesn’t include soft links. We’ll see hard and soft links within <em>File Types > Links</em>.</p>
<p>The third and fourth columns represent the user and group owners for that file. We’ll learn about groups in the section <em>Groups</em>.</p>
<p>The last columns (<code class="language-plaintext highlighter-rouge">43315</code>, <code class="language-plaintext highlighter-rouge">Feb 6 09:48</code> and <code class="language-plaintext highlighter-rouge">tmp.txt</code>) represent the number of bytes in the file, the last modified date and its name.</p>
<h3 id="types-of-files">Types of files</h3>
<p>A lot of things are modelled as files in Linux, including directories and sockets. A common type of operation performed by the OS is transporting information from one place to another subject to some access control.</p>
<p>In Linux the concept of file abstracts such type of operations, so it can be seen as an interface which different components of the operating system can implement. Then many tools such as ls can operate over that abstraction.</p>
<p>As we saw above, we can inspect the type of a given file by reading the first character of <code class="language-plaintext highlighter-rouge">ls -l</code>:</p>
<div class="center_children">
<table>
<thead>
<tr>
<th>Symbol</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code class="language-plaintext">-</code></td>
<td>Regular file</td>
</tr>
<tr>
<td><code class="language-plaintext">d</code></td>
<td>Directory</td>
</tr>
<tr>
<td><code class="language-plaintext">l</code></td>
<td>Link</td>
</tr>
<tr>
<td><code class="language-plaintext">c</code></td>
<td>Character file</td>
</tr>
<tr>
<td><code class="language-plaintext">s</code></td>
<td>Socket</td>
</tr>
<tr>
<td><code class="language-plaintext">p</code></td>
<td>Named pipe</td>
</tr>
<tr>
<td><code class="language-plaintext">b</code></td>
<td>Block file</td>
</tr>
</tbody>
</table>
</div>
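<p>The table above can be reproduced from <code class="language-plaintext highlighter-rouge">st_mode</code> with the standard <code class="language-plaintext highlighter-rouge">stat</code> module. A sketch (this is not how <code class="language-plaintext highlighter-rouge">ls</code> itself is implemented, but it yields the same characters):</p>

```python
import os
import stat

def type_char(path: str) -> str:
    """Return the ls -l file-type character for path (without following symlinks)."""
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):  return "-"   # regular file
    if stat.S_ISDIR(mode):  return "d"   # directory
    if stat.S_ISLNK(mode):  return "l"   # link
    if stat.S_ISCHR(mode):  return "c"   # character file
    if stat.S_ISSOCK(mode): return "s"   # socket
    if stat.S_ISFIFO(mode): return "p"   # named pipe
    if stat.S_ISBLK(mode):  return "b"   # block file
    return "?"

print(type_char("/tmp"))       # → d
print(type_char("/dev/null"))  # → c
```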
<p>We’ll now cover some of the types besides directories and regular files.</p>
<p><strong>Link.</strong> Represents a symlink or soft link, a file that is simply a pointer to another file. There are actually two types of file links:</p>
<ul>
<li><strong>Soft link</strong> is a special file with its own inode, whose content is the path of the linked file.</li>
<li><strong>Hard link</strong> is an additional directory entry that shares the same inode as the linked file.</li>
</ul>
<p>Only the soft link is considered a special file. We can see this by running this code:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$touch file.txt
# soft link
$ln -s file.txt file_sl.txt
# hard link
$ln file.txt file_hl.txt
$ls -l
-rw-rw-r-- 2 kunigami kunigami 0 Feb 3 09:38 file_hl.txt
lrwxrwxrwx 1 kunigami kunigami 8 Feb 3 09:38 file_sl.txt -> file.txt
-rw-rw-r-- 2 kunigami kunigami 0 Feb 3 09:38 file.txt</code></pre></figure>
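<p>We can verify these inode relationships programmatically. A short Python sketch, with file names mirroring the shell example above:</p>

```python
import os
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "file.txt")
open(target, "w").close()

os.link(target, os.path.join(d, "file_hl.txt"))     # hard link
os.symlink(target, os.path.join(d, "file_sl.txt"))  # soft link

# The hard link shares the target's inode; the link count is now 2
assert os.stat(os.path.join(d, "file_hl.txt")).st_ino == os.stat(target).st_ino
assert os.stat(target).st_nlink == 2

# The soft link has its own inode (lstat avoids following the link)
assert os.lstat(os.path.join(d, "file_sl.txt")).st_ino != os.stat(target).st_ino
```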
<p><strong>Block and character files.</strong> These are hardware files and are used for reading/writing data. The major difference between block and character files, as their names suggest, is that information is transferred block-by-block and character-by-character, respectively.</p>
<p>An example of a block file is the access to the hard disk and a character file the access to the keyboard. The <code class="language-plaintext highlighter-rouge">/dev/</code> directory has a lot of these files:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$ls -l /dev/sda1
brw-rw---- 1 root disk 8, 1 Dec 28 21:27 sda1</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$ls -l /dev/input/mice
crw-rw---- 1 root input 13, 63 Dec 28 21:27 /dev/input/mice</code></pre></figure>
<p><strong>Named pipes.</strong> Can be used to transfer data between processes without having a backing file on disk [21]. For example, we can create a pipe using <code class="language-plaintext highlighter-rouge">mkfifo</code> and send some data to it:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$mkfifo my_pipe
$echo "hello world" > my_pipe</code></pre></figure>
<p>Note: this will block! Then, in a separate process (e.g. a new terminal window) we can consume that information:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$cat my_pipe</code></pre></figure>
<p>which will also unblock the first process. The named pipe looks like a regular file, but we can see its type:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$ls -l my_pipe
prw-rw-r-- 1 kunigami kunigami 0 Feb 6 10:59 my_pipe</code></pre></figure>
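<p>The same blocking handshake can be scripted. A sketch in Python using <code class="language-plaintext highlighter-rouge">os.mkfifo</code>, with a background thread standing in for the second terminal:</p>

```python
import os
import tempfile
import threading

fifo = os.path.join(tempfile.mkdtemp(), "my_pipe")
os.mkfifo(fifo)

def writer():
    # open() for writing blocks until a reader opens the other end
    with open(fifo, "w") as w:
        w.write("hello world")

t = threading.Thread(target=writer)
t.start()

with open(fifo) as r:  # opening for reading unblocks the writer
    data = r.read()
t.join()
print(data)  # → hello world
```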
<p><strong>Sockets.</strong> We learned about network sockets in a <a href="https://www.kuniga.me/blog/2020/03/07/sockets.html">previous post</a>, but sockets are a more general mechanism that can also be used for inter-process communication. One example of a socket file is <code class="language-plaintext highlighter-rouge">/var/run/dbus/system_bus_socket</code>:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$ls -l /var/run/dbus/system_bus_socket
srw-rw-rw- 1 root root 0 Dec 28 21:27 /var/run/dbus/system_bus_socket</code></pre></figure>
<h3 id="file-permissions">File Permissions</h3>
<p>In our previous example <code class="language-plaintext highlighter-rouge">rw-rw-r--</code>, every 3 characters represent a level. They are the <em>user</em>, <em>group</em>, and <em>others</em> levels, respectively. <strong>user</strong> represents the permissions of the owner of the file. <strong>group</strong> refers to the permissions of the group that owns the file (we’ll learn more in section <em>Groups</em> below). Finally, <strong>others</strong> refers to the permissions for everyone else.</p>
<p>Within each level, the first character represents whether <em>reading</em> is allowed, the second is about <em>writing</em>, the third about <em>executing</em>. Even though they’re displayed as letters, they’re really bits. For example, if reading is allowed, then the first bit is <code class="language-plaintext highlighter-rouge">1</code> and displayed as <code class="language-plaintext highlighter-rouge">r</code>, otherwise it’s <code class="language-plaintext highlighter-rouge">0</code> and represented as <code class="language-plaintext highlighter-rouge">-</code>.</p>
<p>For this reason, a level can alternatively be represented by an octal digit (<code class="language-plaintext highlighter-rouge">0-7</code>), whose binary representation corresponds to the bits that are set. For example <code class="language-plaintext highlighter-rouge">rw-</code> is equivalent to <code class="language-plaintext highlighter-rouge">110</code>, which is <code class="language-plaintext highlighter-rouge">6</code> in octal. This is handy for succinctly representing the permissions as a 3-digit octal. For example <code class="language-plaintext highlighter-rouge">rw-rw-r--</code> is <code class="language-plaintext highlighter-rouge">664</code>.</p>
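<p>This conversion is easy to sketch in code. A hypothetical helper that turns an <code class="language-plaintext highlighter-rouge">rwxrwxrwx</code>-style string into its octal form:</p>

```python
def perms_to_octal(perms: str) -> str:
    """Convert a 9-character permission string (e.g. 'rw-rw-r--') to octal digits."""
    assert len(perms) == 9
    digits = []
    for i in range(0, 9, 3):
        level = perms[i:i + 3]
        # Each of r, w, x is one bit: set iff the character is not '-'
        value = sum(bit << shift
                    for shift, bit in zip((2, 1, 0),
                                          (c != "-" for c in level)))
        digits.append(str(value))
    return "".join(digits)

print(perms_to_octal("rw-rw-r--"))  # → 664
```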
<p>We can change the permissions of a file using the <code class="language-plaintext highlighter-rouge">chmod</code> command. For example:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$chmod 464 my_file.txt</code></pre></figure>
<p>This has the effect of changing the permissions of <code class="language-plaintext highlighter-rouge">my_file.txt</code> to <code class="language-plaintext highlighter-rouge">r--rw-r--</code>. Another common operation using <code class="language-plaintext highlighter-rouge">chmod</code> is</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$chmod +x my_file.txt</code></pre></figure>
<p>It sets (denoted by the <code class="language-plaintext highlighter-rouge">+</code> symbol) the execution bit (<code class="language-plaintext highlighter-rouge">x</code>) to all levels. There are plenty of different ways to change permissions, which are listed in <code class="language-plaintext highlighter-rouge">man chmod</code>.</p>
<p>The default permissions when creating regular files are <code class="language-plaintext highlighter-rouge">666</code> (<code class="language-plaintext highlighter-rouge">777</code> for directories). However, this can be configured by the user: the mask from <code class="language-plaintext highlighter-rouge">umask</code> is applied to turn off bits. For example, if we type <code class="language-plaintext highlighter-rouge">umask</code> we see:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$umask
002</code></pre></figure>
<p>The last digit, <code class="language-plaintext highlighter-rouge">2</code>, is <code class="language-plaintext highlighter-rouge">010</code> in binary, which indicates that we want to turn off the second bit (<em>write</em>) of the third level (<em>others</em>). Another way to see this is as a bitwise operation. Suppose <code class="language-plaintext highlighter-rouge">p</code> is a 3-digit octal representing the initial file permission and <code class="language-plaintext highlighter-rouge">m</code> the 3-digit octal from <code class="language-plaintext highlighter-rouge">umask</code>. The resulting permission can be obtained via <code class="language-plaintext highlighter-rouge">~m & p</code>. We can do the following in Python for example:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">>>> p = 0o666
>>> m = 0o002
>>> oct(~m & p)
'0o664'</code></pre></figure>
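<p>We can also watch the umask being applied in practice. A sketch using <code class="language-plaintext highlighter-rouge">os.umask</code> (the old mask is restored afterwards so the rest of the process is unaffected):</p>

```python
import os
import stat
import tempfile

old = os.umask(0o027)  # mask off group write and all bits for others
try:
    path = os.path.join(tempfile.mkdtemp(), "masked.txt")
    # open() requests mode 0o666; the kernel clears the masked bits
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
    os.close(fd)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    print(oct(mode))  # → 0o640, i.e. 0o666 & ~0o027
finally:
    os.umask(old)
```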
<h3 id="permission-groups">Permission Groups</h3>
<p>A user can belong to multiple groups but at any one time, one of the groups is the “active” one. By default the active group has the same name as the user. That’s why when running <code class="language-plaintext highlighter-rouge">ls -l</code> in the beginning of this section it shows my user name twice:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$ls -l
-rw-rw-r-- 1 kunigami kunigami 43315 Feb 6 09:48 tmp.txt</code></pre></figure>
<p>One can see which groups they belong to by running:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$groups
kunigami adm cdrom sudo dip plugdev lpadmin sambashare</code></pre></figure>
<p>The first group listed is the active group. We can change the active group via <code class="language-plaintext highlighter-rouge">newgrp</code>:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$newgrp adm
$rm -f tmp2.txt # make sure the file won't exist
$touch tmp2.txt
$ls -l tmp2.txt
-rw-rw-r-- 1 kunigami adm</code></pre></figure>
<p>To change the owner or group of the file, we can use the commands <code class="language-plaintext highlighter-rouge">chown</code> (change owner) and <code class="language-plaintext highlighter-rouge">chgrp</code> (change group). Worth noting that <code class="language-plaintext highlighter-rouge">chown</code> can change groups too.</p>
<h3 id="special-permission-flags">Special Permission Flags</h3>
<p>There are some special bits that can be set on a file. They show up in the first column of <code class="language-plaintext highlighter-rouge">ls -l</code>, where the execute character (<code class="language-plaintext highlighter-rouge">x</code>) of the corresponding level is replaced by a letter such as <code class="language-plaintext highlighter-rouge">t</code> or <code class="language-plaintext highlighter-rouge">s</code>.</p>
<p><strong>The sticky bit mode (t).</strong> When this bit is set in a directory, then according to [22]:</p>
<blockquote>
<p>… a user can only change files in this directory when she is the user owner of the file or when the file has appropriate permissions. This feature is used on directories like /var/tmp, that have to be accessible for everyone, but where it is not appropriate for users to change or delete each other’s data.</p>
</blockquote>
<p>We can verify this bit in <code class="language-plaintext highlighter-rouge">/var/tmp</code>:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">ls -ld /var/tmp
drwxrwxrwt 9 root root 4096 Jan 5 09:58 /var/tmp</code></pre></figure>
<p><strong>The SUID / SGID bit (s).</strong> On an executable, the program runs with the permissions of the file’s user and group owners instead of those of the user issuing the command, thus granting access to system resources. On a directory (group permission only), every file created in the directory gets the same group owner as the directory itself, whereas the normal behavior is that new files are owned by the group of the user who creates them [22].</p>
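<p>These special bits live in the same mode word as the regular permissions, so they can be checked programmatically. A sketch that sets the sticky bit on a scratch directory, mirroring the <code class="language-plaintext highlighter-rouge">/var/tmp</code> example above:</p>

```python
import os
import stat
import tempfile

d = tempfile.mkdtemp()
# 0o1777: sticky bit (the leading 1) plus rwx for all three levels,
# which ls -l would show as drwxrwxrwt
os.chmod(d, 0o1777)

mode = os.stat(d).st_mode
print(bool(mode & stat.S_ISVTX))  # → True (sticky bit set)
```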
<h2 id="types-of-filesystems">Types of filesystems</h2>
<p>Here are some common types of filesystems:</p>
<p><strong>EXT Family (Extended filesystem)</strong> - These are the usual filesystems used by Linux, and include <code class="language-plaintext highlighter-rouge">ext2</code>, <code class="language-plaintext highlighter-rouge">ext3</code> and <code class="language-plaintext highlighter-rouge">ext4</code>.</p>
<p><strong>EFI (Extensible Firmware Interface)</strong> - EFI is a more general concept which represents a partition on the hard disk used during booting. It is formatted with a filesystem that was originally based on FAT but has its own specification.</p>
<p><strong>FAT (File Allocation Table)</strong> - Has variants like <code class="language-plaintext highlighter-rouge">FAT16</code>, <code class="language-plaintext highlighter-rouge">FAT32</code> - was used by DOS and early versions of Windows</p>
<p><strong>FUSE (Filesystem in Userspace)</strong> - It’s a user space filesystem, which means it can be loaded without having privileges - it provides an interface for user-provided implementations [16].</p>
<p><strong>NTFS (New Technology filesystem)</strong> - It’s the default filesystem used by Windows.</p>
<p><strong>SquashFS</strong> - A read-only, compressed filesystem used to archive directories - according to [9], it’s more efficient and flexible than a tarball archive.</p>
<p><strong>Tmpfs</strong> - This is a special type of filesystem used by the kernel that stores data in memory but adheres to a normal filesystem interface.</p>
<p>One way to get information about the filesystems running on your Linux machine is via the command <code class="language-plaintext highlighter-rouge">df</code>. Here’s a sample of the output I get on an Ubuntu Linux machine:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">$df -aT
Filesystem Type Mounted On
/dev/sda5 ext4 /
/dev/sda1 vfat /boot/efi
tmpfs tmpfs /dev/shm
/dev/loop5 squashfs /snap/gnome-calculator/74</code></pre></figure>
<p>Note that <code class="language-plaintext highlighter-rouge">df</code> doesn’t include the filesystem type by default (needs the option <code class="language-plaintext highlighter-rouge">-T</code>) but it has a column named <code class="language-plaintext highlighter-rouge">Filesystem</code> which has the entry <code class="language-plaintext highlighter-rouge">tmpfs</code>, which happens to also be the type (as we can see for <code class="language-plaintext highlighter-rouge">/dev/shm</code>). This can be confusing at first sight.</p>
<p>There are also a bunch of pseudo-filesystems such as <code class="language-plaintext highlighter-rouge">sysfs</code> and <code class="language-plaintext highlighter-rouge">procfs</code> which are not listed by default, so we need the <code class="language-plaintext highlighter-rouge">-a</code> (all) option. The list of supported filesystem types (including pseudo ones) is in the file <code class="language-plaintext highlighter-rouge">/proc/filesystems</code>.</p>
<p>Interesting to note the <code class="language-plaintext highlighter-rouge">vfat</code> type on the boot partition, since FAT is mostly associated with Windows. The reason is that the EFI specification requires the EFI system partition to use a FAT-based filesystem, which also makes dual-boot setups practical.</p>
<p>We’ll look into what the “mounted on” represents in the next section.</p>
<h2 id="mount-points">Mount points</h2>
<p>A key observation when thinking about the directory tree is that not all subtrees live under the same disk partition. They might be stored in different partitions, different disks, external devices, memory or not be stored at all.</p>
<p>The operation that makes it possible is <strong>mounting</strong>, which basically appends an entire subtree at a specific path of the directory subtree. Maybe a name with a more vivid analogy would be <em>grafting</em>:</p>
<blockquote>
<p>a shoot or twig inserted into a slit on the trunk or stem of a living plant (…).</p>
</blockquote>
<p>As we’ll see in <em>Filesystems in the wild</em>, there is a plethora of different mounts under <code class="language-plaintext highlighter-rouge">/</code>.</p>
<h3 id="bind-mounts">Bind mounts</h3>
<p>From [5]:</p>
<blockquote>
<p>A bind mount is an alternate view of a directory tree. Classically, mounting creates a view of a storage device as a directory tree. A bind mount instead takes an existing directory tree and replicates it under a different point. The directories and files in the bind mount are the same as the original. Any modification on one side is immediately reflected on the other side, since the two views show the same data.</p>
</blockquote>
<p><strong>Bind mounts vs. symlinks.</strong> These two concepts look relatively similar, but they have two major differences:</p>
<ul>
<li>A symlink is itself a file that stores the target path and is dereferenced during path resolution, while a bind mount happens at the VFS level, so the mounted tree behaves like a real directory (for example, inside a chroot).</li>
<li>Bind mounts do not persist any metadata to disk - they are an in-memory/runtime abstraction, while a symlink is stored in the filesystem.</li>
</ul>
<h2 id="filesystems-in-the-wild">Filesystems in the wild</h2>
<p>In this section we present some of the directories and files under /, which are related to the filesystem in some way or are examples of different types of filesystems.</p>
<h3 id="dev">/dev/</h3>
<p>This directory corresponds to devices attached to the system.</p>
<p><code class="language-plaintext highlighter-rouge">/dev/hugepages/</code> - Hugepages is a way for the kernel to allocate pages with sizes much bigger than the default 4k. It’s a mount of the hugetlbfs (pseudo) filesystem. According to [17] Any files created under a directory mounting hugetlbfs uses huge pages</p>
<p><code class="language-plaintext highlighter-rouge">/dev/mqueue/</code> - stores message queues (used as inter-process communication, ICP) as files. It mounts a pseudo-filesystem called <code class="language-plaintext highlighter-rouge">mqueue</code>.</p>
<p><code class="language-plaintext highlighter-rouge">/dev/null</code> - is a character file which discards all the data it receives</p>
<p><code class="language-plaintext highlighter-rouge">/dev/pts</code> - is a mount of the <code class="language-plaintext highlighter-rouge">devpts</code> filesystem and is used to store character devices related to the master/slave in pseudoterminal communication [20].</p>
<p><code class="language-plaintext highlighter-rouge">/dev/random</code> - is a character file which provides pseudo-random input</p>
<p><code class="language-plaintext highlighter-rouge">/dev/shm/</code> - is a mount of <code class="language-plaintext highlighter-rouge">tmpfs</code>, used for sharing memory (hence the name <code class="language-plaintext highlighter-rouge">shm</code>) between processes [3]. Some programs might use this as a temporary directory to speed things up, since <code class="language-plaintext highlighter-rouge">tmpfs</code> is in memory filesystem. More general discussion in [4].</p>
<p><code class="language-plaintext highlighter-rouge">/dev/zero</code> - is a character file which provides <code class="language-plaintext highlighter-rouge">0</code> (<code class="language-plaintext highlighter-rouge">NULL</code> character)</p>
<h3 id="etc">/etc/</h3>
<p>This directory contains files that configure parts of the system. The name really means etcetera, and it initially hosted files which didn’t belong in any other directory [10].</p>
<p><code class="language-plaintext highlighter-rouge">/etc/fstab</code> - short for filesystem table, configures how a device (indicated by some ID) should be mounted by default.</p>
<p><code class="language-plaintext highlighter-rouge">/etc/group</code> - contains information about which groups each user belongs to. We’ll cover groups later.</p>
<h3 id="lostfound">/lost+found/</h3>
<p>Is used by <code class="language-plaintext highlighter-rouge">fsck</code> (a repair tool) to temporarily store files with corrupted metadata which might still be useful. <code class="language-plaintext highlighter-rouge">fsck</code> can recreate the file metadata, but because it doesn’t know where the file was originally, it puts it there [6].</p>
<h3 id="media">/media/</h3>
<p>Directory for mounting media such as CD-ROMs and USB sticks [11]</p>
<h3 id="mnt">/mnt/</h3>
<p>Directory for mounting filesystems temporarily. Both /media/ and /mnt/ are just conventions: a filesystem can be mounted anywhere in the directory tree, but tools and programs might make assumptions based on these conventions.</p>
<h3 id="proc">/proc/</h3>
<p>Directory containing information about processes. It is a mount of the pseudo filesystem called <code class="language-plaintext highlighter-rouge">proc(fs)</code>. From Wikipedia [12]: The <code class="language-plaintext highlighter-rouge">proc</code> filesystem provides a method of communication between kernel space and user space. For example, the GNU version of the process reporting utility ps uses the proc filesystem to obtain its data, without using any specialized system calls.</p>
<p><code class="language-plaintext highlighter-rouge">/proc/filesystems</code> - as described earlier, this file lists the all types of filesystems supported.</p>
<h3 id="sys">/sys/</h3>
<p>Directory containing information about kernel processes. It is a mount of a pseudo filesystem called <code class="language-plaintext highlighter-rouge">sysfs</code> [13]. From [14]:</p>
<blockquote>
<p>For every kobject that is registered with the system, a directory is created for it in sysfs. That directory is created as a subdirectory of the kobject’s parent, expressing internal object hierarchies to userspace. Top-level directories in sysfs represent the common ancestors of object hierarchies; i.e. the subsystems the objects belong to.</p>
</blockquote>
<p><code class="language-plaintext highlighter-rouge">/sys/fs/cgroup</code> - <code class="language-plaintext highlighter-rouge">cgroup</code> stands for control group which is a way to organize processes hierarchically so properties like access control and limits can be configured in bulk. Not surprisingly, there’s a pseudo filesystem for that purpose, <code class="language-plaintext highlighter-rouge">cgroupfs</code>. This path doesn’t have a <code class="language-plaintext highlighter-rouge">cgroupfs</code> mount though (mine in <code class="language-plaintext highlighter-rouge">tmpfs</code>), but subdirectories like <code class="language-plaintext highlighter-rouge">/sys/fs/cgroup/pid</code> do.</p>
<p><code class="language-plaintext highlighter-rouge">/sys/fs/pstore</code> - directory for storing crash logs when the kernel panics. Upon reboot, the OS copies the contents of this directory to another place so the space can be reclaimed. According to [16], this started as a driver under <code class="language-plaintext highlighter-rouge">sysfs</code> but evolved into its own filesystem type:</p>
<blockquote>
<p>pstore moved from its original firmware driver with a sysfs interface to a more straightforward filesystem-based implementation</p>
</blockquote>
<p><code class="language-plaintext highlighter-rouge">/sys/fs/fuse/connections</code> - The default mount point of FUSE filesystems</p>
<p><code class="language-plaintext highlighter-rouge">/sys/kernel/security</code> - The default mount point of the <code class="language-plaintext highlighter-rouge">securityfs</code> filesystem, which is an in-memory pseudo file-system intended for secure application [19].</p>
<h3 id="var">/var/</h3>
<p>Variable size files (e.g. logs). Usually stored in a separate disk partition to prevent less important data from affecting the main data (for example: preventing rogue logs from filling up the root partition).</p>
<h2 id="conclusion">Conclusion</h2>
<p>I originally intended to write about the Virtual Filesystem from the kernel perspective, mainly drawing on Love’s book <em>Linux Kernel Development</em>, but that book focuses mostly on the actual C API of the system, and I didn’t have much content for a post.</p>
<p>Instead I decided to write about filesystems from the user perspective and looking up terms and concepts I didn’t know.</p>
<p>I learned a bunch of things I didn’t know through a process of exploration and following through rabbit holes and am relatively happy with the results.</p>
<p>That’s one aspect I like the most in writing posts which is to focus on one topic and try to learn as much as possible in a bounded window of time.</p>
<p>One conclusion from the last section is that not only are files a suitable abstraction for data transport, but filesystems are a suitable abstraction for organizing these file interfaces, as we can see from the vast array of different types of pseudo filesystems.</p>
<h2 id="related-posts">Related Posts</h2>
<ul>
<li><a href="https://www.kuniga.me/blog/2020/03/07/sockets.html">Sockets</a>. As discussed in this post, we previously talked about network sockets, which is another case of data transfer (between different machines the network) that is modeled as a file. We can see that in that post where it mentions <code class="language-plaintext highlighter-rouge">socket()</code> returns a file descriptor.</li>
</ul>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://tldp.org/LDP/intro-linux/html/chap_03.html">1</a>] Introduction to Linux: Chapter 3. About files and the file system</li>
<li>[2] Linux Kernel Development, by Robert Love</li>
<li>[<a href="https://www.kernel.org/doc/gorman/html/understand/understand015.html">3</a>] Chapter 12 Shared Memory Virtual Filesystem</li>
<li>[<a href="https://superuser.com/a/1030777/43534">4</a>] Superuser: When should I use /dev/shm/ and when should I use /tmp/?</li>
<li>[<a href="https://unix.stackexchange.com/a/198591/3632">5</a>] Unix & Linux: What is a bind mount?</li>
<li>[<a href="https://unix.stackexchange.com/a/18157/3632">6</a>] Unix & Linux: What is the purpose of the lost+found folder in Linux and Unix?</li>
<li>[<a href="https://unix.stackexchange.com/a/4403/3632">7</a>] Unix & Linux: What is a Superblock, Inode, Dentry and a File?</li>
<li>[<a href="https://unix.stackexchange.com/a/270536/3632">8</a>] Unix & Linux: How could Linux use ‘sda’ device file when it hasn’t been installed?</li>
<li>[<a href="https://tldp.org/HOWTO/SquashFS-HOWTO/whatis.html">9</a>] What is SquashFS</li>
<li>[<a href="https://www.linuxnix.com/linux-directory-structure-explainedetc-folder/">10</a>] Linux Directory Structure: /etc Explained</li>
<li>[<a href="https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard">11</a>] Wikipedia: Filesystem Hierarchy Standard</li>
<li>[<a href="https://en.wikipedia.org/wiki/Procfs">12</a>] Wikipedia: procfs</li>
<li>[<a href="https://en.wikipedia.org/wiki/Sysfs">13</a>] Wikipedia: sysfs</li>
<li>[<a href="https://www.kernel.org/doc/Documentation/filesystems/sysfs.txt">14</a>] sysfs - The filesystem for exporting kernel objects</li>
<li>[<a href="https://lwn.net/Articles/434821/">15</a>] LWN.net: Persistent storage for a kernel’s “dying breath”</li>
<li>[<a href="https://github.com/libfuse/libfuse">16</a>] Github - libfuse/libfuse</li>
<li>[<a href="https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt">17</a>] The Linux Kernal Archives: hugetlbpage</li>
<li><a href="https://man7.org/linux/man-pages/man7/cgroups.7.html">[18]</a> cgroups(7) — Linux manual page</li>
<li>[<a href="https://lwn.net/Articles/153366/">19</a>] LWN.net: securityfs</li>
<li>[<a href="https://en.wikipedia.org/wiki/Devpts">20</a>] Wikipedia: devpts</li>
<li>[<a href="https://en.wikipedia.org/wiki/Named_pipe">21</a>] Wikipedia: Named pipe</li>
<li>[<a href="https://www.linuxtopia.org/online_books/introduction_to_linux/linux_Special_modes.html">22</a>] Linuxtopia - 3.4.2.5. Special modes</li>
</ul>Guilherme KunigamiIn this post we’ll cover a diversity of topics concerning the linux filesystem. We’ll start with a high-level overview a filesystem. We then cover specific concepts such as types of filesystems, files, permissions. We continue by exploring some examples of commonly layout of the linux directories hierachy. This post expects some basic familiarity with linux.Viterbi Algorithm2021-01-25T00:00:00+00:002021-01-25T00:00:00+00:00https://www.kuniga.me/blog/2021/01/25/viterbi-algorithm<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<figure class="image_float_left">
<a href="https://en.wikipedia.org/wiki/File:10-08ViterbiBIG.jpg">
<img src="https://www.kuniga.me/resources/blog/2021-01-25-viterbi-algorithm/viterbi-profile.png" alt="Andrew Viterbi thumbnail" />
</a>
</figure>
<p>Andrew Viterbi is an Italian-born American electrical engineer who co-founded Qualcomm and is also known for the Viterbi algorithm, although the method had been independently discovered by other people [1].</p>
<p>His family fled Italy due to Mussolini’s fascist policies which targeted Italy’s Jewish population [2]. He obtained his BS and MS from MIT and PhD from USC (University of Southern California), whose School of Engineering is named after Andrew and his late wife Erna.</p>
<p>In this post we’ll discuss the Viterbi algorithm in the context of Hidden Markov Models.</p>
<!--more-->
<h2 id="hidden-markov-model">Hidden Markov Model</h2>
<p>We can think of a Hidden Markov Model as a probabilistic state machine. We are given a graph with nodes corresponding to hidden (or latent) states and edges corresponding to transitions between states, associated with a probability of going from one state to another.</p>
<p>Associated with each node is a set of possible visible states, or observations, which act as a proxy to the hidden state. There’s a probability of observing a visible state given a hidden state.</p>
<p>Let’s go into more details and introduce more formal notation.</p>
<p><strong>States.</strong> There are $N$ (hidden) states in the model, $S = \curly{s_1, s_2, \cdots, s_N}$.</p>
<p><strong>Time.</strong> As in a state machine, we have the concept of time. At any given instant $t$, we are in a state $q_t$. We start at $t = 1$ in any given state $q_1 = s_i$ with probability $\pi_i$ for $i \in \curly{1, 2, \cdots, N}$.</p>
<p><strong>Transition.</strong> When going from instant $t$ to $t+1$, we make a state transition. The probability of going to state $s_j$ given we’re at a state $s_i$ is given by the variable $a_{ij}$, where</p>
\[\sum_{j = 1}^{N} a_{ij} = 1 \qquad \forall i \in \curly{1, 2, \cdots, N}\]
<p>Note that we may end up staying on the same state if $a_{ii} > 0$.</p>
<p>The $N \times N$ matrix of $a_{ij}$ is denoted by $A$. The “Markov” in “Hidden Markov Model” stems from the fact that the probability of transition only depends on the current state, that is, independent from the past like in a <a href="https://en.wikipedia.org/wiki/Markov_chain">Markov chain</a>.</p>
<p><strong>Observation.</strong> The “Hidden” in “Hidden Markov Model” models the fact that it might be hard or impossible to directly measure or observe a state $s_i$, and instead we use a proxy metric or observation that is ideally highly correlated with $s_i$.</p>
<p>There are $M$ observable states in the model, $V = \curly{v_1, v_2, \cdots, v_M}$. The probability of observing $v_k$ given we’re in state $s_i$, $p(v_k \mid s_i)$, is encoded in the variable $b_{ik}$. The $N \times M$ matrix of $b_{ik}$ is denoted by $B$.</p>
<h2 id="estimating-the-hidden-states">Estimating the Hidden States</h2>
<p>Suppose we have a sequence of observations $O = o_1, o_2, \cdots, o_T$, where $o_i \in \curly{v_1, \cdots, v_M}$, and a model with all the variables $(S, V, A, B, \pi)$ defined. We want to estimate the sequence of hidden states most likely to have yielded such observations.</p>
<p>More formally, let $Q = q_1, q_2, \cdots, q_T$ be a sequence of states where $q_i \in \curly{s_1, \cdots, s_N}$. The probability of observing $O$ given $Q$, $p(O \mid Q)$, can be computed by multiplying the probabilities of the independent events. First, the probability we started on $q_1$, $\pi_{q_1}$. Second, the probability we measured each $o_i$ at state $q_i$, that is $b_{q_i,o_i}$. Third, the probability we transitioned from $q_i$ to $q_{i+1}$, that is $a_{q_i,q_{i+1}}$. Putting it all together:</p>
\[p(O \mid Q) = \pi_{q_1} (\prod_{i = 1}^{T - 1} a_{q_i,q_{i+1}}) (\prod_{i = 1}^{T} b_{q_i,o_{i}})\]
<p>We want to find $Q$ such that $p(O \mid Q)$ is maximized.</p>
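<p>To make the objective concrete, here is a small script (my addition, not from the original post) that computes $p(O \mid Q)$ with the formula above and finds the best $Q$ by brute force over all $N^T$ state sequences, indexing states and observations from 0 for convenience:</p>

```python
from itertools import product

def seq_prob(pi, A, B, obs, Q):
    # pi[q_1] * prod over transitions a[q_i][q_{i+1}] * prod over emissions b[q_i][o_i]
    p = pi[Q[0]] * B[Q[0]][obs[0]]
    for i in range(1, len(obs)):
        p *= A[Q[i - 1]][Q[i]] * B[Q[i]][obs[i]]
    return p

def best_seq_brute_force(pi, A, B, obs):
    # Enumerate all N^T hidden state sequences; only viable for tiny inputs,
    # but useful for cross-checking the dynamic programming solution.
    n = len(pi)
    return max(product(range(n), repeat=len(obs)),
               key=lambda Q: seq_prob(pi, A, B, obs, Q))
```

<p>On the fever/healthy example discussed later in the post, this brute force agrees with the Viterbi result, which is a handy sanity check.</p>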
<h2 id="viterbi-algorithm">Viterbi Algorithm</h2>
<p>The Viterbi Algorithm is a dynamic programming algorithm that solves the problem above.</p>
<p>We introduce a $T \times N$ memoization matrix $P^{*} = \curly{p_{ij}}$, where $p_{ij}$ represents the maximum probability of the subproblem for $O’ = o_1, o_2, \cdots, o_i$, corresponding to the optimal solution $\hat Q = \hat q_1, \hat q_2, \cdots, \hat q_i$ and assuming the last state is $s_j$, that is $\hat q_i = s_j$.</p>
<p>An accompanying matrix $L^{*} = \curly{l_{ij}}$ stores the second-to-last element of $\hat Q$, that is $\hat q_{i-1}$.</p>
<h3 id="recurrence">Recurrence</h3>
<p>If we assume we know how to solve the problem for the first $i - 1$ states, we can solve it for $i$. In other words, we know $p_{(i - 1)k}$ for all $k \in \curly{1, \cdots, N}$.</p>
<p>Then $p_{ij}$ will consist of finding the state $s_k$ from which we’ll transition to state $s_j$ such that the probability $p_{(i - 1)k} \cdot a_{kj} \cdot b_{j, o_{i}}$ is maximized. Note that $b_{j, o_{i}}$ does not depend on our choice of $k$, so we can write the recurrence as:</p>
\[p_{ij} = \max_{k = 1}^N(p_{i - 1, k} \cdot a_{kj}) \cdot b_{j, o_{i}}\]
<p>Then $l_{ij}$ simply stores that choice of $k$:</p>
\[l_{ij} = \mbox{argmax}^N_{k = 1}(p_{i - 1, k} \cdot a_{kj})\]
<p>Note we could compute $p_{ij}$ from $l_{ij}$:</p>
\[p_{ij} = p_{i - 1, l_{ij}} \cdot a_{l_{ij}, j} \cdot b_{j, o_{i}}\]
<h3 id="initialization">Initialization</h3>
<p>The base case is when $i = 1$, for which we can also account for the initial probability $\pi$, that is:</p>
\[\begin{aligned}
p_{1j} &= \pi_j \cdot b_{j, o_1}\\
l_{1j} &= 0
\end{aligned}\]
<p>Where 0 in the second equation represents an invalid index, since we are assuming our indexes start at 1.</p>
<h3 id="complexity">Complexity</h3>
<p>For each entry of the $T \times N$ matrix we need to go over $N$ entries to find $k$, leading to a $O(T N^2)$ algorithm.</p>
<p>Note that the recurrence only depends on the last time instant $i - 1$, so we don’t need to store the previous entries of $P^{*}$, which can be replaced with two vectors of size $N$, but we do need to keep the matrix $L^{*}$ to be able to retrieve the solution.</p>
<h3 id="python-implementation">Python Implementation</h3>
<p>Implementing these ideas in Python is relatively straightforward. The code below replaces the one-letter variables with some more readable ones and bundles the model variables in a class.</p>
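<p>The post doesn’t show the model class itself; a minimal sketch consistent with the field names the function relies on might look like this:</p>

```python
class HMM:
    """Bundles the model variables (S, V, A, B, pi)."""

    def __init__(self, hidden_sts, visible_sts, trans_prob, obs_prob, ini_prob):
        self.hidden_sts = hidden_sts    # S: labels of the N hidden states
        self.visible_sts = visible_sts  # V: labels of the M visible states
        self.trans_prob = trans_prob    # A: N x N matrix, trans_prob[i][j] = a_ij
        self.obs_prob = obs_prob        # B: N x M matrix, obs_prob[i][k] = b_ik
        self.ini_prob = ini_prob        # pi: length-N initial probabilities
```

<p>It’s just a plain container; the Viterbi function only reads these attributes.</p>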
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">viterbi</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">obs</span><span class="p">):</span>
<span class="n">n</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">hidden_sts</span><span class="p">)</span>
<span class="n">m</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">visible_sts</span><span class="p">)</span>
<span class="n">t</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">obs</span><span class="p">)</span>
<span class="n">memo_prob</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">n</span>
<span class="n">memo_sol</span> <span class="o">=</span> <span class="p">[[</span><span class="bp">None</span><span class="p">]</span> <span class="o">*</span> <span class="n">n</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">t</span><span class="p">)]</span>
<span class="c1"># Handle base case
</span> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="n">obs_prob</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">obs_prob</span><span class="p">[</span><span class="n">j</span><span class="p">][</span><span class="n">obs</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
<span class="n">memo_prob</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">ini_prob</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">*</span> <span class="n">obs_prob</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">t</span><span class="p">):</span>
<span class="n">new_memo_prob</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">n</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="n">max_prob</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">best_k</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="n">prob</span> <span class="o">=</span> <span class="n">memo_prob</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">*</span> <span class="n">model</span><span class="p">.</span><span class="n">trans_prob</span><span class="p">[</span><span class="n">k</span><span class="p">][</span><span class="n">j</span><span class="p">]</span>
<span class="k">if</span> <span class="n">prob</span> <span class="o">></span> <span class="n">max_prob</span><span class="p">:</span>
<span class="n">max_prob</span> <span class="o">=</span> <span class="n">prob</span>
<span class="n">best_k</span> <span class="o">=</span> <span class="n">k</span>
<span class="n">obs_prob</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">obs_prob</span><span class="p">[</span><span class="n">j</span><span class="p">][</span><span class="n">obs</span><span class="p">[</span><span class="n">i</span><span class="p">]]</span>
<span class="n">new_memo_prob</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">max_prob</span> <span class="o">*</span> <span class="n">obs_prob</span>
<span class="n">memo_sol</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">best_k</span>
<span class="n">memo_prob</span> <span class="o">=</span> <span class="n">new_memo_prob</span>
<span class="c1"># Recover solution
</span> <span class="n">curr_st</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
<span class="k">if</span> <span class="n">memo_prob</span><span class="p">[</span><span class="n">curr_st</span><span class="p">]</span> <span class="o"><</span> <span class="n">memo_prob</span><span class="p">[</span><span class="n">j</span><span class="p">]:</span>
<span class="n">curr_st</span> <span class="o">=</span> <span class="n">j</span>
<span class="n">sol</span> <span class="o">=</span> <span class="p">[</span><span class="n">curr_st</span><span class="p">]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">t</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="n">curr_st</span> <span class="o">=</span> <span class="n">memo_sol</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">curr_st</span><span class="p">]</span>
<span class="n">sol</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">curr_st</span><span class="p">)</span>
<span class="c1"># Need to reverse since we backtracked
</span> <span class="n">sol</span> <span class="o">=</span> <span class="n">sol</span><span class="p">[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="k">return</span> <span class="p">[</span><span class="n">model</span><span class="p">.</span><span class="n">hidden_sts</span><span class="p">[</span><span class="n">x</span><span class="p">]</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">sol</span><span class="p">]</span></code></pre></figure>
<h3 id="example">Example</h3>
<p>Wikipedia has a nice toy example for testing our code. Quoting it [1]:</p>
<blockquote>
<p>Consider a village where all villagers are either healthy or have a fever and only the village doctor can determine whether each has a fever. The doctor diagnoses fever by asking patients how they feel. The villagers may only answer that they feel normal, dizzy, or cold.</p>
</blockquote>
<blockquote>
<p>The doctor believes that the health condition of his patients operates as a discrete Markov chain. There are two states, “Healthy” and “Fever”, but the doctor cannot observe them directly; they are hidden from him. On each day, there is a certain chance that the patient will tell the doctor he is “normal”, “cold”, or “dizzy”, depending on their health condition.</p>
</blockquote>
<p>It then provides the following parameters for the model:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">model</span> <span class="o">=</span> <span class="n">HMM</span><span class="p">(</span>
<span class="n">hidden_sts</span><span class="o">=</span><span class="p">[</span><span class="s">"healthy"</span><span class="p">,</span> <span class="s">"fever"</span><span class="p">],</span>
<span class="n">visible_sts</span><span class="o">=</span><span class="p">[</span><span class="s">"normal"</span><span class="p">,</span> <span class="s">"cold"</span><span class="p">,</span> <span class="s">"dizzy"</span><span class="p">],</span>
<span class="n">trans_prob</span><span class="o">=</span><span class="p">[[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">]],</span>
<span class="n">obs_prob</span><span class="o">=</span><span class="p">[[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">],</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">]],</span>
<span class="n">ini_prob</span><span class="o">=</span><span class="p">[</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="p">)</span></code></pre></figure>
<p>which is depicted in the diagram below:</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-01-25-viterbi-algorithm/hmm_example.png" alt="A diagram depicting the HMM corresponding to the example above" />
<figcaption>Figure 1: HMM corresponding to the example above</figcaption>
</figure>
<p>We can run our algorithm with the sample observations:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">obs</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="n">viterbi</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">obs</span><span class="p">))</span></code></pre></figure>
<p>To get</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="p">[</span><span class="s">'healthy'</span><span class="p">,</span> <span class="s">'healthy'</span><span class="p">,</span> <span class="s">'fever'</span><span class="p">]</span></code></pre></figure>
<h2 id="conclusion">Conclusion</h2>
<p>I learned about Viterbi’s Algorithm while studying HMMs in the context of speech recognition [3], which is my main topic of study this year.</p>
<p>The Wikipedia entry for this subject [1] is very good, so I was able to base my post entirely on it. I found that the algorithm itself is not very complicated.</p>
<h2 id="related-posts">Related Posts</h2>
<ul>
<li><a href="https://www.kuniga.me/blog/2012/03/18/pgm-lecture-notes-week-1.html">Probabilistic Graphical Model Lecture Notes - Week 1</a>. I learned about HMMs a long time ago, and forgot most of it. It might be that I even learned about Viterbi’s algorithm at some point too. I’ve revisited and translated this post from my old blog in Portuguese.</li>
</ul>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://en.wikipedia.org/wiki/Viterbi_algorithm">1</a>] Wikipedia - Viterbi algorithm</li>
<li>[<a href="https://viterbischool.usc.edu/about-andrew-viterbi/">2</a>] USC - About Andrew Viterbi</li>
<li>[3] A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Rabiner, L.</li>
</ul>Guilherme KunigamiAndrew Viterbi is an Italian-born American electrical engineer who co-founded Qualcomm and is also known for the Viterbi algorithm, although the method has been independently discovered by other people [1]. His family fled Italy due to Mussolini’s fascist policies which targeted Italy’s Jewish population [2]. He obtained his BS and MS from MIT and PhD from USC (University of Southern California), whose School of Engineering is named after Andrew and his late wife Erna. In this post we’ll discuss the Viterbi algorithm in the context of Hidden Markov ModelsMax Area Under a Histogram2021-01-09T00:00:00+00:002021-01-09T00:00:00+00:00https://www.kuniga.me/blog/2021/01/09/max-area-under-histogram<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>This is a classic programming puzzle. Suppose we’re given an array of $n$ values representing the height $h_i \ge 0$ of bars in a histogram, for $i = 0, \cdots, n-1$. We want to find the largest rectangle that fits “inside” that histogram.</p>
<p>More formally, we’d like to find indices $l$ and $r$, $l \le r$ such that $(r - l + 1) \bar h$ is maximum, where $\bar h$ is the smallest $h_i$ for $i \in \curly{l, \cdots, r}$.</p>
<p>In this post we’ll describe an $O(n)$ algorithm that solves this problem.</p>
<!--more-->
<h2 id="simple-on2-solution">Simple $O(n^2)$ Solution</h2>
<p>The first observation we make is that any optimal rectangle must “touch” the top of one of the bars. If that wasn’t true, we could extend it a bit further and get a bigger rectangle.</p>
<p>This means we can consider each bar $i$ in turn and check the largest rectangle that “touches” the top of that bar, that is, has height $h_i$. We start with the rectangle as the bar itself, then we expand the width towards the left and right. How far can we go?</p>
<p>It’s easy to see that we can keep expanding until we find a bar whose height is less than $h_i$. This gives us an $O(n^2)$ algorithm: for each $i$, find the closest $l < i$ whose height is less than $h_i$ and the closest $r > i$ whose height is less than $h_i$.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-01-09-max-area-under-histogram/example_1.png" alt="a histogram with the left and right boundaries of a highlighted bar" />
<figcaption>Figure 1: Finding the left and right boundaries of the highlighted bar</figcaption>
</figure>
<p>We can assume that the first and last elements of the height array are sentinels with height -1, so we don’t have to worry about corner cases since there are always such $l$ and $r$.</p>
<p>The maximum area will be $(r - l - 1)*h_i$. For illustration purposes we provide the Python code:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">get_max_area_hist</span><span class="p">(</span><span class="n">h</span><span class="p">):</span>
<span class="n">max_a</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">h</span> <span class="o">=</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">h</span> <span class="o">+</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="c1"># sentinels
</span>    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">h</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">):</span>
<span class="n">l</span> <span class="o">=</span> <span class="n">i</span> <span class="o">-</span> <span class="mi">1</span>
<span class="k">while</span> <span class="n">h</span><span class="p">[</span><span class="n">l</span><span class="p">]</span> <span class="o">>=</span> <span class="n">h</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span>
<span class="n">l</span> <span class="o">-=</span> <span class="mi">1</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">while</span> <span class="n">h</span><span class="p">[</span><span class="n">r</span><span class="p">]</span> <span class="o">>=</span> <span class="n">h</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span>
<span class="n">r</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">max_a</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_a</span><span class="p">,</span> <span class="n">h</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">*</span><span class="p">(</span><span class="n">r</span> <span class="o">-</span> <span class="n">l</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span>
<span class="k">return</span> <span class="n">max_a</span></code></pre></figure>
<h2 id="on-solution-with-a-stack">$O(n)$ Solution with a Stack</h2>
<p>We can get rid of the inner loops by using a stack $S$. The stack will contain the indexes of the bars and starts out containing the sentinel element at 0 ($S = [0]$). At iteration $i$, we pop all indexes whose corresponding heights are greater than $h_i$ and then push $i$. Let’s represent the elements of the stack as $S = [a_0, a_1, \cdots, a_m]$, where $a_m$ is the top of the stack. Let’s explore a few properties:</p>
<p><strong>Property 1.</strong> The heights corresponding to the indices in the stack are sorted in non-decreasing order after each iteration. That is $h_{a_0} \le h_{a_1} \le \cdots \le h_{a_m}$.</p>
<p><strong>Proof.</strong> We can show this by induction. It’s clearly true for a single element. Now suppose the property holds at the beginning of iteration $i$. Before we insert $i$ at the top, we’ll remove all the indices whose heights are bigger than $h_i$. Let the resulting stack be $S = [a_0, a_1, \cdots, a_{m’}]$. By hypothesis $h_{a_0} \le h_{a_1} \le \cdots \le h_{a_{m’}}$ and by construction $h_i \ge h_{a_{m’}}$, so the property holds after the insertion of $i$. <em>QED</em>.</p>
<p><strong>Property 2.</strong> For a given index $a_i$ in the stack ($i > 0$), $a_{i - 1}$ is the closest $l < a_{i}$ whose height is less than $h_{a_{i}}$.</p>
<p><em>Proof.</em> Let $j$ be the index stored at the top of the stack right before inserting $i$. We want to show $j = l$. We first note that by construction $h_j < h_i$ and $j < i$. Suppose $j \neq l$. Then $j < l$ by the definition of $l$. So if $l$ is not at the top of the stack, it got popped out since it was added, but it can only be popped at iteration $l’ > l$ whose height is smaller than $h_l$, which is a contradiction, since $l’$ would be closer to $i$ and $h_{l’} < h_i$.</p>
<p>This holds as long as both $i$ and $l$ remain on the stack since their order never changes. <em>QED</em>.</p>
<p><strong>Property 3.</strong> If at iteration $i$ the index $j$ is popped from the stack, then $i$ is the closest $r > j$ whose height is less than $h_j$.</p>
<p><em>Proof.</em> We know that $h_j > h_i$ because it was popped and $i > j$ by construction. It remains to show that $i$ is the closest index to $j$. Suppose it’s not, that is, there is $i > i’ > j$ such that $h_j > h_{i’}$. Since by <em>Property 1</em> the heights in the stack are always in non-decreasing order, iteration $i’$ would have caused all elements on top of $j$ in $S$ to be popped, and then $j$; but since $j$ is still in the stack, this cannot be so. <em>QED</em>.</p>
<p>Concluding, by <em>Property 3</em>, if $j$ is popped out in iteration $i$, then $r = i$. Moreover, once $j$ is popped, the top of the stack happens to be $l$ by <em>Property 2</em>.</p>
<p>This leads to this algorithm:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">get_max_area_hist</span><span class="p">(</span><span class="n">h</span><span class="p">):</span>
<span class="n">max_a</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">h</span> <span class="o">=</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">h</span> <span class="o">+</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="c1"># sentinels
</span> <span class="n">stack</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">h</span><span class="p">)):</span>
<span class="k">while</span> <span class="n">h</span><span class="p">[</span><span class="n">stack</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]]</span> <span class="o">></span> <span class="n">h</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span>
<span class="n">j</span> <span class="o">=</span> <span class="n">stack</span><span class="p">.</span><span class="n">pop</span><span class="p">()</span>
<span class="n">l</span><span class="p">,</span> <span class="n">r</span> <span class="o">=</span> <span class="n">i</span><span class="p">,</span> <span class="n">stack</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">max_a</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_a</span><span class="p">,</span> <span class="n">h</span><span class="p">[</span><span class="n">j</span><span class="p">]</span><span class="o">*</span><span class="p">(</span><span class="n">l</span> <span class="o">-</span> <span class="n">r</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">stack</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="k">return</span> <span class="n">max_a</span></code></pre></figure>
<p>We still have an inner loop but we can argue that the amortized cost is $O(n)$: every iteration of the inner <code class="language-plaintext highlighter-rouge">while</code> loop removes an element from the stack, and we only add elements to the stack $O(n)$ times, so we only execute the inner loop $O(n)$ times.</p>
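<p>As a sanity check (my addition), we can compare the stack version against a direct brute force over all $(l, r)$ windows; both functions are restated below so the snippet is self-contained:</p>

```python
import random

def max_area_brute(h):
    # O(n^2): try every window (l, r); area = width * min height inside it.
    best = 0
    for l in range(len(h)):
        lo = h[l]
        for r in range(l, len(h)):
            lo = min(lo, h[r])
            best = max(best, (r - l + 1) * lo)
    return best

def max_area_stack(h):
    # The stack-based O(n) algorithm from the post, restated compactly.
    h = [-1] + h + [-1]  # sentinels
    stack, best = [0], 0
    for i in range(1, len(h)):
        while h[stack[-1]] > h[i]:
            j = stack.pop()
            best = max(best, h[j] * (i - stack[-1] - 1))
        stack.append(i)
    return best

# Cross-check on a few hundred random histograms.
random.seed(42)
for _ in range(500):
    h = [random.randint(0, 8) for _ in range(random.randint(1, 12))]
    assert max_area_stack(h) == max_area_brute(h)
```

<p>A classic test case: for heights $[2, 1, 5, 6, 2, 3]$ both versions return 10 (the rectangle of height 5 spanning the bars of heights 5 and 6).</p>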
<h2 id="largest-submatrix-of-a-binary-matrix">Largest Submatrix of a Binary Matrix</h2>
<p>Consider the following problem: we’re given a $n \times m$ binary matrix $B$ and we want to find the area of the largest rectangle that only contains 1s.</p>
<p>We will now show how to solve this problem in $O(nm)$. The idea is to find the largest rectangle that ends at row $i$, and then take the maximum across all rows.</p>
<p>Suppose that the largest rectangle ending in row $i$ includes a given column $j$. The maximum height it can have is bounded by how many consecutive 1s there are in previous rows of column $j$. If we call the length of such a run of consecutive 1s $h_j$, we can now visualize these as heights of columns, so finding the largest rectangle ending in row $i$ can be reduced to finding the maximum area under a histogram, which we can do in $O(m)$.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-01-09-max-area-under-histogram/example_2.png" alt="a 5 x 5 binary matrix whose columns of consecutive 1s form histogram bars" />
<figcaption>Figure 2: 5 x 5 matrix. At the last row, we can visualize bars of a histogram with heights: 1, 0, 2, 4, 1</figcaption>
</figure>
<p>How do we compute $h_j$ for row $i$? If we know the “heights” of row $i-1$, say $h’$, we can compute it for $i$. Let $b_{ij}$ be the element in $B$ at row $i$ and column $j$. If $b_{ij} = 1$, then $h_j = h’_j + 1$. Otherwise, we break the chain of consecutive 1s, so $h_j = 0$.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">get_max_rectangle</span><span class="p">(</span><span class="n">b</span><span class="p">):</span>
<span class="n">max_a</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">hist</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">m</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">m</span><span class="p">):</span>
<span class="k">if</span> <span class="n">matrix</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">j</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">hist</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">hist</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">max_a</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_a</span><span class="p">,</span> <span class="n">get_max_area_hist</span><span class="p">(</span><span class="n">hist</span><span class="p">))</span></code></pre></figure>
<p>It’s easy to see that the algorithm above is $O(nm)$.</p>
<p>Note that if the problem asked for the largest <strong>square</strong>, the problem would be easier. Let $s_{ij}$ be the length of any side of the largest square that ends at row $i$ and column $j$, and suppose we know how to compute it for $i’ = i - 1$ or $j’ = j - 1$. Then, if $b_{ij} = 1$, $s_{ij} = \min(s_{ij’}, s_{i’j}, s_{i’j’}) + 1$; otherwise $s_{ij} = 0$.</p>
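<p>A sketch of this square variant (my addition, not in the original post), computing $s_{ij}$ row by row with the first row and column as base cases:</p>

```python
def get_max_square(b):
    # Returns the side length of the largest all-1s square in binary matrix b.
    n, m = len(b), len(b[0])
    s = [[0] * m for _ in range(n)]
    best = 0
    for i in range(n):
        for j in range(m):
            if b[i][j] == 1:
                if i == 0 or j == 0:
                    s[i][j] = 1  # base case: a square on the border has side 1
                else:
                    # Bounded by the squares ending above, to the left, and diagonally.
                    s[i][j] = min(s[i][j - 1], s[i - 1][j], s[i - 1][j - 1]) + 1
                best = max(best, s[i][j])
    return best
```

<p>Like the rectangle version, this runs in $O(nm)$, but a single local recurrence suffices and no histogram reduction is needed.</p>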
<h2 id="conclusion">Conclusion</h2>
<p>I remember seeing the “max area under histogram” problem a long time ago but I didn’t remember the solution. The use of a stack is very clever, but it’s not straightforward to see why it works.</p>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://www.hackerrank.com/challenges/largest-rectangle/editorial">1</a>] HackerRank - Largest Rectangle Editorial</li>
</ul>Guilherme KunigamiThis is a classic programming puzzle. Suppose we’re given an array of $n$ values representing the height $h_i \ge 0$ of bars in a histogram, for $i = 0, \cdots, n-1$. We want to find the largest rectangle that fits “inside” that histogram. More formally, we’d like to find indices $l$ and $r$, $l \le r$ such that $(r - l + 1) \bar h$ is maximum, where $\bar h$ is the smallest $h_i$ for $i \in \curly{l, \cdots, r}$. In this post we’ll describe an $O(n)$ algorithm that solves this problem.2020 in Review2021-01-01T00:00:00+00:002021-01-01T00:00:00+00:00https://www.kuniga.me/blog/2021/01/01/2020-in-review<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>This is a meta-post to review what happened in 2020.</p>
<!--more-->
<h2 id="posts-summary">Posts Summary</h2>
<p>This year I set out to learn about <strong>Quantum Computing</strong>. My aim was to understand <a href="https://www.kuniga.me/blog/2020/12/26/shors-prime-factoring.html">Shor’s Prime Factoring Algorithm</a> and learn whatever was needed for that. This led to the study of <a href="https://www.kuniga.me/blog/2020/10/11/deutsch-jozsa-algorithm.html">The Deutsch-Jozsa Algorithm</a>, <a href="https://www.kuniga.me/blog/2020/11/21/quantum-fourier-transform.html">Quantum Fourier Transform</a>, <a href="https://www.kuniga.me/blog/2020/12/23/quantum-phase-estimation.html">Quantum Phase Estimation</a> and <a href="https://www.kuniga.me/blog/2020/12/11/factorization-from-order.html">Number Factorization from Order-Finding</a>.</p>
<p>I’m satisfied with the learning progress and glad to finally have a better understanding of Shor’s algorithm, even though I procrastinated until the second half of the year to start my studies. I liked the approach of having a specific goal in mind, “Understand Shor’s algorithm”, as opposed to the more vague “Learn Quantum Computing”, since it allows focusing and it’s clearer when I can stop.</p>
<p>I wrote about some topics relevant to work including <a href="https://www.kuniga.me/blog/2020/02/02/python-coroutines.html">Python Coroutines</a>, <a href="https://www.kuniga.me/blog/2020/03/07/sockets.html">Sockets</a>, <a href="https://www.kuniga.me/blog/2020/03/28/browser-performance.html">Browser Performance</a>, <a href="https://www.kuniga.me/blog/2020/01/04/observable.html">Observable</a> and <a href="https://www.kuniga.me/blog/2020/05/22/review-working-effectively-with-legacy-code.html">Review: Working Effectively With Legacy Code</a>.</p>
<p>I dedicated some time to learn about system development including <a href="https://www.kuniga.me/blog/2020/07/31/buddy-memory-allocation.html">Memory Allocation</a> and <a href="https://www.kuniga.me/blog/2020/04/24/cpu-cache.html">CPU Cache</a>.</p>
<p>I touched on machine learning by reading the paper <a href="https://www.kuniga.me/blog/2020/10/09/lara.html">Latent Aspect Rating Analysis on Review Text Data</a> and studying the optimization algorithm <a href="https://www.kuniga.me/blog/2020/09/04/lbfgs.html">L-BFGS</a>.</p>
<p>I had fun writing about two programming puzzles, <a href="https://www.kuniga.me/blog/2020/05/25/minimum-string-from-removing-doubles.html">Shortest String From Removing Doubles</a> and <a href="https://www.kuniga.me/blog/2020/11/06/puzzling-election.html">A Puzzling Election</a>.</p>
<p>I read a book on Information Theory which I didn’t end up writing about, but it inspired me to revisit <a href="https://www.kuniga.me/blog/2020/06/11/huffman-coding.html">Huffman Coding</a>.</p>
<h2 id="the-blog-in-2020">The Blog in 2020</h2>
<p>This year the blog went through major transformations. After about 10 years using Wordpress, I finally decided to <a href="https://www.kuniga.me/blog/2020/07/11/from-wordpress-to-jekyll.html">migrate to Github pages</a> for more control.</p>
<p>One of the features I miss the most is the well integrated analytics. I’m currently using Google analytics but it doesn’t have a reliable way to exclude my own visits, which is a lot especially while writing a post. With that caveat, according to the data, the <a href="https://www.kuniga.me/blog/2020/07/31/buddy-memory-allocation.html">Buddy Memory Allocation</a> post was the most popular with 146 unique visits. Overall the blog had a total of 1.6k visitors.</p>
<p>I kept the resolution to have at least one post a month on average, by writing 19 posts. The blog completed 8 years with 115 posts (some of which were ported and translated from my old blog in Portuguese).</p>
<h2 id="resolutions-for-2021">Resolutions for 2021</h2>
<p>I enjoyed learning about the basics of quantum computing, but I found it highly theoretical. I’m still interested in it from a purely intellectual point of view, especially in learning about Quantum information theory and the complexity class of quantum algorithms, but it will not be my focus.</p>
<p>For 2021 I’ll try to focus on fewer things. My only explicit goal for 2021 is to learn about machine learning, specifically for speech recognition. I’ll try to learn the state of the art and the theory behind it, but also anything related to this problem from a practical perspective such as audio encoding, OS drivers for microphones, signal processing, etc.</p>
<h2 id="personal">Personal</h2>
<p>The end of the year is a good time to look back and remember all the things I’ve done besides work and the technical blog. Due to the coronavirus pandemic this year there wasn’t much opportunity for travelling, but on the other hand I ended up having a lot more time for catching up on reading.</p>
<h3 id="trips">Trips</h3>
<p>Despite travel restrictions, I was able to go on road trips around California, which has beautiful scenery. I had a chance to go again to Yosemite, Pinnacles and Death Valley National Parks, besides doing a lot of hikes and some camping in local parks.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/2020-nps.png" alt="a collage of photos from different national parks in California" />
<figcaption>
Top (All in Yosemite):
1. <a href="https://www.flickr.com/photos/kunigami/50800718838/" target="_blank">Nevada Fall</a>;
2. <a href="https://www.flickr.com/photos/kunigami/50800727303/" target="_blank">Cathedral Peak</a>;
3. <a href="https://www.flickr.com/photos/kunigami/50800727608/" target="_blank">Half-dome</a>.
Bottom:
4. <a href="https://www.flickr.com/photos/kunigami/50800727093/" target="_blank">Bear Gulch reservoir in Pinnacles</a>;
5. <a href="https://www.flickr.com/photos/kunigami/50801468616/" target="_blank">Death Valley from the Windrose trail</a>;
6. <a href="https://www.flickr.com/photos/kunigami/50787619163/" target="_blank">Red Rock Canyon State Park</a>
</figcaption>
</figure>
<h3 id="books">Books</h3>
<p>As I mentioned, the pandemic left a lot more indoor time, which led to more reading. Here are the books I finished reading in 2020.</p>
<p><strong>History</strong></p>
<table class="books-table">
<tbody>
<tr>
<td><b>Bury my Heart at Wounded Knee</b> by Dee Brown. The history of many native American tribes (Arapaho, Apache, Cheyenne, Kiowa, Navaho, Sioux) in the late 19th century and their fight against American settlers and the military. It's a bit hard to read at times due to the violent and unjust acts of the latter. I wasn't familiar with this dark side of American history.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/bury_my_heart.jpg" alt="Bury my Heart at Wounded Knee Book Cover" /></td>
</tr>
<tr>
<td><b>The Last Mughal</b> by William Dalrymple. Recounts the history of India, centered around the last years of Zafar's reign, and preceding the British Raj. I couldn't help drawing parallels with <i>Bury my Heart at Wounded Knee</i>. I picked this as part of the trip to India in 2019 - most of the book is in Delhi, so it is a good read if you're visiting the region.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/last_mughal.jpg" alt="The Last Mughal Book Cover" /></td>
</tr>
<tr>
<td><b>Sapiens</b> by Yuval Noah Harari. I rarely re-read books but I recall liking this book so much a few years back that I decided to revisit it. I didn't remember a lot of the contents and wasn't as amused, possibly due to high expectations and maybe having internalized some of the more surprising facts. I did like the idea of dedicating some time to re-reading books I really liked, so I'll try to make a point of re-reading a book every year.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/sapiens.jpg" alt="Sapiens Book Cover" /></td>
</tr>
<tr>
<td><b>The Great Influenza</b> by John M. Barry. Recounts the events around the US during the 1918 pandemic. It focuses a lot on the lives of the scientists who made key contributions during and after that time, and also on the revolution of American medicine which started a few decades prior to the pandemic. My choice of this book during 2020 shouldn't be surprising :)</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/the_great_influenza.jpg" alt="The Great Influenza Cover" /></td>
</tr>
<tr>
<td><b>The Quartet</b> by Joseph J. Ellis. I'm not the biggest fan of American history but knowing this year we'd be limited to be within the US and since I enjoy reading history from places I travel to, I decided to give it a try.
It focuses on what's called the second American revolution (the first being independence from Britain) led by four prominent figures: Alexander Hamilton, George Washington, John Jay, and James Madison. It culminates with the writing of the constitution.
It was interesting to learn how much the struggle of powers between states and the federal government influenced the nature of the constitution.
</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/the_quartet.jpg" alt="The Quartet Cover" /></td>
</tr>
<tr>
<td><b>The Silk Roads</b> by Peter Frankopan. This book tells the history of the world from the point of view of the region covered by the Silk Roads, which includes countries from the Near and Middle East and Central Asia. I don't recall learning so much history from a single book, and if I had to pick one, this would be my favorite book from 2020.
</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/silk_roads.jpg" alt="The Silk Roads Book cover" /></td>
</tr>
</tbody>
</table>
<p><strong>Science</strong></p>
<table class="books-table">
<tbody>
<tr>
<td><b>I'm a Strange Loop</b> by Douglas Hofstadter. I was impressed by Hofstadter's <i>Gödel, Escher, Bach</i> but had trouble grasping a lot of its subjects. The author claims that <i>I Am a Strange Loop</i> is a more focused and intuitive take on consciousness. It borrows a lot from his personal experiences, which makes it kind of an autobiography. Overall it's a fascinating philosophical discussion. My favorite bit was the thought experiment by Derek Parfit regarding the uniqueness of the "self", which is summarized <a href="https://en.wikipedia.org/wiki/Reasons_and_Persons">here</a>. I'm looking forward to reading <i>Reasons and Persons</i>.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/i_am_a_strange_loop.jpg" alt="I'm a Strange Loop Book Cover" /></td>
</tr>
<tr>
<td><b>Working Effectively With Legacy Code</b> by Michael C. Feathers. I wrote a <a href="https://www.kuniga.me/blog/2020/05/22/review-working-effectively-with-legacy-code.html">post</a> about it.
</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/work_with_legacy_code.jpg" alt="Working Effectively With Legacy Code Book Cover" /></td>
</tr>
<tr>
<td><b>An Introduction to Information Theory</b> by John R. Pierce. I don't recall why I had this book on my shelf, but it had been there for a while so I decided to catch up on my unread books. It doesn't require advanced math knowledge but it's still a textbook. I like its multi-disciplinary approach, for example: bringing in thermodynamics to discuss and compare entropy in physics and in information theory; talking (briefly) about quantum information theory; considering information theory in arts and linguistics. It inspired me to write about <a href="https://www.kuniga.me/blog/2020/06/11/huffman-coding.html">Huffman coding</a>.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/information_theory.jpg" alt="An Introduction to Information Theory Book Cover" /></td>
</tr>
<tr>
<td><b>I contain multitudes</b> by Ed Yong. This book explores the world of microbes and makes the case that there are no inherently good or bad microbes, just those that happen to benefit us or not, and in some cases the same species even plays both roles depending on the situation.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/i_contain_multitudes.jpg" alt="I contain multitudes Book Cover" /></td>
</tr>
<tr>
<td><b>Why we sleep</b> by Matthew Walker. I learned how important sleep is for our health. Insufficient sleep is linked to a plethora of diseases and conditions, including cancer, obesity and a weakened immune system.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/why_we_sleep.jpg" alt="Why we sleep Book Cover" /></td>
</tr>
<tr>
<td><b>Beyond Weird</b> by Philip Ball. Quantum mechanics for lay people, which I found very accessible. It doesn't require prior knowledge of quantum mechanics but it does try to clarify where the popular notions of entanglement, superposition and quantum teleportation come from. My main takeaway is that quantum mechanics is a mathematical theory (abstraction) that exists without necessarily having an explicit representation in reality, which is hard to be satisfied with given that it does predict a lot of real-world observations.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/beyond_weird.jpg" alt="Beyond Weird Book Cover" /></td>
</tr>
<tr>
<td><b>Infinite Powers</b> by Steven Strogatz. It covers the history of Calculus including the seeds of the theory which started with mathematicians from the ancient era such as Archimedes, developing through Galileo, Kepler until the full-development by Leibniz and Newton. It is very informative and provides an intuitive and gentle introduction to calculus. It also describes important applications both in theory and practice (quantum mechanics, GPS, CTScan).</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/infinite_powers.jpg" alt="Infinite Powers Book Cover" /></td>
</tr>
</tbody>
</table>
<p><strong>Other non-fiction</strong></p>
<table class="books-table">
<tbody>
<tr>
<td><b>The Everything Store</b> by Brad Stone. As with the <a href="https://www.kuniga.me/blog/2020/01/04/2019-in-review.html">biography of Phil Knight</a> (Nike's founder), this biography of Jeff Bezos is intertwined with that of his company. I learned some interesting facts, for example, how much leverage Amazon has when acquiring smaller competitors (such as Zappos).</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/the_everything_store.jpg" alt="The Everything Store Book Cover" /></td>
</tr>
<tr>
<td><b>Everybody Lies</b> by Seth Stephens-Davidowitz. Seth is a data scientist who finds insights using publicly available sources. One of my main takeaways is that Google trends is a particularly rich source of data because people make searches in anonymity. This is in contrast to public surveys or social media, where people tend to be "politically correct" and not fully honest.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/everybody_lies.jpg" alt="Everybody Book Cover" /></td>
</tr>
<tr>
<td><b>Don't make me think</b> by Steve Krug. This book provides a lot of practical advice on making websites more user-friendly. I felt I had already internalized a lot of the good practices it suggests, having worked with web tools that inherited designs made by people with good UX knowledge. It was useful to see them listed out explicitly though.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/dont_make_me_think.jpg" alt="Don't make me think Book Cover" /></td>
</tr>
<tr>
<td><b>Peopleware</b> by Tom DeMarco and Timothy Lister. Every list of recommended programming books seems to include this (among others that I like, such as <i>Code Complete</i>), so I decided to give it a go. I don't manage people and don't plan to any time soon, but I wanted to understand what makes a good manager, since most people work with one. The book covers a set of topics primarily focused on the happiness and productivity of individuals. It's full of interesting anecdotes and it's not prescriptive. I enjoyed it overall and might write a review at some point.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/peopleware.jpg" alt="Peopleware Book Cover" /></td>
</tr>
</tbody>
</table>
<p><strong>Fiction</strong></p>
<table class="books-table">
<tbody>
<tr>
<td><b>Invisible Cities</b> by Italo Calvino. I started this book a long time ago (2018?) but only finished it this year. It consists of a set of short stories about fictitious cities. It's hard to make sense of some of them, but the imagery they evoke is very artistic.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/invisible_cities.jpg" alt="Invisible Cities Book Cover" /></td>
</tr>
<tr>
<td><b>The Overstory</b> by Richard Powers. Beautiful book and message. I like how a lot of the story happens around the Bay Area. I learned that in Stanford's <a href="https://trees.stanford.edu/treewalks/treemaps.htm" target="_blank">main quad</a> there are a variety of trees from all over the world. I thought the author went a bit overboard with esoteric words, and I had to look up the dictionary pretty often.</td>
<td><img src="https://www.kuniga.me/resources/blog/2021-01-01-2020-in-review/the_overstory.png" alt="The Overstory Book Cover" /></td>
</tr>
</tbody>
</table>Guilherme KunigamiThis is a meta-post to review what happened in 2020.Shor’s Prime Factoring Algorithm2020-12-26T00:00:00+00:002020-12-26T00:00:00+00:00https://www.kuniga.me/blog/2020/12/26/shors-prime-factoring<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<figure class="image_float_left">
<img src="https://www.kuniga.me/resources/blog/2020-12-26-shors-prime-factoring/peter-shor.png" alt="Peter Shor thumbnail" />
</figure>
<p>Peter Shor is an American professor at MIT. He received his B.S. in Mathematics at Caltech and earned his Ph.D. in Applied Mathematics from MIT advised by Tom Leighton. While at Bell Labs, Shor developed the Shor’s prime factorization quantum algorithm, which awarded him prizes including the Gödel Prize in 1999.</p>
<p>In this post we’ll combine the parts we studied before to understand Peter Shor’s prime factorization quantum algorithm, which can find a factor of a composite number exponentially faster than the best known algorithm using classic computation.</p>
<p>We’ll need basic familiarity with quantum computing, covered in a <a href="https://www.kuniga.me/blog/2020/10/11/deutsch-jozsa-algorithm.html">previous post</a>. The bulk of the post is showing how to efficiently solve the order-finding problem, since we learned in <a href="https://www.kuniga.me/blog/2020/12/11/factorization-from-order.html">Number Factorization from Order-Finding</a> that it is the bottleneck step in finding a prime factor of a composite number. The rest is putting everything together and doing some analysis of the performance of the algorithm as a whole.</p>
<!--more-->
<h2 id="quantum-order-finding">Quantum Order-finding</h2>
<p>Recall the definition of order-finding from [2]:</p>
<blockquote>
<p>Given integers $x$, $N$, the problem of <em>order-finding</em> consists in finding the smallest positive number $r$ such that $x^r \equiv 1 \Mod{N}$, where $r$ is called the <em>order of</em> $x \Mod{N}$.</p>
</blockquote>
<p>The basic idea is to choose a unitary matrix $U$ and show that its eigenvalue contains the order of $x \Mod{N}$.</p>
<h3 id="choosing-the-operator-u">Choosing the Operator $U$</h3>
<p>We choose $U$ such that $U \ket{u} = \ket{x u \Mod{N}}$, which can be shown to be a unitary matrix. Now suppose our eigenvector is</p>
\[\ket{u_s} = \frac{1}{\sqrt{r}} \sum_{k = 0}^{r-1} \exp({\frac{-2 \pi i s k}{r}}) \ket{x^k \Mod{N}}\]
<p>For a parameter $0 \le s \le r - 1$. If we apply the operator $U$:</p>
\[U \ket{u_s} = \frac{1}{\sqrt{r}} \sum_{k = 0}^{r-1} \exp({\frac{-2 \pi i s k}{r}}) U \ket{x^k \Mod{N}}\]
\[= \frac{1}{\sqrt{r}} \sum_{k = 0}^{r-1} \exp({\frac{-2 \pi i s k}{r}}) \ket{x^{k+1} \Mod{N}}\]
<p>We can show that (see <em>Lemma 1</em> in the <em>Appendix</em>)</p>
\[U \ket{u_s} = \exp({\frac{2 \pi i s}{r}}) \ket{u_s}\]
<p>We conclude that $\exp({\frac{2 \pi i s}{r}})$ is the eigenvalue for $U$ and $\ket{u_s}$. We can then measure $\varphi \approx s/r$ via <a href="https://www.kuniga.me/blog/2020/12/23/quantum-phase-estimation.html">Quantum Phase Estimation</a>.</p>
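<p>We can numerically sanity-check this eigenvalue relation for a small case (my own illustration: $N = 15$ and $x = 7$, which has order $r = 4$; a state is represented as a map from $x^k \Mod{N}$ to its amplitude):</p>

```python
import cmath

def is_eigenvector(x, N, r, s):
    """Check that U|u_s> = exp(2*pi*i*s/r)|u_s>, where U|v> = |x*v mod N>."""
    # |u_s> = (1/sqrt(r)) sum_k exp(-2*pi*i*s*k/r) |x^k mod N>
    u = {}
    for k in range(r):
        v = pow(x, k, N)
        u[v] = u.get(v, 0) + cmath.exp(-2j * cmath.pi * s * k / r) / r ** 0.5
    # apply U: each basis state |v> maps to |x*v mod N>
    Uu = {}
    for v, a in u.items():
        w = (x * v) % N
        Uu[w] = Uu.get(w, 0) + a
    lam = cmath.exp(2j * cmath.pi * s / r)  # expected eigenvalue
    return all(abs(Uu[v] - lam * u[v]) < 1e-9 for v in u)
```

<p>Running it for every $s \in \curly{0, \cdots, r - 1}$ confirms Lemma 1 for this case.</p>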
<p>But how do we prepare the state $\ket{u_s}$ for some $s$?</p>
<h3 id="preparing-the-eigenvector">Preparing the eigenvector</h3>
<p>We don’t know how to prepare $\ket{u_s}$ for a specific $s$, but we can prepare a state which is a linear combination of $\ket{u_s}$, in particular:</p>
\[\frac{1}{\sqrt{r}} \sum_{s=0}^{r-1} \ket{u_s}\]
<p>This can be shown to be exactly $\ket{1}$ (see <em>Lemma 2</em> in the <em>Appendix</em>). That means if we use $\ket{1}$ as our initial eigenvector, we’ll measure $\varphi$ corresponding to one of the eigenvalues $\exp({\frac{2 \pi i s}{r}})$, but we don’t know which $s$ was used!</p>
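<p>Lemma 2 can also be verified numerically for a small case (my own sketch, again with $N = 15$, $x = 7$, $r = 4$, representing a state as a map from $x^k \Mod{N}$ to its amplitude):</p>

```python
import cmath

def uniform_superposition(x, N, r):
    """Compute (1/sqrt(r)) sum_s |u_s> as a map from basis state to
    amplitude; Lemma 2 says this equals |1>."""
    amp = {}
    for s in range(r):
        for k in range(r):
            v = pow(x, k, N)
            # the 1/sqrt(r) inside |u_s> times the outer 1/sqrt(r) gives 1/r
            amp[v] = amp.get(v, 0) + cmath.exp(-2j * cmath.pi * s * k / r) / r
    return amp
```

<p>For these parameters the amplitude of $\ket{1}$ comes out as $1$ and all the others cancel out to $0$.</p>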
<p>Let’s take a detour to revisit continued fractions and learn how to leverage them to recover $r$ and $s$.</p>
<h2 id="continued-fractions">Continued Fractions</h2>
<p>Continued fractions is a way to represent rational numbers such that they can be iteratively approximated. For example, consider the rational $\frac{31}{13}$.</p>
<p>We can represent it as $2 + \frac{5}{13}$, which is the same as $2 + \frac{1}{\frac{13}{5}}$. We can repeat this process for $13/5$ to get</p>
\[\frac{31}{13} = 2 + \frac{1}{2 + \frac{1}{\frac{3}{5}}}\]
<p>If we continue with this, we’ll end up with</p>
\[\frac{31}{13} = 2 + \frac{1}{2 + \frac{1}{1 + \frac{1}{1 + \frac{1}{2}}}}\]
<p>We can’t keep doing this with $\frac{1}{2}$ since $\frac{2}{1}$ leaves no remainder.</p>
<p>More formally, consider a rational number greater than 1, $p/q$. We rewrite it as $a + b/q$, such that $p = aq + b$, $b < q$. Since $b/q < 1$, we have $q/b > 1$, and we can repeat the procedure for $q/b$. Note that because $b < q$ this algorithm will eventually terminate. The algorithm returns the list of $a$’s generated in the process, which provides a unique representation of $p/q$.</p>
<p>This idea can be implemented in a short Python code:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">continued_fraction</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">q</span><span class="p">):</span>
<span class="n">a</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="n">q</span> <span class="o">></span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">p</span> <span class="o">></span> <span class="n">q</span><span class="p">:</span>
<span class="n">a</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">p</span> <span class="o">//</span> <span class="n">q</span><span class="p">)</span>
<span class="n">p</span><span class="p">,</span> <span class="n">q</span> <span class="o">=</span> <span class="n">q</span><span class="p">,</span> <span class="n">p</span> <span class="o">%</span> <span class="n">q</span>
<span class="k">return</span> <span class="n">a</span></code></pre></figure>
<p>As an example, if we run it with $p = 31, q = 13$, it returns $a = [2, 2, 1, 1, 2]$.</p>
<h3 id="recovering-the-rational-number">Recovering the Rational Number</h3>
<p>Given a list of integers $a = [a_0, \cdots, a_n]$, it’s possible to recover the numerator $p$ and denominator $q$ whose continued fraction corresponds to $a$.</p>
<p>If we define $p_0 = a_0$, $q_0 = 1$, $p_1 = 1 + a_0 a_1$ and $q_1 = a_1$, and then recursively</p>
\[\begin{aligned}
p_n &= a_n p_{n-1} + p_{n-2}\\
q_n &= a_n q_{n-1} + q_{n-2}
\end{aligned}\]
<p>It’s possible to show that $[a_0, \cdots, a_n] = p_n / q_n$ and, furthermore, that $p_n$ and $q_n$ are co-prime. This provides a simple $O(n)$ algorithm to recover $p$ and $q$ given $[a_0, \cdots, a_n]$.</p>
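<p>A short Python sketch of this recurrence (the function name is mine), mapping the list of $a$’s back to the convergents $p_k / q_k$:</p>

```python
def convergents(a):
    """Convergents p_k/q_k of the continued fraction [a_0, ..., a_n],
    using p_n = a_n*p_{n-1} + p_{n-2} and q_n = a_n*q_{n-1} + q_{n-2}."""
    out = [(a[0], 1)]      # p_0 = a_0, q_0 = 1
    p_prev, q_prev = 1, 0  # play the role of p_{-1}, q_{-1}
    for a_n in a[1:]:
        p, q = out[-1]
        out.append((a_n * p + p_prev, a_n * q + q_prev))
        p_prev, q_prev = p, q
    return out
```

<p>Feeding back $a = [2, 2, 1, 1, 2]$ from the earlier example recovers $(31, 13)$ as the last convergent.</p>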
<h3 id="finding-nearby-rational-numbers">Finding Nearby Rational Numbers</h3>
<p>Suppose we are given a rational number $x$ and we want to recover co-primes $p$ and $q$, such that</p>
\[\abs{\frac{p}{q} - x} \le \frac{1}{2q^2}\]
<p>It’s possible to show that $x$ has a continued fraction $a = [a_0, \cdots, a_n]$ and that $p / q = [a_0, \cdots, a_k]$, for $k \le n$.</p>
<p>This means that if we feed $x$’s continued fraction to the algorithm from the section above, we’ll invariably run into $p = p_k$ and $q = q_k$.</p>
<p>This flexibility is important in the context of our problem because the value we’ll measure, $\varphi$, is not exactly $s/r$ but an approximation.</p>
<h3 id="recovering-r-via-continued-fractions">Recovering $r$ via Continued Fractions</h3>
<p>We’ll now see how to recover $s$ and $r$ from $\varphi$. We know that both $s$ and $r$ are integers, so $s/r$ is a rational, and we can use continued fractions to extract them.</p>
<p>We first compute the continued fraction of $\varphi$. Then we try to recover its numerator and denominator, and it can be shown that</p>
\[\abs{\frac{s}{r} - \varphi} \le \frac{1}{2r^2}\]
<p>So by the discussion in the previous section we’ll invariably pass by $p_k / q_k$ such that $p_k / q_k = s / r$.</p>
<p>The problem is that we don’t know for which $k$ that’s the case, but we can determine if $r = q_k$ for each $k$ by checking whether $x^{q_k} \equiv 1 \Mod{N}$. If $r$ and $s$ are co-prime, then we’ll find it.</p>
<p>If not, let $r_0 = q_k$ for a given iteration $k$ and assume that $x^{r_0} \not \equiv 1 \Mod{N}$. We can show that $r_0$ is a factor of $r$ (see <em>Lemma 3</em> in <em>Appendix</em>). Let $x_0 \equiv x^{r_0} \Mod{N}$. The order of $x_0$ is $r/r_0$ since $x_0^{r/r_0} = x^{r}$. We’ll obtain $r_1$ which, if it happens to be the order of $x_0$, allows us to get $r$ via $r = r_1 r_0$. Otherwise we repeat for $x_0’ \equiv x^{r_1}$. Since $r_0$ is a proper factor of $r$, $r_0 \le r/2$, so on each iteration we at least halve the order, which means we only need $O(\log r)$ iterations. If we reach the point where $r_{n} = 1$, it means that $q_k$ is not valid and hence $p_k / q_k \ne s / r$.</p>
<p>It’s also possible that the true value of $s$ is $0$, in which case we won’t be able to find $r$. This happens with probability $p(s=0) = 1/r$, in which case we repeat the whole algorithm.</p>
<p><strong>Note.</strong> [1] suggests we can find $s’/r’ = s/r$, where $s’$ and $r’$ are co-prime, from $\varphi$ using continued fractions alone, but I don’t understand how from studying the proof of <em>Theorem A4.16</em>. In other words, we would know the $k$ for which $p_k / q_k = s / r$, which in turn would lead to a more efficient algorithm.</p>
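<p>Putting this section together, here is my own sketch of the happy path in which $s$ and $r$ are co-prime. For $N = 15$ and $x = 7$ the order is $r = 4$, and an 11-bit phase measurement of $s/r = 3/4$ could read, say, $\varphi = 1537/2048$ (these concrete numbers are my illustration, not from the original posts):</p>

```python
from fractions import Fraction

def recover_order(phi, x, N):
    """Walk the convergents p_k/q_k of phi's continued fraction and return
    the first denominator q_k with x^(q_k) = 1 (mod N), or None."""
    # continued-fraction expansion of phi
    p, q = phi.numerator, phi.denominator
    a = []
    while q > 0:
        a.append(p // q)
        p, q = q, p % q
    # convergents via p_n = a_n*p_{n-1} + p_{n-2} (same for q)
    pk, p_prev = 1, 0  # p_{-1}, p_{-2}
    qk, q_prev = 0, 1  # q_{-1}, q_{-2}
    for a_n in a:
        pk, p_prev = a_n * pk + p_prev, pk
        qk, q_prev = a_n * qk + q_prev, qk
        if qk > 0 and pow(x, qk, N) == 1:
            return qk
    return None
```

<p>Here <code>recover_order(Fraction(1537, 2048), 7, 15)</code> finds $r = 4$ even though $1537/2048$ is only an approximation of $3/4$, since $3/4$ shows up among the convergents.</p>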
<h2 id="shors-prime-factoring">Shor’s Prime Factoring</h2>
<p>We now have all the pieces to solve prime factoring. We first use $U \ket{u} = \ket{x u \Mod{N}}$ and $\ket{u} = \ket{1}$. We’ll measure $s / r$ for some $s \in \curly{0, \cdots, r-1}$ with high probability. We can recover $r$, the order of $x \Mod{N}$, using the method outlined in the previous section.</p>
<p>Finally, once we know $r$ we can obtain a prime factor with high probability as described in <a href="https://www.kuniga.me/blog/2020/12/11/factorization-from-order.html">Number Factorization from Order-Finding</a>.</p>
<h2 id="performance">Performance</h2>
<h3 id="runtime-complexity">Runtime Complexity</h3>
<p>Let’s first consider the number of gates needed for performing the steps above. Let $L$ represent the number of bits of $x$ for which we want to compute the order modulo $N$.</p>
<p><strong>Measuring $\phi$.</strong> We assume $t$ is roughly the size of $L$. From the circuit depicted in <em>Figure 1</em> in <a href="https://www.kuniga.me/blog/2020/12/23/quantum-phase-estimation.html">Quantum Phase Estimation</a>, we need $O(L)$ Hadamard gates and $O(L)$ of the $U^{2^k}$ gates. We can use an <a href="https://en.wikipedia.org/wiki/Modular_exponentiation">efficient modular exponentiation algorithm</a> which can compute $a^b$ in $O(\log(b) \log^2(a))$ where $\log^2(a)$ is due to the multiplications, so $U^{2^k}$ can be implemented using $O(L^3)$ gates. Summarizing, we can measure $\phi \approx s/r$ using $O(L^4)$ gates.</p>
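<p>The square-and-multiply idea behind that modular-exponentiation cost can be sketched classically as follows (my own illustration; Python's built-in three-argument <code>pow(a, b, n)</code> does the same thing):</p>

```python
def mod_exp(a, b, n):
    """Compute a^b mod n with O(log b) multiplications (square-and-multiply)."""
    result = 1
    a %= n
    while b > 0:
        if b & 1:  # current bit of the exponent is set
            result = (result * a) % n
        a = (a * a) % n  # square the base for the next bit
        b >>= 1
    return result
```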
<p><strong>Recovering $r$ via Continued Fractions.</strong> It’s possible to show that we can compute the continued fraction of $\phi$ in $O(L^3)$, and that the output has size $O(L)$. We then need $O(L)$ iterations to recover its numerator and denominator, but at each step $k$ we need to check whether the denominator $q_k$ equals $r$ by computing $x^{q_k} \Mod{N}$, which can be done in $O(L^3)$ using fast modular exponentiation.</p>
<p>However, if we didn’t find $r$, we need to repeat the process up to $O(\log r) = O(L)$ times, each time re-computing a modular exponentiation, for a total of $O(L^4)$ operations each time we need to test a candidate $q_k$. This adds up to $O(L^5)$.</p>
<p><strong>Number Factorization from Order-Finding.</strong> Finally, as we discussed in [3], the complexity of obtaining a prime factor is $O(L^3)$ excluding the order finding step.</p>
<p>Recovering $r$ from $\phi$ dominates the overall complexity, leading to an $O(L^5)$ probabilistic quantum algorithm that can find a prime factor of a composite number.</p>
<p>For comparison, if we used a linear algorithm to find the order as discussed in [3], we would end up with an $O(2^L)$ one.</p>
<h3 id="precision">Precision</h3>
<p>We have a few sources of uncertainty in the algorithm, which we recap now:</p>
<ul>
<li>The possibility of not measuring the real $s / r$, which is less than 60% (see <em>Measuring $\phi$</em> in [2]) and can be further reduced by using more gates, or simply by repeating the phase estimation algorithm since each run is independent.</li>
<li>The possibility that $s = 0$ (see <em>Recovering $r$ via Continued Fractions</em> above), which has low probability ($1 / r$) and can be further reduced by repeating the phase estimation algorithm.</li>
<li>The possibility that the randomly chosen $x$ in the prime factoring algorithm (see <em>Prime Factoring Algorithm</em> in [3]) yields a “bad” $r$, which is less than 25% and can be further reduced by repeatedly choosing a new random $x$.</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>This is the final post of the series that led to Shor’s prime factoring algorithm. Regardless of the practical applicability of this method, I found it fascinating how much theory it relies on. I learned a lot about quantum computing in the process and while I might not have grasped every single detail, I think I have a good overall idea of how everything comes together.</p>
<p>I found that:</p>
\[\frac{1}{\sqrt{r}} \sum_{s=0}^{r-1} \ket{u_s} = \ket{1}\]
<p>Is mind-blowing. It’s as if a bunch of eigenvectors are “hiding” inside $\ket{1}$ and only “come out” when measured.</p>
<h2 id="related-posts">Related Posts</h2>
<ul>
<li><a href="https://www.kuniga.me/blog/2019/04/12/consistent-hashing.html">Consistent Hashing</a> - this is trivia rather than real relatedness, but I found out that <a href="https://en.wikipedia.org/wiki/F._Thomson_Leighton">Tom Leighton</a> was the advisor of both <a href="https://en.wikipedia.org/wiki/Daniel_Lewin">Daniel Lewin</a> (founder of Akamai, featured in that post) and Peter Shor (featured in this post).</li>
</ul>
<h2 id="appendix">Appendix</h2>
<p><strong>Lemma 1.</strong> $U \ket{u_s} = \exp({\frac{2 \pi i s}{r}}) \ket{u_s}$</p>
<p><em>Proof.</em> To simplify the notation, assume $\alpha = \frac{-2 \pi i s}{r}$ and $y^k = x^k \Mod{N}$. We can write $\ket{u_s}$ as:</p>
\[\ket{u_s} = \frac{1}{\sqrt{r}} (e^{\alpha 0} \ket{y^0} + e^{\alpha 1} \ket{y^1} + \cdots + e^{\alpha (r-1)} \ket{y^{r-1}})\]
<p>We can write $U \ket{u_s}$ as:</p>
\[U \ket{u_s} = \frac{1}{\sqrt{r}} (e^{\alpha 0} \ket{y^1} + e^{\alpha 1} \ket{y^2} + \cdots + e^{\alpha (r-1)} \ket{y^{r}})\]
<p>We note that $\ket{y^{r}} = \ket{y^{0}}$ since by definition $x^r \equiv x^0 \equiv 1 \Mod{N}$, so we can rearrange:</p>
\[U \ket{u_s} = \frac{1}{\sqrt{r}} (e^{\alpha (r-1)} \ket{y^{0}} + e^{\alpha 0} \ket{y^1} + \cdots + e^{\alpha (r-2)} \ket{y^{r - 1}})\]
<p>This looks almost like $\ket{u_s}$ except the exponents are shifted by 1. We can fix this by pulling a factor of $e^{\alpha}$:</p>
\[U \ket{u_s} = \frac{1}{e^{\alpha} \sqrt{r}} (e^{\alpha r} \ket{y^{0}} + e^{\alpha 1} \ket{y^1} + \cdots + e^{\alpha (r-1)} \ket{y^{r - 1}})\]
<p>We have $e^{\alpha r} = \exp(\frac{2 \pi i s r}{r}) = \exp(2 \pi i s)$ and since $s$ is an integer, by Euler’s formula we conclude that $e^{\alpha r} = 1 = e^{\alpha 0}$, so we can write:</p>
\[U \ket{u_s} = \frac{1}{e^{\alpha} \sqrt{r}} (e^{\alpha 0} \ket{y^{0}} + e^{\alpha 1} \ket{y^1} + \cdots + e^{\alpha (r-1)} \ket{y^{r - 1}})\]
<p>Now it’s easy to see that $U \ket{u_s} = \frac{1}{e^{\alpha}} \ket{u_s} = e^{-\alpha} \ket{u_s} = \exp({\frac{2 \pi i s}{r}}) \ket{u_s}$. <em>QED</em></p>
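<p>We can sanity-check Lemma 1 numerically. The sketch below (my own, using $N = 15$ and $x = 7$, whose order is $r = 4$) represents states as dictionaries from basis states to amplitudes and applies $U$ as the permutation $\ket{y} \mapsto \ket{x y \Mod{N}}$:</p>

```python
import cmath

# Example values: 7 has order 4 modulo 15 (7^4 = 2401 = 1 mod 15).
N, x, r = 15, 7, 4
s = 1  # any s in 0..r-1 works

# Build |u_s> as a map from basis state |x^k mod N> to its amplitude.
u_s = {}
for k in range(r):
    y = pow(x, k, N)
    u_s[y] = cmath.exp(-2j * cmath.pi * s * k / r) / r**0.5

# Apply U: the amplitude of |y> moves to |x*y mod N>.
U_u_s = {(x * y) % N: amp for y, amp in u_s.items()}

# Lemma 1 predicts U|u_s> = exp(2*pi*i*s/r) |u_s>.
eigenvalue = cmath.exp(2j * cmath.pi * s / r)
for y, amp in u_s.items():
    assert abs(U_u_s[y] - eigenvalue * amp) < 1e-12
```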
<p><strong>Lemma 2.</strong></p>
\[\frac{1}{\sqrt{r}} \sum_{s=0}^{r-1} \ket{u_s} = \ket{1}\]
<p><em>Proof:</em></p>
<p>Let</p>
\[S = \frac{1}{\sqrt{r}} \sum_{s=0}^{r-1} \ket{u_s}\]
<p>Replace the definition of $\ket{u_s}$:</p>
\[S = \frac{1}{\sqrt{r}} \sum_{s=0}^{r-1} (\frac{1}{\sqrt{r}} \sum_{k = 0}^{r-1} e^{-2 \pi i s k /r} \ket{x^k \Mod{N}})\]
<p>Moving scalars around and changing the order of the sums yields:</p>
\[\frac{1}{r} \sum_{k = 0}^{r-1} \ket{x^k \Mod{N}} (\sum_{s=0}^{r-1} e^{-2 \pi i s k /r} )\]
<p>For $k = 0$ the terms of the inner sum are equal to 1, so it adds up to $r$. Otherwise, the inner sum is a geometric sum on $s$, which has a closed form (see <em>Interlude: Classic Inverse Fourier Transform</em> [2] for more details):</p>
\[\sum_{s=0}^{r-1} e^{-2 \pi i s k /r} = \frac{1 - e^{-2 \pi i k}}{1 - e^{-2 \pi i k / r}}\]
<p>Since $k$ is a positive integer, $e^{-2 \pi i k} = 1$. Since $0 < k < r$, $k / r < 1$ and $e^{-2 \pi i k / r} \neq 1$, which means the inner sum is 0 for $k > 0$.</p>
<p>Thus,</p>
\[S = \frac{1}{r} r \ket{x^0 \Mod{N}} = \ket{1}\]
<p><em>QED</em></p>
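<p>Lemma 2 can also be checked numerically: summing the $\ket{u_s}$ cancels every amplitude except that of $\ket{1}$. A sketch (my own) with $N = 15$, $x = 7$, $r = 4$:</p>

```python
import cmath

# Example values: 7 has order 4 modulo 15.
N, x, r = 15, 7, 4

# Amplitude of each basis state in (1/sqrt(r)) * sum_s |u_s>.
total = {}
for s in range(r):
    for k in range(r):
        y = pow(x, k, N)
        amp = cmath.exp(-2j * cmath.pi * s * k / r) / r**0.5
        total[y] = total.get(y, 0) + amp / r**0.5

# Everything cancels except the amplitude of |x^0 mod N> = |1>.
assert abs(total[1] - 1) < 1e-12
for y, amp in total.items():
    if y != 1:
        assert abs(amp) < 1e-12
```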
<p><strong>Lemma 3.</strong> Let $r, s, r’, s’$ be positive integers. If $s’ / r’ = s / r$ and $s’$ and $r’$ are co-prime, then $r’$ divides $r$. Furthermore, if $s$ and $r$ are not co-prime, then $r’ < r$.</p>
<p><em>Proof</em> We have $s’ r = s r’$, and because prime factorization is unique, both sides have the same prime factors. All prime factors contributed by $r’$ must be matched by $r$ on the other side since $r’$ and $s’$ do not share any prime factors. This means that $r$ can be divided by $r’$.</p>
<p>If $r$ and $s$ are not co-prime, then $r \neq r’$, because $r = r’$ would imply $s’ = s$, contradicting that $r’$ and $s’$ are co-prime. Also, since $r’$ divides $r$, $r \ge r’$, so it must be that $r > r’$. <em>QED</em></p>
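<p>Lemma 3 is easy to check empirically: Python’s <code class="language-plaintext highlighter-rouge">Fraction</code> reduces $s/r$ to lowest terms, and the resulting denominator always divides $r$. A small brute-force sketch (my own):</p>

```python
from fractions import Fraction
from math import gcd

# Reducing s/r to lowest terms s'/r' always yields r' dividing r,
# and r' < r whenever s and r are not co-prime (Lemma 3).
for r in range(2, 50):
    for s in range(1, r):
        rp = Fraction(s, r).denominator  # r', with gcd(s', r') = 1
        assert r % rp == 0               # r' divides r
        if gcd(s, r) > 1:
            assert rp < r                # and r' < r when s, r share a factor
```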
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://www.amazon.com/Quantum-Computation-Information-10th-Anniversary/dp/1107002176">1</a>] Quantum Computation and Quantum Information - Nielsen, M. and Chuang, I.</li>
<li>[<a href="https://www.kuniga.me/blog/2020/12/23/quantum-phase-estimation.html">2</a>] Quantum Phase Estimation</li>
<li>[<a href="https://www.kuniga.me/blog/2020/12/11/factorization-from-order.html">3</a>] Number Factorization from Order-Finding</li>
</ul>Guilherme KunigamiPeter Shor is an American professor at MIT. He received his B.S. in Mathematics at Caltech and earned his Ph.D. in Applied Mathematics from MIT advised by Tom Leighton. While at Bell Labs, Shor developed the Shor’s prime factorization quantum algorithm, which awarded him prizes including the Gödel Prize in 1999. In this post we’ll combine the parts we studied before to understand Peter Shor’s prime factorization quantum algorithm, which can find a factor of a composite number exponentially faster than the best known algorithm using classic computation. We’ll need basic familiarity with quantum computing, covered in a previous post. The bulk of the post is showing how to efficiently solve the order-finding problem since we learned from Number Factorization from Order-Finding that it is the bottleneck step in finding a prime factor of a composite number. The remaining is putting everything together and do some analysis of the performance of the algorithm as a whole.Quantum Phase Estimation2020-12-23T00:00:00+00:002020-12-23T00:00:00+00:00https://www.kuniga.me/blog/2020/12/23/quantum-phase-estimation<!-- This needs to be define as included html because variables are not inherited by Jekyll pages -->
<p>Given a unitary matrix $U$ with eigenvector $\ket{u}$, we want to estimate $\varphi$ where $e^{2 \pi i \varphi}$ is the eigenvalue of $U$.</p>
<p>This serves as a framework for solving a variety of problems including order finding, which, as we have shown in a <a href="https://www.kuniga.me/blog/2020/12/11/factorization-from-order.html">recent post</a>, can be used to efficiently factorize a number.</p>
<p>We assume basic familiarity with quantum computing, covered in a <a href="https://www.kuniga.me/blog/2020/10/11/deutsch-jozsa-algorithm.html">previous post</a>, plus we’ll use <a href="https://www.kuniga.me/blog/2020/11/21/quantum-fourier-transform.html">quantum Fourier transform</a> (QFT) in one of the steps.</p>
<!--more-->
<h2 id="quantum-circuit">Quantum Circuit</h2>
<p>Let’s first consider smaller parts of the circuit before showing the whole picture.</p>
<h3 id="controlled-gate-revisited">Controlled gate revisited</h3>
<p>We described the CNOT gate in [2], and then showed that any $n$-qubit gate $U$ can be transformed into an $(n+1)$-qubit controlled gate [3] (the $CR_k$ gate). In both cases the control qubit is assumed to be in the computational basis (more specifically $\ket{0}$ or $\ket{1}$).</p>
<p>Here we consider the case where the control bit is in an arbitrary state, e.g. $\alpha \ket{0} + \beta \ket{1}$.</p>
<p>Suppose we transformed an $n$-qubit gate $U$ into an $(n+1)$-qubit controlled gate. What happens when we apply it to an $(n+1)$-qubit state $\ket{c} \ket{y}$, where $\ket{y}$ is an $n$-qubit state and $\ket{c}$ is the control?</p>
<p>If $\ket{c} = \ket{0}$, the output is $\ket{0} \ket{y}$, whereas if $\ket{c} = \ket{1}$ the output is $\ket{1} U \ket{y}$. We can use the linearity principle and obtain</p>
\[(1) \quad \alpha \ket{0} \ket{y} + \beta \ket{1} U \ket{y}\]
<h3 id="the-u-gate">The $U$ gate</h3>
<p>Because $\ket{u}$ is the eigenvector of $U$ and $e^{2 \pi i \varphi}$ its eigenvalue, by definition $U \ket{u} = e^{2 \pi i \varphi} \ket{u}$, so if we apply a gate with corresponding unitary matrix $U$ to an input $\ket{u}$ we obtain $e^{2 \pi i \varphi} \ket{u}$.</p>
<p>It’s possible to show that $U^k$ for a positive integer $k$ is also a unitary matrix and thus has a corresponding gate. If we apply it to $U$’s eigenvector $\ket{u}$ we get $e^{2 \pi i k \varphi} \ket{u}$.</p>
<h3 id="a-simple-circuit">A simple circuit</h3>
<p>The circuit below is used as a building block for the larger one.</p>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2020-12-23-quantum-phase-estimation/quantum-phase-estimation-single.png" alt="a diagram depicting a quantum circuit" />
<figcaption>Figure 1: Quantum phase estimation circuit for the k-th qubit</figcaption>
</figure>
<p>Let’s follow what’s happening. We start with:</p>
\[\ket{\psi_1} = \ket{0} \ket{u}\]
<p>In the first step we apply the Hadamard to obtain</p>
\[\ket{\psi_2} = (\frac{\ket{0} + \ket{1}}{\sqrt{2}}) \ket{u}\]
<p>Now the first qubit is used as control for the $U^{2^k}$ gate, which similarly to (1) is</p>
\[\frac{\ket{0} \ket{u} + \ket{1} U^{2^k} \ket{u}}{\sqrt{2}}\]
<p>Using the fact that $U^{2^k} \ket{u} = e^{2 \pi i 2^k \varphi} \ket{u}$ we have</p>
\[\frac{\ket{0} \ket{u} + \ket{1} e^{2 \pi i 2^k \varphi} \ket{u}}{\sqrt{2}}\]
<p>Since $e^{2 \pi i 2^k \varphi}$ is scalar we can do some re-arranging:</p>
\[\ket{\psi_3} = \frac{\ket{0} + e^{2 \pi i 2^k \varphi} \ket{1}}{\sqrt{2}} \ket{u}\]
<p>In this view, the first qubit $\ket{0}$ became $\frac{\ket{0} + e^{2 \pi i 2^k \varphi} \ket{1}}{\sqrt{2}}$ while the $n$-qubit state $\ket{u}$ remained unchanged, which is a bit counterintuitive, especially given that the first qubit was the control one.</p>
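<p>This “phase kickback” can be checked numerically. The sketch below (my own, with an arbitrary example phase) uses the single-qubit gate $U = \textrm{diag}(1, e^{i\theta})$, whose eigenvector is $\ket{1}$, and assumes NumPy is available:</p>

```python
import numpy as np

theta = 0.7  # arbitrary example phase
# Controlled-U on 2 qubits; the control is the first qubit.
CU = np.diag([1, 1, 1, np.exp(1j * theta)])

plus = np.array([1, 1]) / np.sqrt(2)  # control qubit after the Hadamard
u = np.array([0, 1])                  # |1>, eigenvector of U
state = np.kron(plus, u)              # joint state |+>|u>

out = CU @ state

# The phase ends up on the *control* qubit: ((|0> + e^{i*theta}|1>)/sqrt(2)) |u>
expected = np.kron(np.array([1, np.exp(1j * theta)]) / np.sqrt(2), u)
assert np.allclose(out, expected)
```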
<h3 id="the-whole-picture">The whole picture</h3>
<figure class="center_children">
<img src="https://www.kuniga.me/resources/blog/2020-12-23-quantum-phase-estimation/quantum-phase-estimation-640w.png" alt="a diagram depicting a quantum circuit" />
<figcaption>Figure 2: Quantum phase estimation circuit</figcaption>
</figure>
<p>We can see that this circuit combines $t$ copies of the small circuit from the previous section. The output of $U^{2^i}$ is fed into $U^{2^{i+1}}$, but as we saw above we can assume it doesn’t change the input, so the final state is still $\ket{u}$.</p>
<p>For each of the control qubits, there will be a corresponding $U^{2^k}$, so it will end as $\frac{\ket{0} + e^{2 \pi i 2^k \varphi} \ket{1}}{\sqrt{2}}$ as we saw above.</p>
<p>If we look at the whole state after applying this larger circuit we end up with</p>
\[\frac{
(\ket{0} + e^{2 \pi i 2^{t-1} \varphi} \ket{1})
(\ket{0} + e^{2 \pi i 2^{t-2} \varphi} \ket{1}) \cdots
(\ket{0} + e^{2 \pi i 2^0 \varphi} \ket{1})
}{2^{t/2}} \ket{u}\]
<p>If we introduce $\phi = \varphi 2^t$ and looking at the first $t$ qubits:</p>
\[(2) \quad
\frac{
(\ket{0} + e^{2 \pi i 2^{-1} \phi} \ket{1})
(\ket{0} + e^{2 \pi i 2^{-2} \phi} \ket{1}) \cdots
(\ket{0} + e^{2 \pi i 2^{-t} \phi} \ket{1})
}{2^{t/2}}\]
<p>We’ll now understand the motivation for this circuit.</p>
<h2 id="inverse-fourier-transform">Inverse Fourier Transform</h2>
<p>If we look back at the construction of equation (8) in the <a href="https://www.kuniga.me/blog/2020/11/21/quantum-fourier-transform.html">quantum Fourier transform</a> (QFT), we’ll be able to recognize that (2) is the result of applying the quantum Fourier transform to a state in the computational basis $\ket{\phi}$!</p>
<p>The previous observation assumes that $\phi$ is a $t$-bit integer $\phi = \phi_t 2^0 + \phi_{t-1} 2^1 + \cdots + \phi_1 2^{t - 1}$. In reality $\varphi$ is a real number. Since we’re obtaining $\phi = \varphi 2^t$, increasing $t$ improves accuracy at the expense of performance, since the number of gates is proportional to $t$.</p>
<p>For example, if the true value of $\varphi$’s binary representation was $0.0100101111101$, then with $t = 4$, the value we’d obtain is $\phi = 0100$, for $t = 8$, we’d get $\phi = 01001011$, which allows for a better approximation of $\varphi$.</p>
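<p>This truncation is easy to reproduce. A small sketch (my own) using the example value above:</p>

```python
# varphi with binary expansion 0.0100101111101, truncated to t bits
# via phi = floor(varphi * 2^t).
varphi = 0b0100101111101 / 2**13  # 0.0100101111101 in binary

for t in (4, 8):
    phi = int(varphi * 2**t)
    print(t, format(phi, f"0{t}b"))  # prints "4 0100" then "8 01001011"
```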
<p>Because the QFT can be implemented as a quantum circuit, it has a corresponding unitary matrix, which has an inverse. That is to say there’s a quantum circuit which we can apply to (2), that is to the first $t$ qubits of the output, to obtain $\phi$.</p>
<h3 id="interlude-classic-inverse-fourier-transform">Interlude: Classic Inverse Fourier Transform</h3>
<p>Recall from [3] that the output of the Fourier transform (FT) over $x \in \mathbb{C}^N$ is $y \in \mathbb{C}^N$ with:</p>
\[(3) \quad y_k = \frac{1}{\sqrt{N}} \sum_{j = 0}^{N - 1} x_j e^{2 \pi i j k / N} \qquad \forall k = 0, \cdots, N - 1\]
<p>The inverse Fourier transform over $x \in \mathbb{C}^N$ is $y \in \mathbb{C}^N$ with:</p>
\[(4) \quad y_k = \frac{1}{\sqrt{N}} \sum_{j = 0}^{N - 1} x_j e^{-2 \pi i j k / N} \qquad \forall k = 0, \cdots, N - 1\]
<p>This is almost the same as the normal FT, but with a negative sign in the exponent of $e$.</p>
<p>Now let’s apply the inverse Fourier transform over the output of a Fourier transform.</p>
<p>We’ll replace the $x_j$ in (4) with $y_k$ from (3):</p>
\[y_k = \frac{1}{\sqrt{N}} \sum_{j = 0}^{N - 1} \big(\frac{1}{\sqrt{N}} \sum_{l = 0}^{N - 1} x_l e^{2 \pi i l j / N} \big) e^{-2 \pi i j k / N} \qquad \forall k = 0, \cdots, N - 1\]
<p>We can re-arrange some terms to obtain:</p>
\[y_k = \frac{1}{N} \sum_{j = 0}^{N - 1} \sum_{l = 0}^{N - 1} x_l e^{2 \pi i j (l - k) / N} \qquad \forall k = 0, \cdots, N - 1\]
<p>We can swap the order of the sums to get:</p>
\[y_k = \frac{1}{N} \sum_{l = 0}^{N - 1} x_l \sum_{j = 0}^{N - 1} e^{2 \pi i j (l - k) / N} \qquad \forall k = 0, \cdots, N - 1\]
<p>For $k = l$, $e^{2 \pi i j (l - k) / N} = 1$, and the inner sum is $N$. For a $l \neq k$, the term $e^{2 \pi i j (l - k) / N} = (e^{2 \pi i (l - k) / N})^j$. Taking $\alpha = e^{2 \pi i (l - k) / N}$, we have that</p>
\[S = \sum_{j = 0}^{N - 1} e^{2 \pi i j (l - k) / N} = \sum_{j = 0}^{N - 1} \alpha^{j}\]
<p>This is a geometric sum, so we can use the trick of computing $S \alpha$ and subtracting it from $S$:</p>
\[S = \frac{1 - \alpha^N}{1 - \alpha}\]
<p>Replacing $\alpha$ back:</p>
\[S = \frac{1 - e^{2 \pi i (l - k)}}{1 - e^{2 \pi i (l - k) / N}}\]
<p>Since both $l$ and $k$ are integers, $(e^{2 \pi i})^{(l - k)} = 1^{l - k} = 1$. Moreover, $0 < \abs{l - k} < N$, so $e^{2 \pi i (l - k) / N} \neq 1$, and therefore $S = 0$.</p>
<p>This implies that the only $l$ for which the inner sum is non-zero is $l = k$, so $y_k = x_k$. This shows that applying the Fourier transform followed by the inverse Fourier transform yields the original result.</p>
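<p>We can check this round trip numerically. The sketch below (my own, using NumPy) builds the transform matrices of (3) and (4) directly:</p>

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

j, k = np.meshgrid(np.arange(N), np.arange(N))
F = np.exp(2j * np.pi * j * k / N) / np.sqrt(N)       # Fourier transform (3)
F_inv = np.exp(-2j * np.pi * j * k / N) / np.sqrt(N)  # inverse transform (4)

# Applying the inverse FT to the output of the FT recovers x.
assert np.allclose(F_inv @ (F @ x), x)
```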
<h2 id="measuring-phi">Measuring $\phi$</h2>
<p>We can write (2) in this form:</p>
\[\frac{1}{2^{t/2}} \sum_{k=0}^{2^t - 1} e^{2 \pi i \phi k / 2^{t}} \ket{k}\]
<p>Note how this is the reverse of what we did in [3] (see <em>Algebraic Preparation</em>). To simplify the notation, assume that $N = 2^t$:</p>
\[(5) \quad \frac{1}{\sqrt{N}} \sum_{k=0}^{N - 1} e^{2 \pi i \phi k / N} \ket{k}\]
<p>Note this is a state where the $k$-th component has amplitude:</p>
\[\frac{1}{\sqrt{N}} e^{2 \pi i \phi k / N}\]
<p>The inverse Fourier transform is given by:</p>
\[\ket{y} = \sum_{k = 0}^{N - 1} y_k \ket{k}\]
<p>where $y_k$ is defined as:</p>
\[y_k = \frac{1}{\sqrt{N}} \sum_{j = 0}^{N- 1} x_j e^{-2 \pi i j k / N} \qquad \forall k = 0, \cdots, N - 1\]
<p>Thus we can apply the inverse FT to (5), where $x_j$ will be the amplitude of the $j$-th component of (5):</p>
\[\sum_{k = 0}^{N - 1} \frac{1}{\sqrt{N}} \sum_{j = 0}^{N - 1} \big(\frac{1}{\sqrt{N}} e^{2 \pi i \phi j / N} \big) e^{-2 \pi i j k / N} \ket{k}\]
<p>Which we can simplify to</p>
\[\frac{1}{N} \sum_{k = 0}^{N - 1} \sum_{j = 0}^{N - 1} e^{2 \pi i j (\phi - k) / N} \ket{k}\]
<p>The amplitude for a given $\ket{k}$ is given by:</p>
\[(6) \quad \frac{1}{N} \sum_{j = 0}^{N-1} e^{2 \pi i j (\phi - k) / N} = \frac{1}{N} \sum_{j = 0}^{N-1} (e^{2 \pi i (\phi - k) / N})^j\]
<p>Consider the largest base state $b$ (an integer from 0 to $N-1$) not exceeding $\phi$ (which can be a real value). Now suppose we measure a value $m$ from the state (5). Note that if $\phi$ is an integer less than $2^t$, then $b = \phi$, and from the <em>Interlude</em> above exactly one amplitude equals 1, so we would measure $\phi$ with 100% probability.</p>
<p>Now suppose $\phi$ is not an integer, and let $\delta = \phi - b$. Then the probability of measuring $b$, $p(m = b)$, is given by the squared magnitude of (6). From [4], we note, as in the <em>Interlude</em>, that (6) is a geometric sum:</p>
\[\frac{1}{N} \frac{1 - e^{2 \pi i \delta}}{1 - e^{2 \pi i \delta / N}}\]
<p>Thus:</p>
\[p(m = b) = \frac{1}{N^2} \frac{\abs{1 - e^{2 \pi i \delta}}^2}{\abs{1 - e^{2 \pi i \delta / N}}^2}\]
<p>We have that $\abs{1 - e^{2ix}}^2 = 4 \abs{\sin x}^2$ (see <em>Appendix</em>), so</p>
\[p(m = b) = \frac{1}{N^2} \frac{\abs{\sin (\pi \delta)}^2}{\abs{\sin (\pi \delta / N)}^2}\]
<p>Since $\delta < 1$ and assuming $t > 0$, $\delta / N \le 1/2$, recalling $N = 2^t$, and using $\sin x \le x$ for $x \le \pi/2$ (see <em>Appendix</em>),</p>
\[p(m = b) \ge \frac{1}{N^2} \frac{\abs{\sin (\pi \delta)}^2}{(\pi \delta / N)^2} = \frac{\abs{\sin (\pi \delta)}^2}{(\pi \delta)^2}\]
<p>Finally, using that $2 x \le \sin(\pi x)$ for $x \le 1/2$ (see [7]), we have</p>
\[p(m = b) \ge \frac{\abs{2 \delta}^2}{(\pi \delta)^2} = \frac{4}{\pi^2} \approx 40\%\]
<p>The interesting thing is that this does not depend on the number of qubits used for $b$. We can increase the probability by trading off accuracy if for example $b \pm 1$ would still be a good approximation to $\phi$. More generally, we define some error $\xi > 1$ and now want to know the probability that $\abs{m - b} \le \xi$. In [1] the authors prove that:</p>
\[p(\abs{m - b} \le \xi) \ge 1 - \frac{1}{2(\xi - 1)}\]
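<p>We can verify the $4 / \pi^2$ bound numerically. The sketch below (my own) evaluates $p(m = b)$ over a grid of $\delta$ values, assuming $\delta \le 1/2$ (e.g. taking $b$ to be the integer nearest to $\phi$), which is what the $2x \le \sin(\pi x)$ step requires:</p>

```python
import numpy as np

t = 8
N = 2**t
delta = np.linspace(0.001, 0.5, 500)  # fractional parts, assuming delta <= 1/2

# p(m = b) = |sin(pi*delta)|^2 / (N^2 |sin(pi*delta/N)|^2)
p = np.sin(np.pi * delta)**2 / (N**2 * np.sin(np.pi * delta / N)**2)

# The bound holds everywhere on the grid.
assert np.all(p >= 4 / np.pi**2)
```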
<h2 id="entangled-eigenvector">Entangled Eigenvector</h2>
<p>So far we have assumed the eigenvector $\ket{u}$ is in some computational base state. In practice it could be in a superposition $\alpha_1 \ket{u_1} + \cdots + \alpha_m \ket{u_m}$ with corresponding eigenvalues $\varphi_1, \cdots, \varphi_m$. This adds another factor of uncertainty, because a given $\varphi_i$ would have probability $\abs{\alpha_i}^2$ of being measured.</p>
<h2 id="conclusion">Conclusion</h2>
<p>In this post we learned how to efficiently find the eigenvalue given a unitary matrix and its eigenvector using a quantum circuit. The algorithm is both approximate (but so is any classical computation dealing with real numbers) and probabilistic, but we can improve both by using more qubits at the expense of number of quantum gates and hence complexity.</p>
<h2 id="related-posts">Related Posts</h2>
<ul>
<li><a href="https://www.kuniga.me/blog/2014/11/24/the-pagerank-algorithm.html">The PageRank algorithm</a> also has to do with computing an eigenvalue and eigenvector. It made me wonder if a quantum PageRank algorithm would make sense. Paparo and Martin-Delgado wrote a paper, <a href="https://arxiv.org/abs/1112.2079">Google in a Quantum Network</a>, which I haven’t read, but from skimming the conclusion it seems to be promising based on initial studies for small networks.</li>
</ul>
<h2 id="appendix">Appendix</h2>
<p><strong>Lemma 1.</strong> $\abs{1 - e^{2ix}}^2 = 4 \abs{\sin x}^2$</p>
<p><em>Proof.</em> By Euler’s formula $e^{2ix} = \cos 2x + i \sin 2x$. We can use common trigonometry identities such as $\sin 2x = 2 \sin x \cos x$, $\cos 2x = 1 - 2\sin^2 x$, to say</p>
\[1 - e^{2ix} = 1 - (1 - 2\sin^2 x + i 2 \sin x \cos x) = 2\sin^2 x - i 2 \sin x \cos x\]
<p>Given a complex number $a + ib$, $\abs{a + ib}^2 = a^2 + b^2$, so</p>
<p>\(\abs{1 - e^{2ix}}^2 = 4 \sin^4 x + 4 \sin^2 x \cos^2 x = 4 \sin^2 x(\sin^2 x + \cos^2 x) = 4\sin^2 x = 4 \abs{\sin x}^2\). <em>QED</em></p>
<p><strong>Lemma 2.</strong> $\sin x \le x$ for $x \le \pi/2$</p>
<p><em>Pseudo-proof.</em> We start with $x = 0$, for which $\sin x = x$. Now consider $x > 0$. The rate of change of both functions are $\frac{d (\sin x)}{dx} = \cos x$ and $\frac{dx}{dx} = 1$, respectively. Since $\cos (x) \le 1$, the rate of change of $f(x) = \sin x$ is never greater than that of $f(x) = x$. Both functions are equal at $x = 0$, so for larger values of $x$, $\sin x$ will never become greater than $x$. We can use a similar argument for $x < 0$.</p>
<p>Unfortunately this “proof” is not sound, because the derivative of $\sin x$ relies on $\lim_{x \rightarrow 0} \frac{\sin x}{x} = 1$, which itself assumes $\sin x < x$, a circular argument [5]. Freeman [6] proposes a much simpler geometric proof.</p>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://www.amazon.com/Quantum-Computation-Information-10th-Anniversary/dp/1107002176">1</a>] Quantum Computation and Quantum Information - Nielsen, M. and Chuang, I.</li>
<li><a href="https://www.kuniga.me/blog/2020/10/11/deutsch-jozsa-algorithm.html">[2]</a> NP-Incompleteness: The Deutsch-Jozsa Algorithm</li>
<li><a href="https://www.kuniga.me/blog/2020/11/21/quantum-fourier-transform.html">[3]</a> NP-Incompleteness:Quantum Fourier Transform</li>
<li>[<a href="https://en.wikipedia.org/wiki/Quantum_phase_estimation_algorithm#Phase_approximation_representation">4</a>] Wikipedia: Quantum phase estimation algorithm</li>
<li>[<a href="https://math.stackexchange.com/questions/125298/how-to-strictly-prove-sin-xx-for-0x-frac-pi2">5</a>] Math StackExchange: How to strictly prove $\sin x < x$ for $0 < x < \frac{\pi}{2}$</li>
<li>[<a href="http://mathrefresher.blogspot.com/2006/08/sin-x-x-tan-x-for-x-in-02.html">6</a>] Math Refresher: $\sin x < x < \tan x$ for $x \in (0, \pi/2)$</li>
<li><a href="https://math.stackexchange.com/questions/596634/mean-value-theorem-frac2-pi-frac-sin-xx1">[7]</a> Math StackExchange - Mean Value Theorem: $2 \pi < \frac{\sin x}{x} < 1$</li>
</ul>Guilherme KunigamiGiven a unitary matrix $U$ with eigenvector $\ket{u}$, we want to estimate $\varphi$ where $e^{2 \pi i \varphi}$ is the eigenvalue of $U$. This serves as framework for solving a varierity of problems including order finding, which as we have shown in a recent post, can be used to efficiently factorize a number. We assume basic familiarity with quantum computing, covered in a previous post, plus we’ll use quantum Fourier transform (QFT) in one of the steps.Number Factorization from Order-Finding2020-12-11T00:00:00+00:002020-12-11T00:00:00+00:00https://www.kuniga.me/blog/2020/12/11/factorization-from-order<p>Given integers $x$, $N$, the problem of <em>order-finding</em> consists in finding the smallest positive number $r$ such that $x^r \equiv 1 \Mod{N}$, where $r$ is called the <em>order of</em> $x \Mod{N}$.</p>
<p>In this post we’ll show that if we know how to solve the order of $x \Mod{N}$, we can use it to get a probabilistic algorithm for finding a non-trivial factor of a number $N$.</p>
<p>The motivation is that this is a crucial step in Shor’s quantum factorization, but only relies on classic number theory.</p>
<!--more-->
<h2 id="definitions">Definitions</h2>
<p>In this section we define a bunch of terminology, most of which the reader might already be familiar with. Feel free to skip ahead and refer to this when seeing them later.</p>
<p><strong>Prime factorization.</strong> Given a positive number $N$, the prime factorization of $N$ is a set of distinct prime factors $p_1, p_2, \cdots, p_m$ and positive exponents $\alpha_1, \alpha_2, \cdots, \alpha_m$ such that $N = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_m^{\alpha_m}$. For convenience we assume $p_1 < p_2 < \cdots < p_m$. It’s possible to show that any integer larger than 1 can be uniquely represented by its prime factorization. Example: $600$ can be uniquely represented by $2^3 3^1 5^2$.</p>
<p><strong>Divisibility.</strong> We say a positive integer $x$ divides $y$, denoted $x \mid y$, if there’s a positive integer $k$ such that $y = kx$. Otherwise we write $x \nmid y$, and there are integers $k \ge 0$ and $0 < c < x$ such that $y = kx + c$.</p>
<p><strong>Greatest Common Divisor.</strong> The greatest common divisor of two integers $x$ and $y$ is the largest integer that divides both $x$ and $y$ and is denoted by $\gcd(x, y)$. It can be computed in $O(\min(\log(x), \log(y)))$.</p>
<p><strong>Co-primality.</strong> Given two positive integers $x$ and $y$, we say $x$ and $y$ are co-prime if they don’t share any prime factors, or $\gcd(x, y) = 1$. For example, 9 and 10 are co-prime, but 10 and 12 are not since they share the prime factor 2.</p>
<p><strong>Set of co-primes.</strong> Given an integer $N$, we define $Z_N = \curly{1, \cdots, N}$ and $Z_N^{*}$ as the elements in $Z_N$ that are co-prime with $N$. For example, if $N = 10$, $Z^{*}_N = \curly{1, 3, 7, 9}$.</p>
<p><strong>Euler $\varphi$ function.</strong> The Euler $\varphi$ function, denoted $\varphi(N)$, is defined as the number of positive integers less than $N$ that are co-prime with $N$. Note that $\abs{Z_N^{*}} = \varphi(N)$.</p>
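<p>These definitions translate directly to code. A small sketch (my own):</p>

```python
from math import gcd

# Z_N^*: the elements of {1, ..., N} that are co-prime with N.
def z_star(N):
    return [k for k in range(1, N + 1) if gcd(k, N) == 1]

# Euler phi function: the size of Z_N^*.
def euler_phi(N):
    return len(z_star(N))

print(z_star(10))     # [1, 3, 7, 9]
print(euler_phi(10))  # 4
```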
<h2 id="theory">Theory</h2>
<p>We’ll now state a few Theorems from which we’ll build the prime factoring algorithm. Their proofs are described in the <em>Appendix</em>.</p>
<p><strong>Theorem 1.</strong> Given co-primes $x$, $N$ and $r$ the order of $x \Mod{N}$, then $r \le N$.</p>
<p><strong>Theorem 2.</strong> Let $N$ be a non-prime number and $1 \le x \le N$ a non-trivial solution to $x^2 \equiv 1 \Mod{N}$ (by non-trivial we mean $x \not \equiv \pm 1 \Mod{N}$), then at least one of $\gcd (x - 1, N)$ or $\gcd (x + 1, N)$ is a non-trivial factor of $N$.</p>
<p><strong>Theorem 3.</strong> Let $N$ be a odd non-prime positive integer $N$ with prime factors $N = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_m^{\alpha_m}$. Let $x$ be an element chosen at random from $Z_N^{*}$ and $r$ the order of $x \Mod{N}$. Then</p>
\[p(r \mbox{ is even and } x^{r/2} \not \equiv -1 \Mod{N}) \ge 1 - \frac{1}{2^m}\]
<h2 id="prime-factoring-algorithm">Prime Factoring Algorithm</h2>
<p><em>Theorem 3</em> seems highly specific, but combined with <em>Theorem 2</em> it allows us to find a factor of $N$. To see how, suppose $r$ is even and $x^{r/2} \not \equiv -1 \Mod{N}$, which happens with probability at least $1 - \frac{1}{2^m}$. Let $y = x^{r/2}$, so $y \not \equiv -1 \Mod{N}$. We also have $y \not \equiv 1 \Mod{N}$, since otherwise $r/2$ would be the order of $x \Mod{N}$. This means that by Theorem 2, $\gcd (y - 1, N)$ or $\gcd (y + 1, N)$ is a non-trivial factor of $N$.</p>
<p>We can now define the algorithm to obtain a non-trivial factor of $N$. Here’s a simple Python implementation:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">math</span> <span class="kn">import</span> <span class="n">gcd</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="k">def</span> <span class="nf">get_factor</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
<span class="k">if</span> <span class="n">N</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">return</span> <span class="mi">2</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">N</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="c1"># inclusive
</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">gcd</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">N</span><span class="p">)</span>
<span class="k">if</span> <span class="n">f</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="k">return</span> <span class="n">f</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">order</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">N</span><span class="p">)</span>
<span class="k">if</span> <span class="n">r</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">!=</span> <span class="mi">0</span> <span class="ow">or</span> <span class="n">mod_exp</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">r</span><span class="o">//</span><span class="mi">2</span><span class="p">,</span> <span class="n">N</span><span class="p">)</span> <span class="o">==</span> <span class="n">N</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">x</span> <span class="o">**</span> <span class="p">(</span><span class="n">r</span> <span class="o">//</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">gcd</span><span class="p">(</span><span class="n">y</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">N</span><span class="p">)</span>
<span class="k">if</span> <span class="n">f</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="k">return</span> <span class="n">f</span>
<span class="k">return</span> <span class="n">gcd</span><span class="p">(</span><span class="n">y</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">N</span><span class="p">)</span></code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">mod_exp(b, e, N)</code> computes $b^e \Mod{N}$. <code class="language-plaintext highlighter-rouge">order(x, N)</code> returns the smallest positive $r$ such that $x^r \equiv 1 \Mod{N}$.</p>
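<p>The post treats <code class="language-plaintext highlighter-rouge">mod_exp</code> as a primitive. A minimal repeated-squaring sketch (my own; Python’s built-in three-argument <code class="language-plaintext highlighter-rouge">pow</code> does the same):</p>

```python
def mod_exp(b, e, N):
    """Computes b^e mod N by repeated squaring, O(log e) multiplications."""
    result = 1
    b %= N
    while e > 0:
        if e & 1:          # current bit of the exponent is set
            result = result * b % N
        b = b * b % N      # square the base
        e >>= 1
    return result

assert mod_exp(7, 4, 15) == pow(7, 4, 15) == 1
```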
<h3 id="refining">Refining</h3>
<p>If $N$’s prime factorization is $N = p_1^{\alpha_1}$ for $\alpha_1 > 1$, then $m = 1$ and the probability lower bound is only $0.5$.</p>
<p>We can detect when that’s the case with the following algorithm: for each exponent $e$ starting from $e = 2$, we find the largest value $b$ such that $b^e \le N$ via binary search and check whether $b^e = N$. We stop looking for exponents when the binary search returns $b = 1$, i.e. when $2^e > N$.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">bin_search</span><span class="p">(</span><span class="n">f</span><span class="p">):</span>
<span class="n">x</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">while</span> <span class="n">f</span><span class="p">(</span><span class="n">x</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">)</span> <span class="o"><=</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">x</span> <span class="o"><<=</span> <span class="mi">1</span>
<span class="n">p2</span> <span class="o">=</span> <span class="n">x</span>
<span class="k">while</span> <span class="n">p2</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="n">p2</span> <span class="o">>>=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">f</span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="n">p2</span><span class="p">)</span> <span class="o"><=</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">x</span> <span class="o">+=</span> <span class="n">p2</span>
<span class="k">return</span> <span class="n">x</span>
<span class="k">def</span> <span class="nf">get_single_base</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
<span class="n">e</span> <span class="o">=</span> <span class="mi">2</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
<span class="n">f</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">b</span><span class="p">:</span> <span class="n">b</span><span class="o">**</span><span class="n">e</span> <span class="o">-</span> <span class="n">N</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">bin_search</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="k">if</span> <span class="n">b</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">break</span>
<span class="k">if</span> <span class="n">f</span><span class="p">(</span><span class="n">b</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">return</span> <span class="n">b</span>
<span class="n">e</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">return</span> <span class="bp">None</span></code></pre></figure>
<p>In the code above <code class="language-plaintext highlighter-rouge">bin_search()</code> finds $b$ by constructing its bits from the most to the least significant, so it makes $O(\log N)$ calls to <code class="language-plaintext highlighter-rouge">f()</code>. Python implements the power function using repeated squaring, so <code class="language-plaintext highlighter-rouge">f()</code> is also $O(\log N)$. Finally, we stop when $2^{e} > N$, so $e \le \log N$. This leads to a total complexity of $O(\log^3 N)$.</p>
<p>We now know how to determine $a$ when $N = a^b$, for positive integers $a > 1$ and $b > 1$. In that case $a$ is a non-trivial factor. Otherwise, we know $N$’s prime factorization has at least 2 distinct prime factors, so $m > 1$ and the probability lower bound is now $0.75$.</p>
<h2 id="complexity">Complexity</h2>
<p>Let’s analyze the complexity of <code class="language-plaintext highlighter-rouge">get_factor()</code>. The <code class="language-plaintext highlighter-rouge">gcd(a, b)</code> can be implemented in $O(\log \min(a, b))$. If we include the check via <code class="language-plaintext highlighter-rouge">get_single_base()</code> discussed above, it adds an $O(\log^3 N)$ component.</p>
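<p>As a quick illustration of that bound, the Euclidean algorithm (which Python also ships as <code class="language-plaintext highlighter-rouge">math.gcd</code>) shrinks its arguments geometrically, giving $O(\log \min(a, b))$ iterations:</p>

```python
def gcd(a, b):
    # Euclid's algorithm: the pair (a, b) shrinks geometrically,
    # so the loop runs O(log min(a, b)) times.
    while b:
        a, b = b, a % b
    return a
```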
<p>However, the dominant cost of the function is <code class="language-plaintext highlighter-rouge">order(x, N)</code>. We know from Theorem 1 that $r \le N$. A brute-force approach consists of trying all possibilities, which leads to an $O(N)$ algorithm (or rather $O(N \log N)$ if we account for the arithmetic operations):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">order</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">N</span><span class="p">):</span>
    <span class="n">m</span> <span class="o">=</span> <span class="n">x</span>
    <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">N</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">m</span> <span class="o">%</span> <span class="n">N</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">r</span>
        <span class="c1"># reduce mod N to keep the numbers small</span>
        <span class="n">m</span> <span class="o">=</span> <span class="n">m</span> <span class="o">*</span> <span class="n">x</span> <span class="o">%</span> <span class="n">N</span></code></pre></figure>
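<p>As a sanity check, the built-in three-argument <code class="language-plaintext highlighter-rouge">pow()</code> lets us verify small cases. For $x = 2$ and $N = 15$ the powers are $2, 4, 8, 16 \equiv 1$, so the order is 4:</p>

```python
# The powers of 2 mod 15 are 2, 4, 8, 1, so the order is 4.
N, x = 15, 2
r = next(r for r in range(1, N + 1) if pow(x, r, N) == 1)
assert r == 4
```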
<p>Being able to solve <code class="language-plaintext highlighter-rouge">order(x, N)</code> efficiently is the secret ingredient behind Shor’s factorization, but it requires resorting to quantum computing. We’ll not discuss it here, but we now have the background and motivation from the perspective of classic number theory.</p>
<h2 id="experiments">Experiments</h2>
<p>Setting the running time aside, let’s not forget the algorithm we described is probabilistic. The refined version of <code class="language-plaintext highlighter-rouge">get_factor()</code> provides a lower bound of $0.75$, but how accurate is it in practice?</p>
<p>If we run it for the first 5,000 non-prime odd numbers, we get ~75% accuracy on average. If we exclude numbers of the form $N = a^b$, we get 77% accuracy, only slightly better.</p>
<p>One thing we can do is to repeat the algorithm $k$ times or until it finds a factor. If the probability of one run is $p$, and assuming each run is <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">iid</a> then the resulting probability should be $1 - (1 - p)^k$.</p>
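<p>A quick check of this formula: with the theoretical lower bound $p = 0.75$ and $k = 5$ runs, it predicts roughly 99.9%:</p>

```python
def boosted(p, k):
    # Success probability after k independent runs,
    # each succeeding with probability p.
    return 1 - (1 - p) ** k

print(boosted(0.75, 5))  # 0.9990234375
```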
<p>If we run for $k=5$ for example, the accuracy is 99.6%. It’s thus possible to get pretty good accuracies with a small number of repetitions.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I really like number theory and it was fun to study the reduction from the prime factoring to the order finding, so much so that I decided to post about this before actually writing on how to solve the order finding problem using quantum computation.</p>
<p>I’m wondering what the best classic algorithm for solving order finding is, including probabilistic ones. We have very efficient probabilistic algorithms for detecting primes.</p>
<p>I recall studying some modular arithmetic and its properties in college, probably in the context of cryptography classes.</p>
<h2 id="appendix">Appendix</h2>
<p><strong>Theorem 1.</strong> Given co-prime integers $x$ and $N$, the order $r$ of $x \Mod{N}$ satisfies $r \le N$.</p>
<p><em>Proof.</em> We know that $x^i \Mod{N} \in \curly{1, \cdots, N - 1}$ for every positive integer $i$. It follows from the pigeonhole principle that there must exist $j \le N + 1$ such that $x^i \equiv x^j \Mod{N}$, for $i < j$. To see why, note we have at most $N$ possible outcomes for $x^i \Mod{N}$, so if we consider the first $N + 1$ values of $i$ there ought to be a repeated value.</p>
<p>Since $j > i$, there is some $r \ge 1$ such that $j = i + r$, thus</p>
\[x^j = x^r x^i\]
<p>and</p>
\[x^i \equiv x^r x^i \Mod{N}\]
<p>Since \(x^i \Mod{N} > 0\), this implies $x^r \equiv 1 \Mod{N}$. Since $j \le N + 1$ and $i \ge 1$, $r \le N$. <em>QED</em></p>
<p><strong>Theorem 2.</strong> Let $N$ be a non-prime number and $1 \le x \le N$ a non-trivial solution to $x^2 \equiv 1 \Mod{N}$ (by non-trivial we mean $x \not \equiv \pm 1 \Mod{N}$), then at least one of $\gcd (x - 1, N)$ or $\gcd (x + 1, N)$ is a non-trivial factor of $N$.</p>
<p><em>Proof.</em> Assuming $x^2 \equiv 1 \Mod{N}$, then $N \mid x^2 - 1 = (x + 1)(x - 1)$, so $N$ must have a common factor with at least one of $(x + 1)$ or $(x - 1)$.</p>
<p>Since $x \not \equiv \pm 1 \Mod{N}$, then $x \neq 1$ and $x \neq N-1$, which implies $0 < x - 1$ and $x + 1 < N$, hence the common factor is not $N$ itself. Then at least one of $\gcd (x - 1, N)$ or $\gcd (x + 1, N)$ is a non-trivial factor of $N$. <em>QED</em></p>
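<p>A concrete instance: for $N = 15$, $x = 4$ is a non-trivial solution since $4^2 = 16 \equiv 1 \Mod{15}$ and $4 \not\equiv \pm 1 \Mod{15}$. Here both gcds happen to be non-trivial factors:</p>

```python
from math import gcd

# x = 4 is a non-trivial square root of 1 mod 15.
N, x = 15, 4
assert pow(x, 2, N) == 1 and x % N not in (1, N - 1)

f1, f2 = gcd(x - 1, N), gcd(x + 1, N)
print(f1, f2)  # 3 5: both are non-trivial factors of 15
```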
<p>The proof of theorem 3 is much more involved, so let’s introduce some helpers.</p>
<p><strong>Chinese Remainder Theorem</strong> Let $b_1, \cdots, b_n$ be pairwise co-prime integers and let $a_1, \cdots, a_n$ be integers with $0 \le a_i < b_i$. Let $N = b_1 \cdots b_n$.</p>
<p>Then there is exactly one $0 \le x < N$ that satisfies $x \equiv a_i \Mod{b_i}$ for every $i$.</p>
<p><em>Proof.</em> Not included here. Refer to [1], Theorem A4.16 (p629).</p>
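<p>To make the statement concrete, a brute-force check of a classic instance, $b = (3, 5, 7)$ and $a = (2, 3, 2)$, whose unique solution is $x = 23$:</p>

```python
def crt_bruteforce(a, b):
    # Find the unique 0 <= x < b_1 * ... * b_n with x = a_i (mod b_i).
    N = 1
    for bi in b:
        N *= bi
    sols = [x for x in range(N)
            if all(x % bi == ai for ai, bi in zip(a, b))]
    assert len(sols) == 1  # uniqueness guaranteed by the theorem
    return sols[0]

print(crt_bruteforce([2, 3, 2], [3, 5, 7]))  # 23
```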
<p><strong>Lemma 3.1</strong> Let $N$ be a positive integer $N$ with prime factors $N = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_m^{\alpha_m}$. Let $x$ be an element chosen at random from $Z_N^{*}$. This is equivalent to picking $x_1, x_2, \cdots, x_m$ at random from $Z_{p_1^{\alpha_1}}^{*}, Z_{p_2^{\alpha_2}}^{*}, \cdots, Z_{p_m^{\alpha_m}}^{*}$, respectively.</p>
<p><em>Proof.</em> We just need to show that there’s a one-to-one mapping between $x$ and ($x_1, x_2, \cdots x_m$).</p>
<p>$\rightarrow$ If we define $x_i$ as the remainder of $x$ divided by $p_i^{\alpha_i}$ for every $i$, then this is a unique map from $x$ to ($x_1, x_2, \cdots x_m$), but we need to prove that if $x \in Z_N^{*}$ then $x_i \in Z_{p_i^{\alpha_i}}^{*}$.</p>
<p>Since $x$ is co-prime with $N$, then $x$ is co-prime with $p_i^{\alpha_i}$ (since it has a subset of factors of $N$). We claim that $x_i$ is also co-prime with $p_i^{\alpha_i}$. Otherwise, since $x = k p_i^{\alpha_i} + x_i$ for some integer $k$, if $x_i$ is not co-prime with $p_i^{\alpha_i}$, then they share at least one factor $p_i$, so $x_i = p_i \alpha$, thus $x = p_i (k p_i^{\alpha_i - 1} + \alpha)$, which implies $x$ is not co-prime with $p_i^{\alpha_i}$, a contradiction.</p>
<p>$\leftarrow$ Assume now we have $x_i \in Z_{p_i^{\alpha_i}}^{*}$. Since $p_1^{\alpha_1}$, $p_2^{\alpha_2}$ and $p_m^{\alpha_m}$ are pairwise co-prime, we can use the <em>Chinese Remainder Theorem</em>
to show there’s exactly one solution $0 \le x < N$ to $x \equiv x_i \Mod{p_i^{\alpha_i}}$ for every $i$.</p>
<p>To show $x \in Z_N^{*}$ it remains to show $x$ and $N$ are co-prime. Suppose it’s not. Then it shares a factor $p_j$ with $N$ for some $j$, but then since it holds that $x \equiv x_j \Mod{p_j^{\alpha_j}}$, then $x = p_j \alpha = k p_j^{\alpha_j} + x_j$, which means $x_j = p_j(\alpha - k p_j^{\alpha_j - 1})$, so $x_j$ is not co-prime with $p_j^{\alpha_j}$, contradicting the hypothesis that $x_j \in Z_{p_j^{\alpha_j}}^{*}$. <em>QED</em></p>
<p><strong>Lemma 3.2</strong> Let $a$ and $N$ be co-primes. Then $a^{\varphi(N)} \equiv 1 \Mod{N}$</p>
<p><em>Proof.</em> Not included here. Refer to [1], Theorem A4.9 (p631).</p>
<p><strong>Lemma 3.3</strong> Let $r$ be the order of $x \Mod{N}$ for co-primes $x$ and $N$. Let $r’$ be such that $x^{r’} \equiv 1 \Mod{N}$. Then $r$ divides $r’$.</p>
<p><em>Proof.</em> If $r = r’$, this is trivially true, so consider the case where $r’ > r$ (by definition $r$ cannot be bigger than $r’$). Let’s now assume $r \nmid r’$, so there’s $k > 0$ (since $r < r’$) and $0 < \alpha < r$ such that $r’ = kr + \alpha$.</p>
<p>Then $x^{r’} \equiv x^{kr} x^{\alpha} \equiv 1 \Mod{N}$. We know $x^{r} \equiv 1 \Mod{N}$ and so is $(x^{r})^k \equiv x^{rk} \equiv 1 \Mod{N}$. But this means $x^{\alpha} \equiv 1 \Mod{N}$ with $\alpha < r$, which is a contradiction that $r$ is minimal. <em>QED</em></p>
<p><strong>Lemma 3.4</strong> Let $r$ be the order of $x \Mod{N}$ for co-primes $x$ and $N$. Then $r$ divides $\varphi(N)$</p>
<p><em>Proof.</em> This follows from <em>Lemma 3.2</em>, which states $x^{\varphi(N)} \equiv 1 \Mod{N}$, allowing us to use <em>Lemma 3.3</em> with $r’ = \varphi(N)$ to conclude that $r$ divides $\varphi(N)$. <em>QED</em></p>
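<p>A small numeric check: for $N = 15$ we have $\varphi(15) = 8$, and the order of every element of $Z_{15}^{*}$ indeed divides 8:</p>

```python
from math import gcd

# phi(15) = 8: the co-primes of 15 below it are {1, 2, 4, 7, 8, 11, 13, 14}.
N, phi_N = 15, 8

def order(x, n):
    # smallest r >= 1 with x^r = 1 (mod n)
    return next(r for r in range(1, n + 1) if pow(x, r, n) == 1)

orders = [order(x, N) for x in range(1, N) if gcd(x, N) == 1]
assert all(phi_N % r == 0 for r in orders)
```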
<p><strong>Definition 3</strong> From now until Theorem 3, we’ll assume that $x \in Z_N^{*}$, $x_i \in Z_{p_i^{\alpha_i}}^{*}$ and that $x \equiv x_i \Mod{p_i^{\alpha_i}}$. Furthermore, we’ll assume $r$ is the order of $x \Mod{N}$, and $r_i$ the order of $x_i \Mod{p_i^{\alpha_i}}$.</p>
<p><strong>Lemma 3.5</strong> $r_i \mid r$ for every $i$</p>
<p><em>Proof.</em> We have $x \equiv x_i \Mod{p_i^{\alpha_i}}$, which still holds if we raise both sides to a power $k$. In particular $x^{r} \equiv x_i^{r} \equiv 1 \Mod{p_i^{\alpha_i}}$. We can then use <em>Lemma 3.3</em> for $x_i$, $r_i$ and $r$ to conclude that $r_i$ divides $r$. <em>QED</em></p>
<p><strong>Lemma 3.6</strong> Let $d_i$ be the largest exponent such that $2^{d_i}$ divides $r_i$. If $r$ is odd or $x^{r/2} \equiv -1 \mod{N}$ then $d_i$ is the same for any $i$.</p>
<p><em>Proof.</em> Let’s consider the case where $r$ is odd. This implies $r_i$ must be too, and the largest power of two that divides it is $2^0 = 1$, hence $d_i = 0$ for all $i$s.</p>
<p>Let’s consider the case where $r$ is even and $x^{r/2} \equiv -1 \mod{N}$.</p>
<p>This means $x^{r/2} + 1 = k N$ for some integer $k$. Since $p_i^{\alpha_i}$ is a factor of $N$ for any $i$, $x^{r/2} + 1 = k’ p_i^{\alpha_i}$, where $k’ = k (N/p_i^{\alpha_i})$ is an integer. Thus $x^{r/2} \equiv -1 \mod{p_i^{\alpha_i}}$. Similar to a previous argument, since $x \equiv x_i \Mod{p_i^{\alpha_i}}$, we have $x^{r/2} \equiv x_i^{r/2} \equiv - 1 \Mod{p_i^{\alpha_i}}$.</p>
<p>Now suppose that $r_i$ divides $r/2$, so $r/2 = k r_i$, so $x_i^{r/2} \equiv x_i^{r_i k} \equiv - 1 \Mod{p_i^{\alpha_i}}$, but $x_i^{r_i} \equiv (x_i^{r_i})^k \equiv x_i^{r_i k} \equiv 1 \Mod{p_i^{\alpha_i}}$ which is a contradiction, so it must be $r_i \nmid r/2$.</p>
<p>Let $d$ be the largest exponent such that $2^{d}$ divides $r$. From <em>Lemma 3.5</em> we have $r_i \mid r$, so $2^{d_i} \le 2^{d}$. If $2^{d_i} < 2^{d}$, then $2^{d_i}$ divides $2^{d - 1}$ and, since the odd part of $r_i$ divides the odd part of $r$, we’d have $r_i \mid r/2$, contradicting what we just showed. Hence $2^{d_i} = 2^{d}$.</p>
<p>We just proved, for all $i$, that $d_i = 0$ if $r$ is odd and $d_i = d$ if $x^{r/2} \equiv -1 \mod{N}$. <em>QED</em></p>
<p><strong>Cyclic Group Theorem</strong> A group $Z_N^{*}$ is called <em>cyclic</em> if there’s $g \in Z_N^{*}$ such that for any element $x \in Z_N^{*}$, $x \equiv g^k \Mod{N}$ for some $k \ge 0$. If $N = p^\alpha$ for some odd prime $p$ and positive integer $\alpha$, then $Z_{p^\alpha}^{*}$ is cyclic.</p>
<p><em>Proof.</em> Not included here. This is also not included in [1].</p>
<p><strong>Lemma 3.7.</strong> Suppose $g$ is a generator for $Z_{p^\alpha}^{*}$ and $r$ the order of $g$ $\Mod{p^\alpha}$. Then $r = \abs{Z_{p^\alpha}^{*}} = \varphi(p^\alpha)$.</p>
<p><em>Proof.</em> We first prove that every $x \in Z_{p^\alpha}^{*}$ can be expressed as $x \equiv g^{i} \Mod{p^\alpha}$ for $0 \le i \le r - 1$. From the <em>Cyclic Group Theorem</em>, there is $k \ge 0$ such that $x \equiv g^k \Mod{p^\alpha}$. Let $k’$ be the smallest such $k$. If $k’ > r$, then there is $0 < \delta < k’$ such that $k’ = r + \delta$, and $g^{k’} \equiv g^{r} g^{\delta} \Mod{p^\alpha}$, which implies $g^{k’} \equiv g^{\delta} \Mod{p^\alpha}$, contradicting the fact that $k’$ is minimal; thus $k’ \le r$. We also know that $k’ \neq r$ because $g^0 \equiv g^r \equiv 1 \Mod{p^\alpha}$.</p>
<p>What we conclude here is that there are $\abs{Z_{p^\alpha}^{*}} = \varphi(p^\alpha)$ distinct elements in $Z_{p^\alpha}^{*}$ and they all can be expressed with exponents $0 \le k \le r - 1$, which gives a lower bound $r \ge \varphi(p^\alpha)$.</p>
<p>Now consider the set $S$ of $x \equiv g^{i} \Mod{p^\alpha}$ for $0 \le i \le r - 1$. Let $i$ and $j$ represent the exponents of two elements in $S$. We claim that if $g^{i} \equiv g^{j} \Mod{p^\alpha}$ then $i = j$. Suppose not, that there’s $i < j$ such that $g^{i} \equiv g^{j} \Mod{p^\alpha}$. Then $j = i + \delta$ for $0 < \delta < r$, so $g^i \equiv g^i g^{\delta} \Mod{p^\alpha}$, which means $g^\delta \equiv 1 \Mod{p^\alpha}$, contradicting the definition of $r$. This implies that every element $g^{i} \Mod{p^\alpha}$ for $0 \le i \le r - 1$ is unique, so the size of $S$ is exactly $r$.</p>
<p>We also note that every element of $S$ is in $Z_{p^\alpha}^{*}$, so $S$ is a subset of it and thus $r = \abs{S} \le \abs{Z_{p^\alpha}^{*}} = \varphi(p^\alpha)$, which is an upper bound for $r$.</p>
<p>Combining the lower bound and upper bound of $r$, we conclude it has to be exactly $ \varphi(p^\alpha)$. <em>QED</em></p>
<p><strong>Lemma 3.8</strong> For a prime $p$ and integer $\alpha$, $\varphi(p^\alpha) = p^{\alpha - 1}(p - 1)$.</p>
<p><em>Proof.</em> We start by noting that $\varphi(p) = p - 1$ since no number smaller than $p$ has a common prime factor with $p$. For $p^\alpha$, the only numbers smaller than it that share a prime factor with it must be multiples of $p$, that is, $p k$ for $k = 1, \cdots, p^{\alpha - 1} - 1$. So the number of co-primes of $p^\alpha$ is $p^\alpha - 1$ minus $p^{\alpha - 1} - 1$, so $\varphi(p^\alpha) = p^{\alpha - 1}(p - 1)$. <em>QED</em></p>
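<p>Checking the formula against a brute-force count, e.g. $\varphi(3^3) = 3^{2} \cdot 2 = 18$:</p>

```python
from math import gcd

def phi_bruteforce(N):
    # Count 1 <= k < N co-prime with N.
    return sum(1 for k in range(1, N) if gcd(k, N) == 1)

p, alpha = 3, 3
assert phi_bruteforce(p ** alpha) == p ** (alpha - 1) * (p - 1)  # 18
```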
<p><strong>Lemma 3.9</strong> Let $p$ be an odd prime and $2^d$ the largest power of 2 dividing $\varphi(p^\alpha)$. Let $r$ be the order of a randomly chosen element $x$ from $Z_{p^\alpha}^{*}$. Then the probability that $2^d$ divides $r$ is 1/2.</p>
<p><em>Proof.</em> From Lemma 3.8 we have $\varphi(p^\alpha) = p^{\alpha - 1}(p - 1)$. Since $p$ is odd, $p-1$ and $\varphi(p^\alpha)$ are even and thus $d \ge 1$.</p>
<p>From the <em>Cyclic Group Theorem</em>, there is $g \in Z_{p^\alpha}^{*}$ such that a randomly chosen element $x$ satisfies $x \equiv g^{k} \Mod{p^{\alpha}}$. Let’s consider 2 cases:</p>
<p>Case 1: $k$ is odd. We have that $x^r \equiv g^{kr} \equiv 1 \Mod{p^{\alpha}}$. Let $r_g$ be the order of $g \Mod{p^\alpha}$. From <em>Lemma 3.7</em>, $r_g = \varphi(p^\alpha)$ and then from <em>Lemma 3.3</em> we conclude that $r_g \mid kr$ and thus $\varphi(p^\alpha) \mid kr$. Since $k$ is odd, every factor of 2 of $\varphi(p^\alpha)$ must divide $r$, hence $2^d$ divides $r$.</p>
<p>Case 2: $k$ is even. From <em>Lemma 3.2</em> $g^{\varphi(p^\alpha)} \equiv 1 \Mod{p^\alpha}$, and since $k/2$ is integer, $g^{\varphi(p^\alpha) k/2} \equiv 1 \Mod{p^\alpha}$, so $x^{\varphi(p^\alpha)/2} \equiv 1 \Mod{p^\alpha}$, and by <em>Lemma 3.3</em> $r \mid \varphi(p^\alpha) / 2$. It must be that $2^d \nmid r$ otherwise $2^d \mid \varphi(p^\alpha) / 2$ and $2^{d+1} \mid \varphi(p^\alpha)$ contradicting the fact that $d$ is maximum.</p>
<p>Summarizing, $k$ is odd if and only if $2^d \mid r$. It remains to show that $k$ is odd with 1/2 probability for a random $x$ from $Z_{p^\alpha}^{*}$. We can refer to the proof of <em>Lemma 3.7</em>, which states every $x \in Z_{p^\alpha}^{*}$ can be expressed as $x \equiv g^{i} \Mod{p^\alpha}$ for $0 \le i \le r - 1 = \varphi(p^\alpha) - 1$. Since $\varphi(p^\alpha)$ is even, $\varphi(p^\alpha) - 1$ is odd and if we divide the set of numbers $\curly{0, \cdots, \varphi(p^\alpha) - 1}$ into odds and evens we get two sets of the same size. <em>QED</em></p>
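<p>We can verify the 1/2 probability exhaustively for a small case, $p^\alpha = 9$: here $\varphi(9) = 6$, so $d = 1$, and exactly half of the 6 elements of $Z_9^{*}$ have order divisible by $2^d = 2$:</p>

```python
from math import gcd

def order(x, n):
    # smallest r >= 1 with x^r = 1 (mod n)
    return next(r for r in range(1, n + 1) if pow(x, r, n) == 1)

# p = 3, alpha = 2: phi(9) = 6, so d = 1 and 2^d = 2.
n = 9
group = [x for x in range(1, n) if gcd(x, n) == 1]
even_order = [x for x in group if order(x, n) % 2 == 0]
# Exactly half of the group has order divisible by 2^d.
assert 2 * len(even_order) == len(group)
```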
<p><strong>Theorem 3.</strong> Let $N$ be an odd non-prime positive integer $N$ with prime factors $N = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_m^{\alpha_m}$. Let $x$ be an element chosen at random from $Z_N^{*}$ and $r$ the order of $x \Mod{N}$. Then</p>
\[p(r \mbox{ is even and } x^{r/2} \not \equiv -1 \Mod{N}) \ge 1 - \frac{1}{2^m}\]
<p><em>Proof.</em> We’ll prove the equivalent statement:</p>
\[p(r \mbox{ is odd or } x^{r/2} \equiv -1 \Mod{N}) \le \frac{1}{2^m}\]
<p>Let $x \in Z_N^{*}$, $x_i \in Z_{p_i^{\alpha_i}}^{*}$ such that $x \equiv x_i \Mod{p_i^{\alpha_i}}$. By <em>Lemma 3.1</em> we can assume we’re picking $x_i$ instead of $x$.</p>
<p>Let $r_i$ be the order of $x_i \Mod{p_i^{\alpha_i}}$ as in <em>Definition 3</em>. Let $d_i$ be the largest exponent such that $2^{d_i}$ divides $r_i$. By <em>Lemma 3.6</em> if $r$ is odd or $x^{r/2} \equiv -1 \mod{N}$ then $d_i$ is the same for any $i$.</p>
<p>[1] claims it’s enough to use <em>Lemma 3.9</em> to prove it, but it’s not clear to me why. My hunch is to show that if $r \mbox{ is odd or } x^{r/2} \equiv -1 \Mod{N}$ then all of $x_1, x_2, \cdots, x_m$ will be either divisible by $2^d$ (as defined in <em>Lemma 3.9</em>) or not. Since only $\frac{1}{2^m}$ of all the possible values of $x_1, x_2, \cdots, x_m$ can satisfy that, then the condition $r \mbox{ is odd or } x^{r/2} \equiv -1 \Mod{N}$ cannot happen with more than that probability.</p>
<p>I’ll leave this as is for now. If I figure it out, I’ll update the post.</p>
<h2 id="references">References</h2>
<ul>
<li>[<a href="https://www.amazon.com/Quantum-Computation-Information-10th-Anniversary/dp/1107002176">1</a>] Quantum Computation and Quantum Information - Nielsen, M. and Chuang, I.</li>
</ul>Guilherme KunigamiGiven integers $x$, $N$, the problem of order-finding consists in finding the smallest positive number $r$ such that $x^r \equiv 1 \Mod{N}$, where $r$ is called the order of $x \Mod{N}$. In this post we’ll show that if we know how to solve the order of $x \Mod{N}$, we can use it to get a probabilistic algorithm for finding a non-trivial factor of a number $N$. The motivation is that this is a crucial step in Shor’s quantum factorization, but only relies on classic number theory.