The Euclidean Algorithm in Lean

kuniga.me > NP-Incompleteness > The Euclidean Algorithm in Lean

# The Euclidean Algorithm in Lean

16 Apr 2022

Euclid of Alexandria was a Greek mathematician and is considered the father of Geometry. He is thought to have lived in Alexandria, Egypt, during the Ptolemaic reign [1].

His book Elements contains a collection of definitions, theorems and proofs and is known for its mathematical rigour. In the book is also an algorithm for computing the greatest common divisor (GCD) of two natural numbers, now known simply as the Euclidean algorithm.

In this post we’ll use Lean to prove a property of GCD and which is the basis for the Euclidean algorithm. We’ll write a proof using our own version of the natural numbers for educational purposes, extending the work started in Peano’s Axioms.

## Natural Numbers

Let’s start by defining our own version of the natural numbers, Nat, as in [2]. The main addition here is the definition of the symbol 1, which is useful for multiplication.

## Basic Operations

We’ll rely on addition, subtraction, multiplication and also the divisibility operations. Let’s define them now.

### Subtraction

Subtraction is an interesting operation, because natural numbers do not include negative numbers. The subtraction defined below is a truncated subtraction meaning that if $a \le b$, then $a - b = 0$ [7].

We can derive several lemmas from this definition, but since they’re not the focus of our post, let’s treat them as axioms.

### Multiplication

Similar to above, these are all lemmas that can be derived, but we’ll treat them as axioms here.

### Divisibility

Divisibility is a function that determines whether a number $a$ divides a number $b$ or more precisely if there exists $c \in \mathbb{N}$ such that $b = a * c$.

This is exactly the definition of dvd and our first encounter with the universal quantifier ∃:

It’s instructive to prove a simple lemma stating that 1 divides every natural number [3].

When we say 1 | n we call the function dvd 1 n which in turn is ∃ (c : Nat), n = 1 * c. The tactic use n uses n as a possible candidate for c. It’s then trivial to show it’s a valid candidate.

## Greatest Common Divisor

Given natural numbers $a$, $b$ and $g$, we say that $g$ is the greatest common divisor of $a$ and $b$ if it’s the largest number that divides both $a$ and $b$.

Expressing “largest” is complicated, so we’ll use an alternative definition, which says that $g$ is the greatest common divisor of $a$ and $b$ if every other number that divides both $a$ and $b$ also divides $g$ [4].

### GCD For Euclid

We’ll now ready to state our main proposition:

Lemma 1. Given $a, b, q \in \mathbb{N}$ then $gcd(a, b) = gcd(b, a - q * b)$.

In terms of is_gcd, we can state that as is_gcd a b g = is_gcd b (a - q * b) g, which is what we’ll prove. The Lean code follows below as a whole, but we’ll comment on specific parts.

In (1) we first replace both instances of gcd with their definition, which are in the form of 3 ANDed propositions, so we have H1 ∧ H2 ∧ H3 → Q1 ∧ Q2 ∧ Q3, where:

• H1 = (g ∣ b)
• H2 = (g ∣ a - q * b)
• H3 = ∀ x: Nat, (x ∣ a) ∧ (x ∣ a - q * b) → (x ∣ g)
• G1 = (g ∣ a)
• G2 = (g ∣ b)
• G3 = ∀ x: Nat, (x ∣ a) ∧ (x ∣ b) → (x ∣ g)

At (2) we assume the left side of the implication is true and make that our hypothesis H = H1 ∧ H2 ∧ H3. We want to show that if H is true, G1 ∧ G2 ∧ G3 is true too. We further split H into each a hypothesis for each operand. We can do this because if H1 ∧ H2 ∧ H3 is true, then H1, H2 and H3 must be true.

In (3) we break the current goal G1 ∧ G2 ∧ G3 into 3 goals, G1, G2 and G3. If we prove each of them is true separately, we can conclude G1 ∧ G2 ∧ G3 is also true.

To prove Goal 1, i.e. that (g | a) is true, we just need to find a natural number n such that a = g * n. Since (g ∣ b) (from H2), there exists k1 such that b = k1 * g and since g | (a - q * b) there exists k2 such that a - q * b = k2 * g. With some algebra it’s possible to show that a = g * (k1 + k2 * q) so if we use n = k1 + k2 * q in g | a we’ll conclude that it’s a true proposition.

Note how we did things in the reverse order. When writing a proof on paper, we’d usually start from k1 and k2 and try to show that a = g * (k1 + k2 * q), whereas in Lean we started from this equation and showed it’s true.

Goal 2 is trivial to prove since G2 = H1.

Goal 3 is the hardest to prove. First we give a name to x in ∀ (x : Nat), (x ∣ a) ∧ (x ∣ b) → (x ∣ d), say x1. We’ll make no special assumption on x1 so if we prove that (x1 ∣ a) ∧ (x1 ∣ b) → (x1 ∣ d) it should hold for all x : Nat.

In (4) we start by taking the left hand side of the implication as hypothesis and breaking them down into the constituent parts,

• G3H1 = (x1 ∣ a)
• G3H2 = (x1 ∣ b)

Our goal now is to show (x1 ∣ d). (5) We start off by using x1 in H3 which is valid since the hypothesis holds for all x, so H3 is now specialized to (x1 ∣ b) ∧ (x1 ∣ a - q * b) → (x1 ∣ g).

If we have a hypothesis of the form H: P1 → P2 and a goal P2, apply H turns the goal into P1. If we prove P1 is true, P2 is trivially true from H. That’s what we do in (6), and since the new goal is a AND we can split into 2 goals like in (3).

Goal 3.1, (x1 ∣ b), is trivial since it’s exactly G3H2.

Goal 3.2 is (x1 ∣ a - q * b). We can use the same process we used to prove Goal 1, by noting that a - q * b = (k3 - q * k4) * x1, where k3 and k4 are such that a = k3 * x1 and b = k4 * x2.

### Euclidean Algorithm

We can restate Lemma 1 by using the definition of the modulus operator, mod. If r = a mod b, then there is q such that a = p + q * b. Thus p = a - q * b we have:

Corollary 2. Given $a, b \in \mathbb{N}$ then $gcd(a, b) = gcd(b, a \mod b)$.

If $b = 0$, then $a \mod b = a$ and since any number divides 0, we have the base case $gcd(a, b) = gcd(0, a) = a$ if $b = 0$.

This allows us implementing a recursive algorithm to compute the GCD of 2 natural numbers, known as the Euclidean Algorithm. We’ll fallback to using nat instead of our own Nat because otherwise we will have to introduce more definitions like the % operator.

A first attempt is:

But this causes an error:

failed to prove recursive application is decreasing, well founded relation @has_well_founded

When I looked up this error it led me to [5], which uses gcd as an example for this error! The key observation is that Lean doesn’t know that (a % b) < b and that this recursion eventually ends.

The proposed fix in [5] is:

Let’s parse what is happening here. The expression have <Prop>, from <Lemma> is making an assertion based on a Lemma. We can inspect what mod_lt and succ_pos are via #print:

succ_pos says that the successor of any natural number is positive. If we “call” it with b, succ_pos b, we get 0 < (succ b).

mod_lt says that a % b < b as long as b is positive. If we “call” it with a and succ_pos b (i.e. a proof that succ b is positive), we get a % (succ b) < (succ b), which is exactly what we’re stating.

The _ tells Lean to infer which variable to use and expr1 \$ expr2 is equivalent to expr1 (expr2) . We can make this explicit to make it more readable:

Note that we pattern match against succ b as opposed to b because we pass b to succ_pos but succ b to the recursive call. We also inverted the order of a and b in the pattern match, since doing otherwise makes Lean complain [5].

The full code is available on Github.

## Conclusion

I struggled a lot to write the proof for Lemma 1. I knew exactly how to write the proof on paper but translating it into Lean was very difficult. It’s like trying to communicate an idea in a language you’re not fluent in!

One thing that really helped was to write down the proof in pseudo-code and then look up tactics [6] that performed the steps I had written down. In a few cases I had to think backwards to make it work. It was very rewarding to finally figure it out.

The definition of the gcd is very interesting. It’s the first time I run into this scenario where we need to prove an invariant in the function definition, and using theorems for that. I have a vague understanding that there’s a strong relationship between functions and theorems (Curry-Howard Isomorphism), but I’m still wrapping my head around this.

## References

• [1] Wikipedia - Euclid
• [2] NP-Incompleteness: Peano’s Axioms in Lean
• [3] exlean: Divisibility and Primes I
• [4] Lecture Notes for CSE 191, M, Knepley and F, Tsai.
• [5] The equation compiler and using_well_founded
• [6] The Lean Reference Manual, Chapter 6: Tactics
• [7] Library Coq.Arith.Minus