kuniga.me > NP-Incompleteness > Generalized Tries in OCaml

16 Dec 2017

In Chapter 10 of *Purely Functional Data Structures*, Okasaki describes a generalized trie. In this post we’ll see how tries can be seen as maps over a specific type of keys and how we can construct maps over any type of keys.

We discussed tries in OCaml in a previous post. At first glance, it looks like a special data structure but we’ll now see it’s basically a map.

The basic form of a map is a pair of (key, value). In most programming languages the key has to be a scalar (e.g. int, string) or some complex that can be converted to one (think of `hashCode()`

in Java).

For simplicity, let’s assume our scalar is an `int`

and our value is of generic type `T`

. Also, note that the syntax we present below is not valid OCaml (I find OCaml’s type syntax a bit hard to read). A map with int keys can be represented as:

If we want a two dimensional key, for example `(int, int)`

, we can have a map of maps:

and if the dimensions are not fixed, for example, a list of integers, then we can define a recursive structure for our map

If we think of strings as lists of characters, a trie is then a map where the key is a list of characters. From the type above we can simply change the key of our map above to a list of characters and for a basic trie T is a boolean

Note that `option<bool>`

is redundant, so we can use `bool`

only.

**Maps over trees**

We can now generalize the key to more complex recursive types such as trees. Suppose we fave the following type for `tree`

:

The outermost map is indexed by the root of the tree. The inner maps are indexed by the left and right subtrees respectively:

If we expand the key of the second map we get the following:

It gets pretty involved very quickly but because we traverse these types recursively, the implementation is still simple. Let’s see how to implement these in OCaml. The type of a map over an int tree can be defined as follows:

Note that the outermost map is a regular `IntMap`

, which uses the root element of the map, a scalar, as key.

The search function takes a tree representing the key, and the map. The base case is when the tree is empty, when we’re just past a leaf. The recursive case consists in obtaining the maps in order, first using the root element, then using the left tree and finally using the right tree:

Note that the recursive call is non-uniform, so we need explicit annotations, as we discussed previously.

The insertion is similar, expect that when we fail to find a key in any of the maps, we need to first insert it with an empty element before recursing.

Because the `Map.find()`

implementation throws exceptions when a key doesn’t exist, we can wrap the call with a `try`

and if an exception occurs, we can insert the missing key (alternatively we could use `Map.mem()`

).

**Maps over products and sums**

There are two ways a structure can branch out recursively: through *combination* or through *alternative*. For a *combination*, refer to our definition of `Node`

:

`Node of 'a * 'a tree * 'a tree`

In theory, any combination of values for `'a`

, `'a tree`

, `'a tree`

are possible values for a Node. The `*`

in between the components in fact represent the cartesian product operator. This is also known as **product**.

For *alternative*, the type of the tree itself can be either Empty or Node:

In this case, the valid values for a tree is the sum of values of each alternative. Hence, this is also known as **sum**.

We can generalize the map with keys of any types by looking at their definition. If it’s a product, we end up with nested maps. For example, if

then the map over `Tk`

can be defined as

In our example, this came the nested maps `'a trie trie IntMap.t`

.

For sums, if the type is

the map over `Tk`

would end up as a product of maps:

In our example, this came the product `('a option)`

and `('a trie trie IntMap.t)`

. option can be thought as a one-dimensional map.

In this post we saw how a trie can be modeled as a map using the same principles as the ones we use to construct matrices, that is, two-dimensional maps (nested maps). We then generalized the string keys to trees, and implemented the insert/find functions in OCaml. I found it pretty hard to reason about these structures.

We then went a step further and saw how to construct maps based on the key structure. And we learned about product and sum of types when discussing recursive types.