The complete 3-nary tree represents the family tree where each individual in a generation spawns (asexually) exactly three progeny in the subsequent generation. The image to the left represents 7 generations beginning from a single ancestor (the root) at the center of the image.

If we imagine the generations continuing ad infinitum, then we arrive at an object called the pro-finite completion of the tree. Loosely speaking, this is the topological space consisting of all infinite paths from the root down through the (infinitely many) generations.

The pro-finite completion of the complete 3-nary tree can be put in correspondence with sequences of the form $(m_n)$ where each $m_n \in \{0,1,2\}$ as follows: Each descendant of an individual is labelled either 0, 1 or 2 (this can be done consistently by ordering, say, counterclockwise in the embedding above, but it doesn’t really matter so long as the labels are fixed for all time). A sequence starting, say, $(1,0,2,1,\ldots)$ represents a child of (reading right to left) the first child of the second child of the zeroth child of the first child of the root. Admittedly ‘zeroth child’ sounds awkward, but we think of these as labels and not ordinals.

Visually, we may think of the pro-finite completion to be the boundary of the infinite graph, and the corresponding sequence $(m_n)$ as an address containing the information necessary to describe how to traverse the tree to get to that point on the boundary.
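Concretely, an address can be manipulated as a finite prefix. Here is a minimal sketch in Python (the tuple-of-labels encoding and the function name are my own conventions; a true boundary point is an infinite sequence, which code can only truncate):

```python
def vertex_at(address):
    """Follow an address (m_1, m_2, ...) down from the root and return the
    list of vertices visited; each vertex is named by its address prefix."""
    path = [()]                     # the root is the empty prefix
    for m in address:
        assert m in (0, 1, 2), "labels in the 3-nary tree are 0, 1 or 2"
        path.append(path[-1] + (m,))
    return path

# The prefix (1, 0, 2, 1): the first child of the second child of the
# zeroth child of the first child of the root.
print(vertex_at((1, 0, 2, 1)))
# [(), (1,), (1, 0), (1, 0, 2), (1, 0, 2, 1)]
```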

There are other embeddings of the complete 3-nary tree, including the ‘balloon embedding’ on the right. In this embedding the pro-finite completion is visualized as its (fractal) boundary. This embedding gives another construction of the fractal known as Sierpiński’s Triangle.

In this embedding you may think of an ‘address’ of a point on the boundary as given by a sequence of ‘Left’, ‘Right’ and ‘Forward’ directions were you to drive to that point from the root along the edges of the graph. A bijection between $\{0,1,2\}$ and $\{\text{Left}, \text{Right}, \text{Forward}\}$ will produce the sequence $(m_n)$.

Of course there’s nothing special about the 3-nary tree. We could start with any number of descendants per individual per generation. Indeed, we could let the number of descendants vary either between generations, or within a generation. We will see some examples of this soon.

Balloon embeddings of the complete 2-nary (binary), 4-nary and 5-nary trees. In each case the pro-finite completion is the fractal boundary of these graphs, and not the depicted edges and vertices.

## Random Trees

There are lots of ways to make random graphs and trees, but here we will concentrate on a sort of random tree that will arise in the study of prime splitting in towers of number fields. We will suppose we start with a single ancestor (the root), and that each individual in the $n$th generation has an independent, identically distributed, bounded number of children. Note that the bound on the number of children may grow with generations, but for each generation there is some upper bound on the number of children an individual may have.

A simple example of a random tree where each individual has an equal chance of having 1, 2 or 3 offspring. The images are different embeddings of the same random tree.

Suppose the largest number of children an individual in the $n$th generation may have is $b_n$ (for instance, the random tree above has $b_n=3$ for all $n$). We call the sequence $(b_n)$ the sequence of generation bounds, and we call the tree where each individual in the $n$th generation has exactly $b_n$ children the complete $(b_n)$-nary tree. Every random tree with generation bounds $(b_n)$ can be embedded as a subtree in the complete $(b_n)$-nary tree.
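A random tree of this kind can be sampled directly. A sketch (the function name and address-tuple encoding are mine), assuming, as in the example above, that each individual independently has 1, 2 or 3 children with equal probability:

```python
import random

def sample_tree(depth, offspring=(1, 2, 3), seed=None):
    """Sample a random family tree down to the given generation.
    Returns a dict mapping each non-leaf vertex (an address tuple) to its
    children. Every such tree embeds in the complete (b_n)-nary tree with
    b_n = max(offspring) for all n."""
    rng = random.Random(seed)
    children = {}
    frontier = [()]                           # generation 0: the root
    for _ in range(depth):
        next_frontier = []
        for v in frontier:
            kids = [v + (i,) for i in range(rng.choice(offspring))]
            children[v] = kids
            next_frontier.extend(kids)
        frontier = next_frontier
    return children

tree = sample_tree(depth=4, seed=1)
assert all(1 <= len(kids) <= 3 for kids in tree.values())
```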

As in the non-random case, a point of the pro-finite completion of a random tree is specified by an address: the directions necessary to traverse the tree from the root to a point on the ‘boundary’. Another way of representing the information given in the address is provided by the list of vertices $(v_n)$ one passes through on the voyage from the root. Here one assumes that the vertices are uniquely labelled. If $(v_n)$ is such a list of vertices we will write $v_m | v_n$ for all $m > n$. Loosely speaking, a vertex $v$ divides the vertex $w$ if $v$ is a vertex further down the tree from $w$. Put even more simply, $v | w$ if $v$ is descended from $w$. We will denote the root of the tree by $v_0$ and note that $v | v_0$ for all vertices $v$.
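With vertices named by their address prefixes from the root, the divisibility relation becomes a prefix test. A small sketch (the reflexive convention, where $v \mid v$ holds, is my choice of implementation):

```python
def divides(v, w):
    """v | w: vertex v lies (weakly) below vertex w in the tree,
    i.e. w's address is a prefix of v's address."""
    return v[:len(w)] == w

root = ()
assert divides((1, 0, 2), root)        # v | v_0 for every vertex v
assert divides((1, 0, 2), (1, 0))      # (1, 0, 2) is descended from (1, 0)
assert not divides((1, 0), (1, 0, 2))  # ancestors do not divide descendants
```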

Let $B$ be the pro-finite completion of our (possibly random) tree as represented by sequences of vertices $(v_n)$, one per generation, with $v_m | v_n$ for all pairs $m > n$. For any vertex $w$ we define $$B(w) = \{ (v_n) \in B : w = v_m \mbox{ for some } m \}.$$ Loosely speaking $B(w)$ is the set of points in the pro-finite completion (boundary of the tree) that are downstream from vertex $w$.

$B(w)$ represents the part of the pro-finite completion that lies in the blue disk (left) or arc (right). Note that these are different representations of the same $B(w)$ on different embeddings of the same random tree.

Note that if $u$ and $w$ are different vertices, then either $B(u)$ and $B(w)$ are disjoint, or one is a subset of the other. It is worth supplying your own proof of this, or at least understanding why it is true from a picture.
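The nested-or-disjoint property can be checked mechanically from prefixes: $B(u) \subseteq B(w)$ exactly when $w$'s address is a prefix of $u$'s, and the sets are disjoint when the addresses diverge. A sketch (function name mine):

```python
def compare_balls(u, w):
    """Classify the relationship between B(u) and B(w) for vertices given
    as address prefixes: one always contains the other, or they are disjoint."""
    k = min(len(u), len(w))
    if u[:k] != w[:k]:
        return "disjoint"                 # the addresses diverge before depth k
    return "B(u) subset of B(w)" if len(u) >= len(w) else "B(w) subset of B(u)"

assert compare_balls((1, 0, 2), (1, 0)) == "B(u) subset of B(w)"
assert compare_balls((1,), (1, 2)) == "B(w) subset of B(u)"
assert compare_balls((1, 0), (2,)) == "disjoint"
```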

## $\sigma$-algebras and measures on $B$

We eventually want to talk about measures (and metrics) on the pro-finite completion of a random tree, but first we need a suitable $\sigma$-algebra. As usual, we actually define a nice collection of sets that we want to be in our $\sigma$-algebra and consider the smallest $\sigma$-algebra that does the trick. Dynkin’s $\pi$-$\lambda$ Theorem seems particularly salient here, and we define $\mathcal P$ to be the $\pi$-system given by all $B(w)$ for all vertices of our tree. That is $$\mathcal P = \{ B(v) : v \mbox{ is a vertex} \} \cup \{\emptyset\}.$$ We have to throw in the empty set, because a $\pi$-system is a collection of sets closed under intersection, and by our previous remarks, it is possible (common, in fact) for elements of $\mathcal P$ to be disjoint. We set $\mathcal D$ to be the $\sigma$-algebra on $B$ generated by $\mathcal P$. And we take $(B, \mathcal D)$ to be the measurable space in which all calculations occur.

Notice that, since the intersection of any two elements of $\mathcal P$ is again an element of $\mathcal P$, we see that elements of $\mathcal D$ are simply (possibly countable) disjoint unions of elements of $\mathcal P$. That is, for each set $A \in \mathcal D$ there is a (finite or) countable collection of vertices $V$, and a (finite or) countable $X \subset B$ such that $$A = \bigsqcup_{v \in V} B(v) \sqcup \bigsqcup_{x \in X} \{x \}.$$ The disjointness of this union implies we may do this in such a way that for any $u, v \in V$, $u \nmid v$, and for any $x \in X$ and $v \in V$, $x \not\in B(v)$. We call $V$ a reduced set of vertices for $A$.

The $\pi$-$\lambda$ Theorem implies that a measure $\mu$ on $(B, \mathcal D)$ is determined completely by its values on $\mathcal P$. Note that if $w_1, \ldots, w_d$ are the child-vertices of vertex $w$, then $\mu(B(w)) = \mu(B(w_1)) + \cdots + \mu(B(w_d))$ (and conversely, any collection $\{m_v \in [0, \infty] : v \mbox{ a vertex}\}$ satisfying all consistency conditions of the form $m_w = m_{w_1} + \cdots + m_{w_d}$ will determine a measure on $(B, \mathcal D)$). There may be special measures on $(B, \mathcal D)$ depending on the construction of your tree, but for now we maintain complete generality, and see how various aspects of the measure interact to potentially give a metric on $B$.
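The consistency conditions are easy to realize in code. Here is a toy construction (entirely my own, for illustration): push the root's mass down the tree, splitting each vertex's mass among its children at random, so that $m_w = m_{w_1} + \cdots + m_{w_d}$ holds at every vertex:

```python
import random

def random_consistent_measure(children, seed=None, total=1.0):
    """Assign a mass m_v = mu(B(v)) to every vertex so that each vertex's
    mass equals the sum of its children's masses.  The mass of each vertex
    is split among its children at uniformly random cut points."""
    rng = random.Random(seed)
    m = {(): total}
    stack = [()]
    while stack:
        w = stack.pop()
        kids = children.get(w, [])
        if not kids:
            continue
        cuts = sorted(rng.random() for _ in range(len(kids) - 1))
        parts = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
        for child, p in zip(kids, parts):
            m[child] = m[w] * p
            stack.append(child)
    return m

# Two generations of the complete 3-nary tree:
children = {(): [(i,) for i in range(3)]}
for i in range(3):
    children[(i,)] = [(i, j) for j in range(3)]
m = random_consistent_measure(children, seed=0)
# consistency: the root's mass equals the sum over its children
assert abs(m[()] - sum(m[(i,)] for i in range(3))) < 1e-12
```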

Recall that an atom of the measure $\mu$ is a set $A \in \mathcal D$ such that $\mu(A) > 0$ and if $C \in \mathcal D$ is a proper subset of $A$ then $\mu(C) = 0$. By our construction of elements of $\mathcal D$ we see that if $A$ is an atom of $\mu$ then either $A = \{x \}$ for some $x \in B$, or $A = B(v)$ for some vertex $v.$ In fact, we will see that this latter situation is impossible unless $B(v)$ is itself a singleton. To see why, suppose the vertices $v_1, \ldots, v_d$ are the immediate descendants of $v$. Then, $$\mu(B(v)) = \mu(B(v_1)) + \cdots + \mu(B(v_d)).$$ If $d > 1$ and $B(v)$ were an atom, then each $B(v_n)$, being a proper subset of $B(v)$, would have $\mu(B(v_n)) = 0$, which is incompatible with $\mu(B(v)) > 0$. Hence, if $v$ has more than one immediate descendant, $B(v)$ cannot be an atom of $\mu$. If, on the other hand, $d = 1$ then $B(v) = B(v_1)$ and we can repeat our argument to show that either $B(v_1)$ is not an atom, or it only has one immediate descendant. It follows that if $B(v)$ is an atom then each descendant of $v$ has only one immediate descendant. That is, $B(v)$ contains only one $x \in B$, and hence if $B(v)$ is an atom, then in fact $B(v) = \{x \}$.

$B(v)$ can be a singleton only when all descendants of $v$ have only one immediate descendant. The only $B(v)$ that can be atoms of $\mu$ are singletons.

If $\mu$ has no atoms, then it is said to be diffuse. If $\mu(B(v)) > 0$ for all $B(v)$ that are not singletons, then we say $\mu$ is a full measure.

## A pseudo-ultrametric formed from $\mu$

A metric on $B$ is a function $\delta: B \times B \rightarrow [0, \infty]$ such that for all $x,y,z \in B$,

1. $\delta(x,x) = 0$
2. $\delta(x, y) = 0$ implies $x = y$
3. $\delta(x,y) = \delta(y,x)$
4. $\delta(x,z) \leq \delta(x,y) + \delta(y,z)$

Note that we allow the possibility that $\delta$ is infinite. This is a slight generalization of the usual notion of a metric, but it disturbs very little. If we enforce the stronger requirement $$4′. \quad \delta(x,z) \leq \max\{\delta(x,y), \delta(y,z)\}$$ then we say $\delta$ is an ultrametric. If instead we drop requirement (2), then we say $\delta$ is a pseudometric. Thus a pseudo-ultrametric $\delta$ satisfies, for all $x,y,z \in B$,

• $\delta(x,x) = 0$
• $\delta(x,y) = \delta(y,x)$
• $\delta(x,z) \leq \max\{ \delta(x,y), \delta(y,z) \}$

The third condition is called the ultrametric inequality or the strong triangle inequality.

###### Theorem

Given a measure $\mu$ on $(B, \mathcal D)$, define $\delta : B \times B \rightarrow [0,\infty]$ by $\delta(x,x) = 0$ for all $x \in B$, and for $x \neq y$ , $$\delta(x,y) = \inf\{ \mu(A) : A \in \mathcal P \mbox{ with } x, y \in A\}.$$ Then $\delta$ is a pseudo-ultrametric. Moreover, if $\mu$ is a full measure, then $\delta$ is an ultrametric.

Another way of defining $\delta$: for distinct $x, y \in B$, let $a(x,y)$ denote their least common ancestor vertex; then $\delta(x,y) = \mu(B(a(x,y)))$. This definition only makes sense when $x \neq y$.
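The least-common-ancestor formulation is the natural one to compute with: $a(x,y)$ is the longest common prefix of the two addresses. A sketch using the uniform measure $\mu(B(v)) = 3^{-\operatorname{depth}(v)}$ on the complete 3-nary tree (this particular measure is just a convenient example, not forced by anything in the text):

```python
from fractions import Fraction

def lca(x, y):
    """a(x, y): the longest common prefix of two (truncated) boundary addresses."""
    k = 0
    while k < min(len(x), len(y)) and x[k] == y[k]:
        k += 1
    return x[:k]

def delta(x, y, mu):
    """delta(x, y) = mu(B(a(x, y))) for x != y, and 0 on the diagonal."""
    return Fraction(0) if x == y else mu(lca(x, y))

# uniform measure on the complete 3-nary tree
mu_uniform = lambda v: Fraction(1, 3 ** len(v))

x, y = (1, 0, 2, 1), (1, 0, 1, 1)
print(delta(x, y, mu_uniform))   # a(x, y) = (1, 0), so the distance is 1/9
```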

###### Proof

To show that $\delta$ is a pseudo-ultrametric, the only nontrivial condition to check is the ultrametric inequality. That is, for any $x, y, z \in B$, $\delta(x,z) \leq \max\{ \delta(x,y), \delta(y,z) \}$.

There are two cases: either $y \not\in B(a(x,z))$, or $y \in B(a(x,z))$.

In the first case, since $z \in B(a(x,z))$ but $y$ is not, the least common ancestor of $y$ and $z$ lies strictly above $a(x,z)$, so $B(a(x,z)) \subset B(a(y,z))$. The same reasoning applied to $y$ and $x$ gives $B(a(y,z)) = B(a(x,y))$, and hence $\delta(x,z) \leq \max\{ \delta(x,y), \delta(y,z)\}$. (Note that equality is still possible here, because $\mu(B(a(x,z))) = \mu(B(a(y,z)))$ can hold even when $B(a(x,z))$ is a proper subset of $B(a(y,z))$.)

The second case is a bit more delicate, but, changing the labels of $x$ and $z$ if necessary, we may assume $a(x,z) = a(y,z)$. It follows that $\delta(x,z) = \delta(y,z)$ and $\delta(x,y) \leq \delta(y,z)$, which together yield $\delta(x,z) \leq \max\{\delta(x,y), \delta(y,z)\}$ as desired.

Notice that if $\mu$ is full, then for $x \neq y$ the set $B(a(x,y))$ is not a singleton, so $\delta(x,y) = \mu(B(a(x,y))) > 0$ and hence $\delta$ is in fact an ultrametric. $\square$
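As a sanity check on the theorem, the strong triangle inequality can be verified numerically on a truncated tree. A sketch, again with the uniform measure on the complete 3-nary tree (the distance is re-derived inline so the snippet stands alone):

```python
import itertools
import random
from fractions import Fraction

def delta(x, y):
    """Distance under the uniform measure on the complete 3-nary tree:
    3 ** -(length of the longest common prefix), and 0 when x == y."""
    if x == y:
        return Fraction(0)
    k = 0
    while k < min(len(x), len(y)) and x[k] == y[k]:
        k += 1
    return Fraction(1, 3 ** k)

# check delta(x, z) <= max(delta(x, y), delta(y, z)) on random triples
points = list(itertools.product(range(3), repeat=4))   # all depth-4 addresses
rng = random.Random(0)
for _ in range(20_000):
    x, y, z = (rng.choice(points) for _ in range(3))
    assert delta(x, z) <= max(delta(x, y), delta(y, z))
print("strong triangle inequality holds on all sampled triples")
```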

We say that $x \in B$ is isolated with respect to $\delta$ if there exists $\epsilon > 0$ such that $\delta(x,y) > \epsilon$ for all $y \neq x$. As the next result shows, isolated points come from atoms of $\mu$.

###### Lemma

Suppose $x \in B$ is such that $\{x\}$ is an atom of $\mu$. Then $x$ is isolated with respect to $\delta$. In particular, if $y \neq x$ then $\delta(x,y) \geq \mu\{x\}$.

###### Proof

By definition $x \in B(a(x,y))$, so $\{x\} \subset B(a(x,y))$ and monotonicity of $\mu$ gives $\mu\{x\} \leq \mu(B(a(x,y))) = \delta(x,y)$. $\square$
