Absolute Values and the Product Formula in Number Fields

Archimedean Absolute Values on $K$

If $K$ is a number field of degree $d$ over $\mathbb Q$, then there are $r_1$ real embeddings $K \hookrightarrow \mathbb R$ and $r_2$ conjugate pairs of complex embeddings $K \hookrightarrow \mathbb C$, where $r_1 +2 r_2 = d$. Each of these embeddings produces an absolute value on $K$ formed by restricting the usual absolute value on $\mathbb R$ or $\mathbb C$ to the embedded copy of $K$; conjugate complex embeddings produce the same absolute value.

These absolute values are pairwise incomparable in the sense that given any two distinct ones $| \cdot |_{\sigma}$ and $| \cdot |_{\tau}$ and any $\epsilon > 0$ we can find some $x \in K$ so that $| x |_{\sigma} < \epsilon$ and $| x |_{\tau} > \epsilon^{-1}$. This is a non-trivial fact that follows from the density of $K$ in $\mathbb R^{r_1} \times \mathbb C^{r_2}$ via the diagonal embedding. The $r_1$ real embeddings thus produce $r_1$ distinct real places; similarly there are $r_2$ complex places, and together these form the set of archimedean places of $K$. We denote the set of archimedean places of $K$ by $\mathcal M_{\infty}(K)$ and if $v \in \mathcal M_{\infty}(K)$ we will write $v | \infty$.


If $| \cdot |$ is an archimedean absolute value on $K$, then $|\cdot|$ is in one of the archimedean places of $K$.


Let $\overline K$ be the completion of $K$ with respect to $| \cdot |$. Then $\overline K$ is a complete archimedean field and hence is isomorphic to $\mathbb R$ or $\mathbb C$. The image of $K$ in $\overline K$ is dense, and so the embedding $K \hookrightarrow \overline K$ must be one of the known real or complex embeddings. Since the completion is determined by the place and not the absolute value, $| \cdot |$ must be in the place associated to the embedding $K \hookrightarrow \overline K$.

An archimedean absolute value on $K$ restricts to an archimedean absolute value on $\mathbb Q.$ Given an archimedean place $v$, we choose the representative $\| \cdot \|_v$ to be the restriction of the usual absolute value on $K$ as embedded in $K_v$ ($=\mathbb R$ or $\mathbb C$). We then define another equivalent absolute value by setting $| \cdot |_v = \| \cdot \|_v^{1/d}$ if $v$ is real, and $\| \cdot \|_v^{2/d}$ if $v$ is complex. This latter choice of normalization implies that if $r \in \mathbb Q$, $$ |r|_{\infty} = \prod_{v | \infty} | r |_v.$$

Non-archimedean Absolute Values on $K$

By Ostrowski’s Theorem the non-archimedean places of $\mathbb Q$ are indexed by the rational primes (see Absolute Values and Completions of $\mathbb Q$). Thus, if $| \cdot |$ is a non-archimedean absolute value on $K$, $| \cdot |$ restricted to $\mathbb Q$ is equivalent to $| \cdot |_p$ for some prime $p$. If $v$ is the place of $| \cdot |$ we will say $v$ lies above $p$, and write $v|p$. We denote the set of places of $K$ above $p$ by $\mathcal M_p(K)$.

If $f(x)$ is the minimal polynomial of a primitive element of $K|\mathbb Q$, then $K = \mathbb Q[x] / f(x) \mathbb Q[x]$. Since $\mathbb Q \subseteq \mathbb Q_p$, we may view $f(x)$ as a polynomial with coefficients in $\mathbb Q_p$, and it may be the case that $f(x)$ is no longer irreducible over the larger field $\mathbb Q_p$. Suppose $f(x)$ factors as $f_1(x) \cdots f_{\ell}(x)$, where the $f_i(x) \in \mathbb Q_p[x]$ are irreducible.

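If $a(x)$ and $b(x)$ are two polynomials in $\mathbb Q[x]$ with $a(x) - b(x)$ not divisible by $f(x)$, then in $\mathbb Q_p[x]$, $a(x) - b(x)$ is not divisible by any of the $f_i(x)$. Indeed, if some $f_i$ divided $a - b$, then $a - b$ and $f$ would have a non-constant common factor; since the GCD of two polynomials in $\mathbb Q[x]$ is the same whether computed over $\mathbb Q$ or over $\mathbb Q_p$, and $f$ is irreducible over $\mathbb Q$, $f$ would divide $a - b$. That is, the map $a(x) + f(x) \mathbb Q[x] \mapsto a(x) + f_i(x) \mathbb Q_p[x]$ is injective for each $i=1,2,\ldots, \ell$. It follows that $K$ embeds in each of the $K_{v_i} := \mathbb Q_p[x] / f_i(x) \mathbb Q_p[x]$.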

The diagonal embedding of $K$ into $K_{v_1} \oplus \cdots \oplus K_{v_{\ell}}$ is similar to the archimedean situation where we embed $K$ into $\mathbb R^{r_1} \times \mathbb C^{r_2}$. However, unlike the archimedean situation, the local degrees $[K_{v_i}:\mathbb Q_p]$ may be larger than 2. Like in the archimedean case, under the diagonal embedding, $K$ is dense in $K_{v_1} \oplus \cdots \oplus K_{v_{\ell}}$.
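As a concrete illustration (not part of the original argument), consider $f(x) = x^3 - 2$, so that $K = \mathbb Q(\sqrt[3]{2})$. When the reduction of a monic $f$ modulo $p$ is squarefree, the degrees of the irreducible factors of $f$ over $\mathbb Q_p$ agree with the degrees of the irreducible factors of its reduction over $\mathbb F_p$, and for a cubic the number of roots modulo $p$ determines those degrees. The following Python sketch relies on that assumption (it does not check squarefreeness); the names `roots_mod_p` and `local_degrees_cubic` are ours, chosen for illustration.

```python
def roots_mod_p(coeffs, p):
    """Roots in F_p of the integer polynomial `coeffs` (lowest degree first),
    found by brute force."""
    def evaluate(x):
        return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
    return [x for x in range(p) if evaluate(x) == 0]


def local_degrees_cubic(coeffs, p):
    """Degrees of the irreducible factors mod p of a monic cubic.

    Assumption: the reduction mod p is squarefree, so the number of roots
    (0, 1 or 3) determines the factorization pattern.
    """
    n_roots = len(roots_mod_p(coeffs, p))
    return {0: [3], 1: [1, 2], 3: [1, 1, 1]}[n_roots]


f = [-2, 0, 0, 1]  # x^3 - 2, discriminant -108, so we avoid p = 2, 3
for p in (5, 7, 31):
    print(p, local_degrees_cubic(f, p))
# 5 -> [1, 2]      two places above 5, with local degrees 1 and 2
# 7 -> [3]         one place above 7, with local degree 3
# 31 -> [1, 1, 1]  three places above 31, each with local degree 1
```

In each case the local degrees sum to $d = 3$, matching the dimension count for $K_{v_1} \oplus \cdots \oplus K_{v_{\ell}}$.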

We can create an absolute value on $K_{v_i}$ by using the norm $N_{K_{v_i}|\mathbb Q_p}$ (the multiplicative map from $K_{v_i}$ to $\mathbb Q_p$; see Field Extensions and Number Fields) and then hitting the result with the $p$-adic absolute value. That is, we define $$\|\alpha\|_{v_i} = \left| N_{K_{v_i}|\mathbb Q_p}(\alpha)\right|_p.$$ It remains to show this is a non-archimedean absolute value. As usual, the strong triangle inequality is the tricky part, and before we prove it we need a couple of results.

Hensel’s Lemma

Hensel’s Lemma is the name of a set of related results on how factorization in $\mathbb Z_p[x]$ is related to factorization in $\mathbb F_p[x]$ under the map on coefficients given by $\mathbb Z_p / p \mathbb Z_p \cong \mathbb F_p$. The name usually refers to the result which allows us to (under certain conditions) “lift” roots of the reduced polynomial in $\mathbb F_p[x]$ to roots of the original polynomial in $\mathbb Z_p[x]$. The classic proof is constructive and relies on a variant of Newton’s method for approximating zeros of a function based on derivative information. For the moment, we need a less precise (but still useful) version which allows us to test for factorization of polynomials in $\mathbb Z_p[x]$ by checking the factorization of the reduced polynomial in $\mathbb F_p[x]$.

The form of Hensel’s Lemma we prove here replaces the Newton’s method step with the fact that if $\varphi_1$ and $\varphi_2$ are coprime polynomials in $\mathbb F_p[x]$, then the ideal generated by $\varphi_1$ and $\varphi_2$ is all of $\mathbb F_p[x]$, and hence given any $\eta \in \mathbb F_p[x]$ we can find $\psi_1, \psi_2 \in \mathbb F_p[x]$ such that $$\psi_1 \varphi_2 + \psi_2 \varphi_1 = \eta.$$ The existence of the solution $(\psi_1, \psi_2)$, which can be found constructively using the Euclidean Algorithm, will be necessary for the inductive construction of a factorization for $f \in \mathbb Z_p[x]$ from the factorization of its reduction in $\mathbb F_p[x]$.
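The constructive step can be sketched in Python. The following is a minimal illustration, assuming polynomials over $\mathbb F_p$ are stored as coefficient lists with the lowest degree first and already reduced modulo $p$; all function names (`trim`, `add`, `mul`, `divmod_poly`, `bezout`, `solve`) are hypothetical helpers written for this example, and the extended Euclidean Algorithm is one standard way (not necessarily the only one) to produce $(\psi_1, \psi_2)$.

```python
def trim(a):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    while a and a[-1] == 0:
        a = a[:-1]
    return a


def add(a, b, p):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x + y) % p for x, y in zip(a, b)])


def mul(a, b, p):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)


def divmod_poly(a, b, p):
    """Quotient and remainder of a by b in F_p[x] (b nonzero)."""
    inv = pow(b[-1], -1, p)  # inverse of the leading coefficient of b
    q = [0] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift, c = len(a) - len(b), a[-1] * inv % p
        q[shift] = c
        # subtract c * x^shift * b, which cancels the leading term of a
        a = add(a, [0] * shift + [(-c * y) % p for y in b], p)
    return trim(q), a


def bezout(a, b, p):
    """Return (g, s, t) with s*a + t*b = g, where g is a gcd of a and b."""
    r0, r1, s0, s1, t0, t1 = a, b, [1], [], [], [1]
    while r1:
        q, r = divmod_poly(r0, r1, p)
        neg_q = [(-c) % p for c in q]
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, mul(neg_q, s1, p), p)
        t0, t1 = t1, add(t0, mul(neg_q, t1, p), p)
    return r0, s0, t0


def solve(phi1, phi2, eta, p):
    """Return (psi1, psi2) with psi1*phi2 + psi2*phi1 = eta in F_p[x]."""
    g, s, t = bezout(phi1, phi2, p)
    assert len(g) == 1, "phi1 and phi2 must be coprime"
    scale = [pow(g[0], -1, p)]
    psi2 = mul(mul(eta, s, p), scale, p)  # coefficient of phi1
    psi1 = mul(mul(eta, t, p), scale, p)  # coefficient of phi2
    return psi1, psi2


# Example in F_5[x]: phi1 = x - 2 = x + 3, phi2 = x + 2, eta = x + 1.
p, phi1, phi2, eta = 5, [3, 1], [2, 1], [1, 1]
psi1, psi2 = solve(phi1, phi2, eta, p)
assert add(mul(psi1, phi2, p), mul(psi2, phi1, p), p) == eta
```

The solution is not unique: adding a multiple of $\varphi_1$ to $\psi_1$ and subtracting the same multiple of $\varphi_2$ from $\psi_2$ gives another one.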

Hensel’s Lemma is important in its own right, especially as it gives us a method to look for factorizations of polynomials in $\mathbb Z_p[x]$ (and $\mathbb Q_p[x]$) by considering a much easier factorization problem in $\mathbb F_p[x]$.

Hensel’s Lemma

Suppose $f(x) \in \mathbb Z_p[x]$ and write $\widehat{f}$ for the polynomial in $\mathbb F_p[x]$ formed by reducing the coefficients of $f$ modulo $p \mathbb Z_p$. If there exists a factorization $\widehat f(x) = \varphi_1(x) \varphi_2(x)$ in $\mathbb F_p[x]$, with $\varphi_1$ and $\varphi_2$ coprime, then there exists a factorization $f(x) = f_1(x) f_2(x)$ in $\mathbb Z_p[x]$ with $\varphi_1 = \widehat f_1$, $\varphi_2 = \widehat f_2$ and $\deg f_1 = \deg \varphi_1$.


We will construct sequences of polynomials $f_1^{(n)}$ and $f_2^{(n)}$ in $\mathbb Z_p[x]$ satisfying

  • $\deg f_1^{(n)} = \deg \varphi_1$;
  • ${f}_1^{(n)} = \varphi_1 \bmod p \mathbb Z_p$; ${f}_2^{(n)} = \varphi_2 \bmod p \mathbb Z_p$;
  • $f_1^{(n+1)} \equiv f_1^{(n)} \bmod p^n \mathbb Z_p$; $f_2^{(n+1)} \equiv f_2^{(n)} \bmod p^n \mathbb Z_p$;
  • $f \equiv f_1^{(n)} f_2^{(n)} \bmod p^n \mathbb Z_p$.

Given such sequences $(f_1^{(n)})$ and $(f_2^{(n)})$, $f_1 = \lim f_1^{(n)}$ and $f_2 = \lim f_2^{(n)}$ are defined and satisfy the conditions of the lemma.

Set $f_1^{(1)} = \varphi_1$ and $f_2^{(1)} = \varphi_2$, where we view $a \in \mathbb F_p$ as the constant base-$p$ expansion, $a + 0 \cdot p + 0 \cdot p^2 + \cdots$ in $\mathbb Z_p$.

The inductive hypothesis guarantees the existence of a polynomial $h^{(n)} \in \mathbb Z_p[x]$ such that $$f = f_1^{(n)} f_2^{(n)} + p^n h^{(n)}.$$

We define $f_i^{(n+1)}$ in terms of $f_i^{(n)}$ as $$f_i^{(n+1)} = f_i^{(n)} + p^n g_i^{(n)}, \quad i=1,2;$$ where $g_i^{(n)} \in \mathbb Z_p[x]$ with $\deg g_i^{(n)} < \deg \varphi_i$, and look for $g_i^{(n)}$ so that the sequences $(f_i^{(n)})$ satisfy the conditions listed above.

Computing \begin{align*}f_1^{(n+1)} f_2^{(n+1)} &= f_1^{(n)} f_2^{(n)} + p^n(g_1^{(n)} f_2^{(n)} + g_2^{(n)} f_1^{(n)} ) + p^{2n} g_1^{(n)} g_2^{(n)} \\ &= f + p^n(-h^{(n)} + g_1^{(n)} f_2^{(n)} + g_2^{(n)} f_1^{(n)}) + p^{2n} g_1^{(n)} g_2^{(n)}.\end{align*} If we can find $g_1^{(n)}$ and $g_2^{(n)}$ such that $$p \mid \left(-h^{(n)} + g_1^{(n)} f_2^{(n)} + g_2^{(n)} f_1^{(n)}\right),$$ then there is some polynomial in $\mathbb Z_p[x]$, call it $h^{(n+1)}$, such that $$\left(-h^{(n)} + g_1^{(n)} f_2^{(n)} + g_2^{(n)} f_1^{(n)}\right) + p^{n} g_1^{(n)} g_2^{(n)}= -p h^{(n+1)}, $$ and $f = f_1^{(n+1)} f_2^{(n+1)} + p^{n+1} h^{(n+1)}, $ as desired.

To find $g_1^{(n)}$ and $g_2^{(n)}$, let $\eta = \widehat{h}^{(n)} \in \mathbb F_p[x]$ and consider solutions to the equation $$-\eta + \psi_1 \varphi_2 + \psi_2 \varphi_1 \equiv 0 \bmod p, \qquad \psi_1, \psi_2 \in \mathbb F_p[x].$$ By hypothesis $\varphi_1$ and $\varphi_2$ are relatively prime in $\mathbb F_p[x]$ and therefore there is a solution $(\psi_1, \psi_2)$; replacing $\psi_1$ by its remainder modulo $\varphi_1$ (and adjusting $\psi_2$ by the corresponding multiple of $\varphi_2$) we may assume $\deg \psi_1 < \deg \varphi_1$. Let $g_1^{(n)}$ and $g_2^{(n)}$ be any choices of polynomial in $\mathbb Z_p[x]$ with $\widehat{g}_1^{(n)} = \psi_1$, $\widehat{g}_2^{(n)} = \psi_2$ and $\deg g_1^{(n)} = \deg \psi_1$. It follows that $$p \mid \left(-h^{(n)} + g_1^{(n)} f_2^{(n)} + g_2^{(n)} f_1^{(n)}\right),$$ and the lemma is established.
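The induction can be traced by hand or by machine in the simplest case. The sketch below assumes a monic quadratic $f = x^2 + c_1 x + c_0$ with integer coefficients and linear factors $f_1 = x - a$, $f_2 = x - b$, so that $g_1^{(n)}$ and $g_2^{(n)}$ are constants and the equation for $(\psi_1, \psi_2)$ becomes a $2 \times 2$ linear system modulo $p$. The function name `hensel_lift_quadratic` and its interface are ours; this is one way to organize the computation, not a general implementation of the lemma.

```python
def hensel_lift_quadratic(c1, c0, a, b, p, steps):
    """Lift x^2 + c1*x + c0 = (x - a)(x - b) (mod p) through `steps` rounds.

    Returns (a, b, modulus) with the factorization valid modulo `modulus`,
    where modulus = p**(steps + 1). Assumes a and b are distinct mod p.
    """
    assert (c1 + a + b) % p == 0 and (c0 - a * b) % p == 0
    assert (a - b) % p != 0, "x - a and x - b must be coprime mod p"
    modulus = p
    for _ in range(steps):
        # f = f1*f2 + modulus*h, where h = h1*x + h0 has integer coefficients
        h1 = (c1 + a + b) // modulus
        h0 = (c0 - a * b) // modulus
        # Solve g1*(x - b) + g2*(x - a) = h (mod p) for constants g1, g2:
        #   g1 + g2 = h1   and   -b*g1 - a*g2 = h0.
        g1 = (h0 + a * h1) * pow((a - b) % p, -1, p) % p
        g2 = (h1 - g1) % p
        # f_i^(n+1) = f_i^(n) + p^n * g_i, so each root moves by -p^n * g_i.
        a = (a - modulus * g1) % (modulus * p)
        b = (b - modulus * g2) % (modulus * p)
        modulus *= p
    return a, b, modulus


# x^2 + 1 = (x - 2)(x - 3) (mod 5), lifted to a factorization mod 5^4.
a, b, modulus = hensel_lift_quadratic(0, 1, 2, 3, 5, 3)
assert (a * a + 1) % modulus == 0 and (a + b) % modulus == 0
```

Running more rounds produces longer and longer truncations of the two square roots of $-1$ in $\mathbb Z_5$, which is the limit $f_1 = \lim f_1^{(n)}$ of the proof in this special case.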

The Ring of Integers in $K_v$

Fix $v = v_i$ and define $\mathfrak O_v = \{ \alpha \in K_v : \| \alpha \|_v \leq 1 \}$.


$\mathfrak O_v$ is a ring. Moreover, it is the integral closure of $\mathbb Z_p$ in $K_v$.

Proof of Theorem

Suppose $\alpha \in \mathfrak O_v$ and $f(x) \in \mathbb Q_p[x]$ is a monic irreducible polynomial such that $f(\alpha) = 0$. We need to show that, in fact, $f(x) \in \mathbb Z_p[x]$. Suppose $$f(x) = \sum_{i=0}^n c_i x^i \qquad c_n = 1.$$ We can find $b \in \mathbb Z_p$ such that $b c_i \in \mathbb Z_p$ for all $i=0,\ldots, n$. Indeed we can do this so that not all the $b c_i$ are in $p \mathbb Z_p$. However, notice that if $f$ has a coefficient not in $\mathbb Z_p$ then $b \in p \mathbb Z_p$, and so the leading coefficient, $b c_n = b$, is in $p \mathbb Z_p$. It follows that when we reduce the coefficients of $b f(x)$ modulo $p \mathbb Z_p$ we get a nonzero polynomial of lower degree. That is, $\widehat{bf}(x) = \varphi_1(x) \cdot 1$ where $\varphi_1(x) \in \mathbb F_p[x]$ with $\deg \varphi_1 < \deg f$. Moreover, $N_{K_v|\mathbb Q_p}(\alpha)$ is, up to sign, a power of $c_0$, so $\| \alpha \|_v \leq 1$ forces $c_0 \in \mathbb Z_p$; hence $b c_0 \in p \mathbb Z_p$ as well, $\varphi_1$ has zero constant term, and $\deg \varphi_1 \geq 1$.

The form of Hensel’s Lemma proved above (applied with $\varphi_2 = 1$) then implies that $b f(x)$ factors as $f_1(x) f_2(x)$ in $\mathbb Z_p[x]$ where $\deg f_1(x) = \deg \varphi_1 < \deg f$. But then this implies that $f(x)$ is not irreducible in $\mathbb Q_p[x]$. This is a contradiction, and hence $f(x) \in \mathbb Z_p[x]$.

Finally, to see that $\mathfrak O_v$ is a ring, we need to show that it is closed under addition (the other ring axioms are immediate). Since $\| -\beta \|_v = \| \beta \|_v$, it suffices to show that $\mathfrak O_v$ is closed under subtraction. Let $\alpha, \beta$ be nonzero elements of $\mathfrak O_v$ and, swapping them if necessary, assume $\| \alpha \|_v \leq \| \beta \|_v$. If we denote the linear operators given by multiplication by $\alpha$ and $\beta$ on $K_v \cong \mathbb Q_p^{d_v}$ as $T_{\alpha}$ and $T_{\beta}$, then \begin{align*} \| \alpha - \beta \|_v &= | N_{K_v|\mathbb Q_p}(\alpha - \beta) |_p = | \det(T_{\alpha - \beta}) |_p \\ &= | \det (T_{\alpha} - T_{\beta}) |_p = | \det T_{\beta} |_p |\det (T_{\alpha/\beta} - I)|_p. \end{align*} Note that $\gamma = \alpha/\beta$ is in $\mathfrak O_v$ and hence the characteristic polynomial of $\gamma$, $f_{\gamma}$, which is a power of the minimal polynomial of $\gamma$, has all coefficients in $\mathbb Z_p$. Note that $f_{\gamma}(1) = \det (I - T_{\gamma}) = \pm \det (T_{\gamma} - I)$, and hence $$\| \alpha - \beta \|_v = \| \beta \|_v |f_{\gamma}(1)|_p. $$ But by assumption $\| \beta \|_v \leq 1$ and $|f_{\gamma}(1)|_p \leq 1$, because $f_{\gamma}(1)$ is the sum of the coefficients of $f_{\gamma}$, all of which have $| \cdot |_p \leq 1$. It follows that $\| \alpha - \beta \|_v \leq 1$, and hence $\mathfrak O_v$ is closed under addition.

$\| \cdot \|_v$ is an Absolute Value

We finally can prove that $\| \cdot \|_v$ is a non-archimedean absolute value.


The function $\| \cdot \|_v : K_v \rightarrow [0, \infty)$ given by $\| \alpha \|_v = \left| N_{K_v | \mathbb Q_p}(\alpha)\right|_p$ is a non-archimedean absolute value on $K_v$.


The only non-immediate property is the strong triangle inequality. Suppose $\alpha, \beta \in K_v$ with $\beta \neq 0$ and $\| \alpha \|_v \leq \| \beta \|_v$.

Defining $\gamma = \alpha/\beta$ note that $\| \gamma \|_v = \| \alpha \|_v/\|\beta\|_v \leq 1$ and hence $\gamma \in \mathfrak O_v$. Then, $$\det(T_{\beta - \alpha}) = \det(T_{\beta} - T_{\alpha}) = \det T_{\beta} \det(I - T_{\alpha/\beta}) = \det(T_{\beta}) f_{\gamma}(1),$$ where $f_{\gamma}(x) \in \mathbb Z_p[x]$ is the monic characteristic polynomial of $\gamma$. It follows that $\|\beta - \alpha\|_v = \| \beta \|_v |f_{\gamma}(1)|_p \leq \| \beta \|_v$ as desired. Replacing $\alpha$ by $-\alpha$, which has the same absolute value, gives the same bound for $\| \beta + \alpha \|_v$.
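A small numerical experiment shows why the norm must be taken from $K_v$ rather than from $K$. Take $K = \mathbb Q(i)$. For $p = 3$ the polynomial $x^2 + 1$ stays irreducible over $\mathbb Q_3$, so there is a single place $v$ above 3 and $\| a + bi \|_v = |a^2 + b^2|_3$ for rational $a, b$. For $p = 5$ the polynomial splits, and $|a^2 + b^2|_5$ is a product over two places rather than a single absolute value. The Python sketch below (with hypothetical helper names `abs_p`, `norm_abs`, `strong_triangle_failures`) searches small Gaussian integers for violations of the strong triangle inequality in both cases.

```python
from fractions import Fraction
from itertools import product


def abs_p(x, p):
    """The p-adic absolute value of a rational number x."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num, v = num // p, v + 1
    while den % p == 0:
        den, v = den // p, v - 1
    return Fraction(1, p) ** v


def norm_abs(a, b, p):
    """|N(a + b*i)|_p = |a^2 + b^2|_p for the Gaussian number a + b*i."""
    return abs_p(a * a + b * b, p)


def strong_triangle_failures(p, bound=5):
    """Pairs of Gaussian integers (coordinates in [-bound, bound]) for which
    the norm-based function violates the strong triangle inequality."""
    failures = []
    for a, b, c, d in product(range(-bound, bound + 1), repeat=4):
        lhs = norm_abs(a + c, b + d, p)
        if lhs > max(norm_abs(a, b, p), norm_abs(c, d, p)):
            failures.append(((a, b), (c, d)))
    return failures


assert strong_triangle_failures(3) == []                 # one place above 3
assert ((2, 1), (2, -1)) in strong_triangle_failures(5)  # (2+i) + (2-i) = 4
```

The failure at $p = 5$ does not contradict the theorem: there $K \otimes_{\mathbb Q} \mathbb Q_5$ is a sum of two fields, and the theorem applies to each $K_{v_i}$ separately.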

The Places of $K$ above $p$

The $i$th irreducible factor of $f(x) = f_1(x) \cdots f_{\ell}(x) \in \mathbb Q_p[x]$ gives rise to the absolute value $\| \cdot \|_{v_i}$ on $K$ as restricted from $K_{v_i}$. Define the local degree of $K_{v_i}$ to be $d_i = [K_{v_i} : \mathbb Q_p] = \deg f_i.$ If $r \in \mathbb Q$, then multiplication by $r$ on $K_{v_i} \cong \mathbb Q_p^{d_i}$ is given by the constant matrix $r I$ where $I$ is the $d_i \times d_i$ identity matrix. It follows that $N_{K_{v_i} | \mathbb Q_p}(r) = r^{d_i}$ and hence restricted to $\mathbb Q$, $\| \cdot \|_{v_i} = | \cdot |_{p}^{d_i}$. These absolute values represent all the different places of $K$ that lie above $p$, a set we denote $\mathcal M_p(K)$. If $v \in \mathcal M_p(K)$ we will write $v|p$.

We choose a canonical absolute value for $v \in \mathcal M_p(K)$ by setting $| \cdot |_{v} = \| \cdot \|_{v}^{1/d}$. This choice is motivated by the fact that, if $r \in \mathbb Q$, $$|r|_p = \prod_{v | p} | r |_v, $$ which holds because the local degrees $d_v$ sum to $d$.

In fact, we have the following prelude to the Product Formula.


Suppose $\alpha \in K$, then $$\prod_{v \in \mathcal M_p(K)} |\alpha|_v = \left| N_{K|\mathbb Q}(\alpha)\right|^{1/d}_p.$$
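The lemma can be checked numerically in a split case. Take $K = \mathbb Q(i)$ and $p = 5$, where $x^2 + 1 = (x - r)(x + r)$ over $\mathbb Q_5$ for a 5-adic square root $r$ of $-1$. The two places above 5 then have local degree 1, with $\| a + bi \|_{v_1} = |a + br|_5$ and $\| a + bi \|_{v_2} = |a - br|_5$. The sketch below approximates $r$ modulo $5^{20}$ by a Newton iteration and compares valuations; the helper names and the fixed precision are our choices, and the computation is only valid when neither image vanishes modulo $p^{\text{precision}}$.

```python
def val(n, p):
    """p-adic valuation of a nonzero integer n."""
    v = 0
    while n % p == 0:
        n, v = n // p, v + 1
    return v


def sqrt_minus_one(p, precision):
    """An integer r with r*r = -1 (mod p**precision).

    Assumes p is a prime with p = 1 (mod 4), so that a root exists mod p.
    """
    r = next(x for x in range(p) if (x * x + 1) % p == 0)
    modulus = p
    for _ in range(precision - 1):
        modulus *= p
        # Newton step r -> r - (r^2 + 1)/(2r), computed modulo the new modulus
        r = (r - (r * r + 1) * pow(2 * r, -1, modulus)) % modulus
    return r


def local_valuations(a, b, p, precision=20):
    """Valuations of the images of a + b*i under the two embeddings
    i -> r and i -> -r of Q(i) into Q_p (a, b integers)."""
    modulus = p ** precision
    r = sqrt_minus_one(p, precision)
    images = [(a + b * r) % modulus, (a - b * r) % modulus]
    assert all(images), "increase precision"
    return [val(z, p) for z in images]


# alpha = 3 + 4i has norm 25; all of the 5-adic size sits at one place.
v1, v2 = local_valuations(3, 4, 5)
assert v1 + v2 == val(3 * 3 + 4 * 4, 5)
```

Since $|\alpha|_{v} = \|\alpha\|_{v}^{1/2}$ here, the identity $v_1 + v_2 = v_5(N_{K|\mathbb Q}(\alpha))$ is exactly the statement $\prod_{v | 5} |\alpha|_v = |N_{K|\mathbb Q}(\alpha)|_5^{1/2}$.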


Suppose $\alpha \in K$, then multiplication by $\alpha$ gives a linear transformation on $K \cong \mathbb Q^d$. Write $T_{\alpha}$ for this linear transformation. By definition $N_{K|\mathbb Q}(\alpha)$ is the determinant of $T_{\alpha}$. Now $T_{\alpha}$ also makes sense as a linear transformation on $\mathbb Q_p[x]/f(x)\mathbb Q_p[x]$, which, as a vector space, is isomorphic to $K \otimes_{\mathbb Q} \mathbb Q_p$, and we write $\overline T_{\alpha}$ for the linear transformation on $K \otimes_{\mathbb Q} \mathbb Q_p$. Note that $\det \overline T_{\alpha} = \det T_{\alpha}$ because any basis for $K$ will lift to a basis for $K \otimes_{\mathbb Q} \mathbb Q_p$, and the matrices for $T_{\alpha}$ and $\overline T_{\alpha}$ are identical for these bases.

The Fundamental Theorem of Finitely Generated Modules over Principal Ideal Domains means that as vector spaces $$\mathbb Q_p[x]/f(x)\mathbb Q_p[x] \cong \mathbb Q_p[x]/f_1(x) \mathbb Q_p[x] \oplus \cdots \oplus \mathbb Q_p[x]/f_{\ell}(x) \mathbb Q_p[x],$$ or what amounts to the same thing $$K \otimes_{\mathbb Q} \mathbb Q_p \cong K_{v_1} \oplus \cdots \oplus K_{v_{\ell}}.$$ Moreover, as vector subspaces the $K_{v_i}$ are invariant under $\overline T_{\alpha}$. It follows again from the Fundamental Theorem that $\overline T_{\alpha}$ decomposes as a direct sum $\overline T^{(1)}_{\alpha} \oplus \cdots \oplus \overline T^{(\ell)}_{\alpha}$, and $$\det T_{\alpha} = \det \overline T^{(1)}_{\alpha} \cdots \det \overline T^{(\ell)}_{\alpha}.$$ That is, $$N_{K|\mathbb Q}(\alpha) = N_{K_{v_1}|\mathbb Q_p}(\alpha) \cdots N_{K_{v_\ell}|\mathbb Q_p}(\alpha).$$ And hence $$| N_{K|\mathbb Q}(\alpha)|_p = \prod_{v \in \mathcal M_p(K)} |N_{K_{v}|\mathbb Q_p}(\alpha)|_p = \prod_{v \in \mathcal M_p(K)} \|\alpha\|_v = \prod_{v \in \mathcal M_p(K)} |\alpha|^d_v .$$ Taking $d$th roots gives the lemma.

In summary, we can lift the linear transformation $T_{\alpha}$ to $\overline T_{\alpha}$, and this decomposes as $\overline T^{(1)}_{\alpha} \oplus \cdots \oplus \overline T^{(\ell)}_{\alpha}$ because each $K_{v_i}$ is an invariant subspace. From this it follows that $\det T_{\alpha} = \det \overline T_{\alpha} = \det \overline T^{(1)}_{\alpha} \cdots \det \overline T^{(\ell)}_{\alpha}$.

To calculate $|\alpha|_v$ when $\alpha$ is the image of $x$ in $K$, we note that with respect to the basis $1, \alpha, \ldots, \alpha^{d_v - 1}$ of $K_v$ the matrix of multiplication by $\alpha$ is the Frobenius companion matrix of the associated irreducible factor of $f(x)$ when factored over $\mathbb Q_p$. The determinant of this matrix is, up to sign, the constant coefficient, say $c_0$, of that irreducible factor, and $|\alpha|_v = |c_0|_p^{1/d}$.

The Product Formula

The Product Formula For Number Fields

Let $\mathcal M(K)$ denote the set of places of $K$. Suppose $v \in \mathcal M(K)$ lies above the place $p$ of $\mathbb Q$, let $d_v = [K_v : \mathbb Q_p]$ be the local degree (with $\mathbb Q_{\infty} = \mathbb R$), and let $| \cdot |_v$ be the absolute value in $v$ which is equal to $| \cdot |_p^{d_v/d}$ on $\mathbb Q$. Then, for all nonzero $\alpha \in K$, $$\prod_{v \in \mathcal M(K)} | \alpha |_v = 1.$$


The proof reduces to the Product Formula for the rational numbers (see Absolute Values and Completions of $\mathbb Q$). Specifically, $$ \prod_v | \alpha |_v = \prod_{p \in \mathcal M(\mathbb Q)} \prod_{v \in \mathcal M_p(K)} |\alpha|_v = \prod_{p \in \mathcal M(\mathbb Q)} |N_{K|\mathbb Q}(\alpha)|^{1/d}_p = 1,$$ where the penultimate equality is the lemma (and the analogous fact for archimedean places of $K$) and the final equality is the invocation of the Product Formula for $N_{K|\mathbb Q}(\alpha) \in \mathbb Q$.
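The reduction in the proof can be followed numerically for $K = \mathbb Q(\sqrt 2)$, where $d = 2$ and there are two real places. The sketch below computes the square of $\prod_v |\alpha|_v$ for $\alpha = a + b\sqrt 2$ with integers $a, b$: the archimedean part is $|a + b\sqrt 2| \cdot |a - b\sqrt 2|$, and by the lemma the places above each prime $p$ contribute $|N_{K|\mathbb Q}(\alpha)|_p$ with $N_{K|\mathbb Q}(\alpha) = a^2 - 2b^2$. The function names are hypothetical, and the archimedean factor is computed in floating point, so the comparison with 1 is only approximate.

```python
import math
from fractions import Fraction


def abs_p(n, p):
    """The p-adic absolute value of a nonzero integer n."""
    v = 0
    while n % p == 0:
        n, v = n // p, v + 1
    return Fraction(1, p ** v)


def prime_factors(n):
    """The set of primes dividing the nonzero integer n (trial division)."""
    n, p, out = abs(n), 2, set()
    while p * p <= n:
        while n % p == 0:
            out.add(p)
            n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out


def product_formula_squared(a, b):
    """Square of the product of |a + b*sqrt(2)|_v over all places v of
    Q(sqrt 2), for integers a, b that are not both zero."""
    norm = a * a - 2 * b * b
    s = math.sqrt(2)
    archimedean = abs(a + b * s) * abs(a - b * s)  # the two real places
    non_archimedean = Fraction(1)
    for p in prime_factors(norm):  # other primes contribute a factor of 1
        non_archimedean *= abs_p(norm, p)
    return archimedean * float(non_archimedean)


assert math.isclose(product_formula_squared(3, 5), 1.0, rel_tol=1e-9)
```

For a unit such as $1 + \sqrt 2$ (norm $-1$) no prime contributes, and the product formula reduces to the statement that the two real absolute values multiply to 1.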

The General Setup

So far, in the non-archimedean situation, we have considered $K | \mathbb Q$ with places $v | p$. However we can replace $\mathbb Q$ with some other base field $k$ and an appropriate place $u$ of $k$ with $v | u$ and have many of our previous results still hold. Usually the necessary changes to the proofs to produce these more general results are minor, and so we will not prove all details other than to wave our hands at any necessary changes from the $k = \mathbb Q$, $u = p$ case.

Let $k$ be a number field and suppose $u$ is a place of $k$ lying above $p \in \mathcal M(\mathbb Q)$. If $\| \cdot \|$ is any absolute value in $u$, then we may form the completion $k_u$ by taking the ring of Cauchy sequences in $k$ (Cauchy with respect to $\| \cdot \|$) and modding out by the maximal ideal formed from sequences converging to 0 (again with respect to $\| \cdot \|$). This is a field, and a generic element is an equivalence class of Cauchy sequences in $k$, any two of which differ by a sequence which converges to 0. The field $k$, represented by constant sequences, is dense in $k_u$ and we may extend $\| \cdot \|$ to $k_u$. If $u | \infty$ (that is, $\| \cdot \|$ is archimedean), then $k_u$ is equal to $\mathbb R$ or $\mathbb C$ and $\| \cdot \|$ is some power of the usual absolute value. We will circle back to that case, but for now we will assume that $p$ is a rational prime, and $u$ is a non-archimedean place.

First we associate to $\| \cdot \|$ a prime ideal $\mf p$ in the ring of integers $\mf o$ of $k$.

  • If $\alpha \in \mf o$, then $\| \alpha \| \leq 1$.
  • $\mf p := \{ \alpha \in \mf o : \| \alpha \| < 1 \}$ is a prime ideal in $\mf o$.

The strong triangle inequality implies that $\| c \| \leq 1$ for all $c \in \mathbb Z$. If $\alpha$ is in $\mf o$ then there exists a polynomial $x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0$ with integer coefficients that vanishes at $\alpha$. If $\|\alpha\| > 1$ then $$0 = \| \alpha^n + c_{n-1} \alpha^{n-1} + \cdots + c_1 \alpha + c_0 \| = \| \alpha^n \| > 1,$$ where the penultimate relation follows from the case of equality in the strong triangle inequality because $\| \alpha^n \| > \|\alpha\|^j \geq \| c_j \alpha^j\|$ for every $j < n$. This is an obvious contradiction, and thus $\| \alpha \| \leq 1$.

For the second statement, it is clear that $\mf p$ is an ideal of $\mf o$; the only questionable axiom is additivity, but as always the strong triangle inequality comes to the rescue. It is a proper ideal because $\| 1 \| = 1$. Now, if $\alpha \in \mf p$ and $\alpha = \beta \delta$ for some $\beta, \delta \in \mf o$, then $\| \beta \| \| \delta \| < 1$ with both factors at most 1, so either $\| \beta \| < 1$ or $\| \delta \| < 1$, and hence $\mf p$ is a prime ideal.

It will turn out that this association between non-archimedean places and prime ideals is in fact a bijection, and we will often use $\mf p$ to represent the place associated to the prime ideal. This is, more-or-less, the content of Ostrowski’s Theorem for number fields. In this situation $\mf p | p$ has a meaning in terms of ideals and a different meaning in terms of places. However, $\mf p | p$ is simultaneously true or simultaneously false for these different interpretations.

One canonical representative of the place indexed by $\mf p$ is the $\mf p$-adic absolute value given by $$\| \cdot \|_{\mf p} = (\mathbb N \mf p)^{-v_{\mf p}(\cdot)}, $$ where $v_{\mf p}(\alpha)$ is the valuation of $\alpha$; the largest integer $n$ with $\alpha \in \mf p^n$. Here $\mathbb N \mf p$ is the norm of $\mf p$, that is $\mathbb N \mf p = [ \mf o : \mf p ]$. We also define the canonical absolute value $| \cdot |_{\mf p} = \| \cdot \|_{\mf p}^{1/d}$. We will see that this notation is consistent with our previous definitions for $\| \cdot \|_v$ and $| \cdot |_v$ when $v = \mf p$.

Completing $k$ and $\mf o$ with respect to $\mf p$ produces the completions $k_{\mf p}$ and $\mf o_{\mf p}$. The units $U_{\mf p}$ in $\mf o_{\mf p}$ consist of all elements of absolute value 1. In particular, each prime ideal not equal to $\mf p$ contains elements of $U_{\mf p}$, and hence does not maintain its identity as a proper ideal under the embedding. Many authors continue to use $\mf p \subset \mf o_{\mf p}$ for the maximal ideal formed by completing $\mf p$. However, we will distinguish the completion by denoting it $\mf m_{\mf p}$. This is the unique maximal ideal in $\mf o_{\mf p}$. This ideal is principal, and we choose $\pi = \pi_{\mf p}$ to be a generator or uniformizer of $\mf m_{\mf p}$.

The quotient $\mf o_{\mf p} / \mf m_{\mf p}$ is a finite field, and we will denote by $q$ and $f_{\mf p}$ the integers $$q := [\mf o_{\mf p} : \mf m_{\mf p}] = p^{f_{\mf p}}.$$ We identify $\mf o_{\mf p} / \mf m_{\mf p}$ with $\mathbb F_q$. Like in $\mathbb Q_p$ there is a series representation for the elements of $k_{\mf p}$. The proof follows mutatis mutandis from the $\mathbb Q_p$ case (see The Algebra and Geometry of $\mathbb Q_p$).


Suppose $\alpha \in k_{\mf p}$. Then there exist an integer $n_0$ and $c_{n_0}, c_{n_0+1}, \ldots \in \mathbb F_q$ such that $\alpha$ can be represented by the sequence of partial sums of $$\sum_{n=n_0}^{\infty} c_n \pi^n. $$ If $\alpha \in \mf o_{\mf p}$ then $n_0$ can be taken to be 0.


Number fields

  • $f(x), g(x), h(x),$ etc.: Polynomials, often in $\mathbb Q[x]$ or $k[x]$.
  • $k, K, L$: Number fields.
  • $\alpha, \beta, \gamma,$ etc.: Generic field elements. The field depends on context.
  • $[K : k]$, $d$: The degree of a field extension. The fields depend on context.
  • $T_{\alpha}$: The linear transformation on $K|k$ (fields context dependent) given by multiplication by $\alpha$.
  • $r_1, r_2$: The number of real and complex embeddings (respectively) of $K$ (context dependent).
  • $N_{K|k}$, $\mathrm{Tr}_{K|k}$: The Norm and Trace maps $K \rightarrow k$ given by $\alpha \mapsto \det( T_{\alpha})$ and $\alpha \mapsto \mathrm{Tr}( T_{\alpha})$.
  • $\mf o$, $\mf O$: Rings of integers in $k$ and $K$.
  • $\mf a, \mf b, \mf A, \mf B,$ etc.: Ideals in rings of integers. We often use lower case fraktur letters for ideals in $\mf o$ and capital fraktur letters for ideals in $\mf O$.
  • $\mf p, \mf q, \mf P, \mf Q$: Prime ideals in $\mf o$ and $\mf O$.
  • $\mathbb N \mf a$, etc.: The ideal norm $\mathbb N \mf a = [\mf o : \mf a]$.

Absolute Values

  • $| \cdot |$, $| \cdot |_p$, $| \cdot |_{\infty}$: A generic absolute value, the $p$-adic absolute value on $\mathbb Q_p$ and the usual absolute value on $\mathbb R$ (respectively).
  • $\mathcal M(K)$, $\mathcal M_p(K)$, $\mathcal M_{\infty}(K)$: The places of $K$, the places of $K$ over rational place $p$, the archimedean places of $K$.
  • $v | p, v | \infty$: Shorthand for $v \in \mathcal M_p(K)$ and $v \in \mathcal M_{\infty}(K)$.
  • $\mf p | p$, $\mf p \in \mathcal M_p(K)$: Non-archimedean places indexed by prime ideals.
  • $\mathbb Q_p$, $K_v$: The completion of $\mathbb Q$ with respect to the place $p$, and the completion of $K$ with respect to the place $v$.
  • $\mf o_{\mf p}$, $\mf o_v$, $\mf O_{\mf p}$, $\mf O_v$: Local integers. The completion of the integers $\mf o$ or $\mf O$ with respect to $v$ or $\mf p$.
  • $\mf m_{\mf p}$, $\mf m_v$: The maximal ideal in the local integers.
  • $d_v = [K_v : \mathbb Q_p], d = [K : \mathbb Q]$: The local and global degrees of the place $v \in \mathcal M_p(K)$.
  • $\| \cdot \|_v, \| \cdot \|_{\mf p}$: The absolute value in the place $v = \mf p$ given by $| N_{K_v|\mathbb Q_p} |_p$.
  • $| \cdot |_v, | \cdot |_{\mf p}$: The canonical absolute value in the place $v = \mf p$ given by $\| \cdot \|_v^{1/d}$.

Absolute Values and Completions of $\mathbb Q$

An absolute value on a field $K$ is a function $|\cdot| : K \rightarrow [0, \infty)$ such that for any $x, y \in K$,

  • $|x| = 0$ if and only if $x = 0$;
  • $|x y| = |x| |y|$;
  • $|x + y| \leq |x| + |y|$.

These properties are called respectively positive definiteness, multiplicativity and the triangle inequality. If the absolute value satisfies the stronger condition (called the strong triangle inequality)

  • $| x + y| \leq \max\{ |x|, |y| \}$

we say it is a non-archimedean absolute value. An absolute value that does not satisfy the strong triangle inequality is called an archimedean absolute value.

The usual absolute values on $\mathbb Q$, $\mathbb R$ and $\mathbb C$ are all archimedean absolute values.

Every field has a trivial absolute value given by $| 0 | = 0$ and $| x | = 1$ for all $x \neq 0$. This absolute value is not very interesting and we will usually concentrate on the non-trivial absolute values of a field.

Equality in the Strong Triangle Inequality

It is often easy to determine when the strong triangle inequality is actually an equality. The proof of the following lemma is easy, but the result is surprisingly powerful.


Suppose $| \cdot |$ is a non-archimedean absolute value on $K$ and $x, y \in K$ are such that $| x | < | y |$. Then $| x + y | = | y |$. That is, if $| x | \neq | y |$, $|x + y| = \max\{ |x|, |y| \}.$


Suppose $|x| < |y|$ and $|x + y| < |y|$. Then, $$ |y| = |x + y - x| \leq \max\{|x+y|,|x|\} < |y|;$$ an obvious contradiction.

Absolute Values on $\mathbb Q$

To distinguish the usual absolute value from new ones we may construct we will denote it $| \cdot |_{\infty}$. That is $$ |x|_{\infty} = \left\{ \begin{array}{rl} x & x \geq 0; \\ -x & x < 0. \end{array}\right.$$

If $p$ is a prime integer then define the valuation $v_p : \mathbb Q^{\times} \rightarrow \mathbb Z$ by $$ v_p(x) = v \qquad x = p^v \frac{a}{b}, \quad \mathrm{GCD}(a, b) = 1, \quad p \nmid ab.$$ That is, we determine the valuation of a rational number by determining the highest power of $p$ that divides it. This power is positive if the numerator has $p$ as a factor, and is negative if the denominator has $p$ as a factor (when written in lowest terms). If the valuation is 0 then the rational number does not have $p$ in its factorization.

The valuation $v_p$ is a homomorphism from $(\mathbb Q^{\times}, \cdot) \rightarrow (\mathbb Z, +)$. That is $v_p$ is additive: $v_p(xy) = v_p(x) + v_p(y)$. It is common to take $v_p(0) = \infty$ with the justification that $0$ is “infinitely divisible” by $p$.

We may write a non-zero rational number $x$ in terms of the various valuations $\{v_p(x) : p \mbox{ prime} \}$ by $$x = \pm \prod_p p^{v_p(x)}.$$ Note that $v_p(x) \neq 0$ for only the primes that appear in the factorization of $x$. It follows that this product is actually a finite product.

We define the $p$-adic absolute value on $\mathbb Q$ by $$| x |_p = p^{-v_p(x)}.$$ Of course, we need to verify that this is an absolute value. Positive definiteness is a matter of definition, multiplicativity comes from the additivity of $v_p$. Only the triangle inequality remains, and we will in fact show that $| \cdot |_p$ satisfies the strong triangle inequality. Suppose $x = p^v a/b$ and $y = p^u c/d$ where $a/b$ and $c/d$ are written in lowest terms. Then $$ x + y = \frac{p^v d a + p^u b c}{bd}.$$ Because $bd$ is relatively prime to $p$, we see that the minimum power of $p$ we can pull out of the numerator is $\min\{u, v\}$. That is $v_p(x + y) \geq \min\{v_p(x), v_p(y) \}$. This is equivalent to the strong triangle inequality.

Note that we actually proved something slightly stronger than the strong triangle inequality here. Written in terms of valuations and absolute values, $$v_p(x + y) > \min\{v_p(x), v_p(y) \} \quad \mbox{only if} \quad v_p(x) = v_p(y)$$ $$|x + y|_p < \max\{|x|_p, |y|_p \} \quad \mbox{only if} \quad |x|_p = |y|_p.$$

Thus, the $p$-adic absolute value is a non-archimedean absolute value.
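The definitions of $v_p$ and $| \cdot |_p$, together with the case of equality in the strong triangle inequality, can be sketched in a few lines of Python using exact rational arithmetic. The function names `v_p` and `abs_p` mirror the notation above but are otherwise our own choices.

```python
from fractions import Fraction


def v_p(x, p):
    """The p-adic valuation of a nonzero rational number x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("v_p(0) is infinite")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num, v = num // p, v + 1
    while den % p == 0:
        den, v = den // p, v - 1
    return v


def abs_p(x, p):
    """The p-adic absolute value |x|_p = p^(-v_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    return Fraction(0) if x == 0 else Fraction(p) ** (-v_p(x, p))


# 50 = 2 * 5^2 is 5-adically small; 3/5 is 5-adically large.
assert abs_p(50, 5) == Fraction(1, 25)
assert abs_p(Fraction(3, 5), 5) == 5

# Different sizes: |x + y|_5 equals the larger of |x|_5 and |y|_5.
x, y = Fraction(1, 5), Fraction(5)
assert abs_p(x + y, 5) == max(abs_p(x, 5), abs_p(y, 5)) == 5

# Equal sizes: strict inequality is possible, e.g. |2 + 3|_5 < |2|_5 = |3|_5.
assert abs_p(2 + 3, 5) < max(abs_p(2, 5), abs_p(3, 5))
```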

The Places of $\mathbb Q$

We say two absolute values $| \cdot |_0$ and $| \cdot |_1$ on a field $K$ are equivalent if there is a positive real number $c$ such that $|\cdot |_0 = |\cdot|_1^c$. It is easily verified that this gives an equivalence relation on the set of absolute values of $K$, and we call the equivalence classes the places of $K$. The place corresponding to the trivial absolute value (which is the only representative in its class) is called the trivial place, and is often excluded from attention.

We will eventually talk about how to complete $K$ with respect to an absolute value, using the same methods as when we construct $\mathbb R$ out of $\mathbb Q$ with respect to the usual absolute value. We will see that equivalent absolute values produce the same completion, and different places produce different completions. The completion of $\mathbb Q$ with respect to the trivial absolute value is $\mathbb Q$ itself, another hint that nothing interesting happens with trivial absolute values.

The set of non-trivial places of $K$ is denoted $\mathcal M_K$.

Ostrowski’s Theorem

If $| \cdot |$ is a non-trivial absolute value on $\mathbb Q$ then $|\cdot|$ is equivalent to either the usual absolute value $| \cdot |_{\infty}$ or is equivalent to $| \cdot |_p$ for some prime $p$. That is $\mathcal M_{\mathbb Q}$ is in correspondence with $\mathcal P = \{ \mbox{ primes } \} \cup \{ \infty \}$.

For a proof, see https://en.wikipedia.org/wiki/Ostrowski%27s_theorem#Proof

It is hard to overstate the importance of this result. It is surprising (though less so after reading the proof) to see a correspondence between the primes and a set of analytic objects. It suggests (rightly) that there is progress to be made in understanding the primes by understanding absolute values. Indeed, as we will see this correspondence extends to number fields, and we will see a bijection between non-archimedean places and prime ideals. The archimedean absolute values in that setting correspond to the real and complex embeddings of the number field. It all hangs together very nicely.

The Product Formula

The product formula is the trivial observation that for any rational number $x \neq 0$, $$ \prod_{p \in \mathcal P} | x |_p = 1.$$

This can be verified by calculation: $$ | x |_{\infty} = \prod_{p \mbox{ prime} } p^{v_p(x)} \quad \mbox{and} \quad \prod_{p \mbox{ prime}} | x |_p = \prod_{p \mbox{ prime}} p^{-v_p(x)},$$ and the product formula follows.
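The calculation can also be carried out mechanically. The Python sketch below (helper names are ours) multiplies the usual absolute value of a nonzero rational by its $p$-adic absolute values at the primes dividing its numerator or denominator; every other prime contributes a factor of 1.

```python
from fractions import Fraction


def abs_p(x, p):
    """The p-adic absolute value of a nonzero rational number x."""
    x = Fraction(x)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num, v = num // p, v + 1
    while den % p == 0:
        den, v = den // p, v - 1
    return Fraction(1, p) ** v


def prime_factors(n):
    """The set of primes dividing the nonzero integer n (trial division)."""
    n, p, out = abs(n), 2, set()
    while p * p <= n:
        while n % p == 0:
            out.add(p)
            n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out


def product_over_places(x):
    """|x|_infinity times the product of |x|_p over all primes p."""
    x = Fraction(x)
    total = abs(x)
    for p in prime_factors(x.numerator) | prime_factors(x.denominator):
        total *= abs_p(x, p)
    return total


assert product_over_places(Fraction(-360, 77)) == 1
```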

In spite of its triviality, it is an important observation and we will return to a version of the product formula for number fields shortly.


Recall these two definitions from elementary analysis.


A sequence of rational numbers $(x_n)$ converges to 0 with respect to the absolute value $| \cdot |$ if, for all $\epsilon > 0$ there exists positive integer $N$ such that if $n \geq N$ then $|x_n| < \epsilon$. In this situation we write $\lim x_n = 0$ or $(x_n) \rightarrow 0$.


A sequence of rational numbers is Cauchy with respect to $| \cdot |$ if, for all $\epsilon > 0$ there exists positive integer $N$ such that if $n, m \geq N$ then $|x_n - x_m| < \epsilon$. We will denote the set of Cauchy sequences by $\mathcal C$.

Recall from elementary analysis that every convergent sequence is Cauchy, and that in $\mathbb R$ every Cauchy sequence converges. However, not every Cauchy sequence of rational numbers converges to a rational number. That is, taking limits can take you out of $\mathbb Q$. These new limit points live in the completion, and that completion depends on the absolute value (in fact place) used in the definition of convergence and Cauchy.

The equivalence of Cauchy sequences and convergent sequences (once you account for new limit points) is useful because the condition of Cauchyness depends only on the rational numbers in the sequence. That is we can determine if something is Cauchy without having to know what its limit is or where that limit lives.

The other thing that is useful about Cauchy sequences, is that they form a ring under coordinate-wise addition and multiplication. This is essentially equivalent to the limit laws: If $(x_n), (y_n) \in \mathcal C$ then $(x_n + y_n) \in \mathcal C$ and $(x_n y_n) \in \mathcal C$. The identically zero sequence $(0)$ is the additive identity, and $(1)$ is the multiplicative identity. Cauchyness of the coordinate-wise quotient of two sequences also follows from the limit laws, though one has to be careful that the Cauchy sequence in the denominator does not converge to 0, and to “throw out” any quotients in the sequence where the denominator may be 0. We may embed $\mathbb Q \hookrightarrow \mathcal C$ by sending $x$ to the constant sequence $(x)$.

In the real numbers there are many different sequences of rational numbers which converge to the same real number. For instance, much mathematics has been made of discovering interesting rational sequences that converge to some of our favorite irrational numbers, like $\pi$. Notice that if we have two sequences $(x_n)$ and $(y_n)$ with $(x_n) \rightarrow \pi$ and $(y_n) \rightarrow \pi$ (and here $\pi$ can be replaced by any real number), then $(x_n - y_n) \rightarrow 0$. Thus we can determine when two convergent sequences converge to the same number simply by determining whether their difference converges to 0.

Returning to $\mathcal C$, we define an equivalence relation $(x_n) \equiv (y_n)$ if $(x_n - y_n) \rightarrow 0$. In our head we should think of an equivalence class as the set of all Cauchy sequences that converge to the same number, and there is one equivalence class for every possible limit point. And indeed, that is the definition of the completion of $\mathbb Q$ with respect to $| \cdot |$. It is the field formed from the equivalence classes of Cauchy sequences. We add, subtract, multiply and divide by choosing representatives of the equivalence classes and performing the appropriate operation coordinate-wise, and returning the equivalence class of that new Cauchy sequence. The rational number $x$ is represented by the equivalence class of the constant sequence $(x)$.

Let us denote the completion by $\overline{\mathbb Q}$ (this obviously depends on the absolute value, but for the moment we are working with the distinguished absolute value $| \cdot |$). Suppose we have another equivalent absolute value $| \cdot |_0 = | \cdot |^c; c > 0$. We wish to argue that both absolute values produce the same completion, that is, that the completion is something more appropriately associated to a place rather than a single absolute value. This is done by showing that the absolute values $| \cdot |$ and $| \cdot |_0$ determine the same set of Cauchy sequences, and that they determine the same equivalence relation on those Cauchy sequences.

Suppose $(x_n)$ is Cauchy with respect to $| \cdot |$, and given $\epsilon > 0$ let $N(\epsilon)$ be the guaranteed integer such that $n, m \geq N(\epsilon)$ implies $| x_n - x_m | < \epsilon$. It follows that if we set $N = N(\epsilon^{1/c})$ then, for $n, m \geq N$, $| x_n - x_m |_0 = |x_n - x_m|^c < (\epsilon^{1/c})^c = \epsilon$. Thus if $(x_n)$ is Cauchy with respect to $| \cdot |$ it is Cauchy with respect to $| \cdot |_0$. The reverse containment is established similarly (but setting $N = N(\epsilon^c)$), and we see both absolute values produce the same Cauchy sequences.

We also need to establish that $(x_n) \rightarrow 0$ with respect to $| \cdot |$ if and only if it does the same with respect to $| \cdot |_0$. The argument is almost identical to that used to show that both absolute values produce the same Cauchy sequences.

The places are in correspondence with $\mathcal P$ and we denote the completion with respect to $| \cdot |_p$ by $\mathbb Q_p$. In particular $\mathbb Q_{\infty}$ is the real numbers.
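To see how the choice of place changes which sequences are Cauchy, consider the partial sums $x_n = 1 + p + \cdots + p^{n-1}$. With the usual absolute value they diverge, but with respect to $| \cdot |_p$ they form a Cauchy sequence converging to the rational number $1/(1-p)$. The sketch below checks this with exact arithmetic; the function names are hypothetical and used only for illustration.

```python
from fractions import Fraction

def abs_p(x, p):
    """p-adic absolute value of a rational x, with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    n, d, v = x.numerator, x.denominator, 0
    while n % p == 0:
        n //= p
        v += 1
    while d % p == 0:
        d //= p
        v -= 1
    return Fraction(p) ** (-v)

def geometric_partial_sum(p, n):
    """x_n = 1 + p + ... + p^(n-1) as an exact rational."""
    return sum((Fraction(p) ** k for k in range(n)), Fraction(0))

def distance_to_limit(p, n):
    """|x_n - 1/(1-p)|_p, which should shrink like p^(-n)."""
    return abs_p(geometric_partial_sum(p, n) - Fraction(1, 1 - p), p)
```

For $p = 5$, `distance_to_limit(5, n)` equals $5^{-n}$, while the ordinary absolute value of `geometric_partial_sum(5, n)` grows without bound.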

Extending Absolute Values to $\mathbb Q_p$

Given an element in $\mathbb Q_p$ represented by the Cauchy sequence $x = (x_n)$, we define $|x|_p = \lim |x_n|_p$ where we take the limit in the real numbers as usual. Upon showing this is well-defined, we arrive at an absolute value (which we continue to denote $| \cdot |_p$) on $\mathbb Q_p$ which restricts to $| \cdot |_p$ on $\mathbb Q$.

Suppose $x = (x_n)$ and $y = (y_n)$ are in the same equivalence class, that is, $\lim |x_n - y_n|_p = 0$. Then $$| x |_p = \lim | x_n |_p = \lim |y_n + (x_n -y_n)| _p \leq \lim |y_n|_p + \lim |x_n - y_n|_p = |y|_p.$$ By symmetry $|y|_p \leq |x|_p$, so in fact $|x|_p = |y|_p$ and $| \cdot |_p$ is well defined on $\mathbb Q_p.$

To verify $| \cdot |_p$ is an absolute value on $\mathbb Q_p$, notice that if $| x |_p = 0$ then $(x_n)$ is equivalent to the sequence $(0)$, that is, $x$ is in the zero equivalence class. Multiplicativity follows from the multiplication limit law for sequences. The triangle inequality likewise follows since $$ | x + y |_p = \lim |x_n + y_n|_p \leq \lim |x_n|_p + \lim|y_n|_p = |x|_p + |y|_p$$ as does the strong inequality (via the continuity of the max function) in the case $| \cdot |_p$ is non-archimedean.

Equipping a field with an absolute value also allows us to define distances, and hence a metric topology. This topology in turn generates a $\sigma$-algebra (which is equal to the Borel $\sigma$-algebra when $p = \infty$) and translation invariant measures (a la Lebesgue measure) on $(\mathbb Q_p, +)$ and $(\mathbb Q^{\times}_p, \cdot)$. I am getting ahead of myself, but the point is that $\mathbb Q_p$ isn’t just a new field with a few limit points filled in from $\mathbb Q$, but rather it is a metric space and a measure space and exhibits many features in common with $\mathbb R$.

Different Places Produce Different Completions

Before digging into the topology of $\mathbb Q_p$ we want to justify our claim that the completions of $\mathbb Q$ are in correspondence with $\mathcal P$. So far we have seen that every completion is equal to $\mathbb Q_p$ for some $p \in \mathcal P$, but we have not yet established that different elements in $\mathcal P$ produce different completions.

This is a consequence of the Weak Approximation Theorem for $\mathbb Q$, which in our case says that if $| \cdot |$ and $| \cdot |_0$ are non-equivalent absolute values, then you can find an element $x \in \mathbb Q$ such that $| x |$ is large and $| x |_0$ is small (and vice-versa). This will in turn imply that different sequences are Cauchy or converge to 0 with respect to these different absolute values.

The Weak Approximation Theorem

Suppose $p_1, \ldots, p_N \in \mathcal P$ index any finite number of places of $\mathbb Q$ and $1 \leq n < N$. Given $a \in \mathbb Q$ and any $\epsilon > 0$ there exists $x \in \mathbb Q$ such that $|x - a|_{p_1}, \ldots, |x - a|_{p_n} \in (0, \epsilon)$ and $|x-a|_{p_{n+1}}, \ldots, |x-a|_{p_N} \in [\epsilon^{-1}, \infty).$

We do this for $a=0$ as follows. First we suppose that $p_1, \ldots, p_N$ are all non-archimedean, and then discuss how to modify the argument if we also want $| x |_{\infty}$ either large or small. Consider $$ x = \frac{p_1^{m_1} \cdots p_n^{m_n}}{p_{n+1}^{m_{n+1}} \cdots p_N^{m_N}}. $$ Then $|x|_{p_1} = p_1^{-m_1}, \ldots, |x|_{p_n} = p_n^{-m_n}$ and these can be made as small as desired by choosing $m_1, \ldots, m_n$ as large as necessary. Similarly, $|x|_{p_{n+1}} = p_{n+1}^{m_{n+1}}, \ldots, |x|_{p_N} = p_N^{m_N}$, which can be made as large as desired by choosing $m_{n+1}, \ldots, m_N$ as large as necessary.

How would we simultaneously make $|x|_{\infty}$ as large or small as specified? We may multiply $x$ by a rational number $r$ whose factorization avoids $p_1, p_2, \ldots, p_N$ without changing any of the absolute values $|\cdot|_{p_1}, \ldots, |\cdot|_{p_N}$. That is $|r x|_{p_1} = |x|_{p_1}, \ldots, |r x|_{p_N} = |x|_{p_N}$. Note however, that $|r x|_{\infty} = |x|_{\infty} |r|_{\infty}$, and hence by choosing $r$ sufficiently large or small, the rational number $r x$ will satisfy the conclusions of the theorem when $a=0$. The general case is a consequence of the Chinese Remainder Theorem.
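The construction for $a = 0$ and finite primes can be sketched in a few lines. This is only an illustration under stated assumptions: `small` and `large` are hypothetical names for the lists $p_1, \ldots, p_n$ and $p_{n+1}, \ldots, p_N$ of distinct primes, `eps` is a positive rational, and the archimedean adjustment by $r$ is not included.

```python
from fractions import Fraction

def abs_p(x, p):
    """p-adic absolute value of a rational x, with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    n, d, v = x.numerator, x.denominator, 0
    while n % p == 0:
        n //= p
        v += 1
    while d % p == 0:
        d //= p
        v -= 1
    return Fraction(p) ** (-v)

def weak_approx_zero(small, large, eps):
    """Return x with |x|_p < eps for p in small and |x|_p >= 1/eps for p in large."""
    eps = Fraction(eps)
    x = Fraction(1)
    for p in small:
        m = 1
        while Fraction(1, p ** m) >= eps:   # need p^(-m) < eps
            m += 1
        x *= p ** m                          # powers of p in the numerator
    for p in large:
        m = 1
        while p ** m < 1 / eps:              # need p^m >= 1/eps
            m += 1
        x /= p ** m                          # powers of p in the denominator
    return x
```

The exponents are chosen greedily as the smallest ones that work; any larger exponents would do equally well.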

Completions as Topological and Measurable Spaces

Here we are mostly going to assume that $p < \infty$, though we will compare and contrast the situation with the $\mathbb Q_{\infty} = \mathbb R$ case.

The Borel Topology on $\mathbb Q_p$

Once we have an absolute value on a field we can make $\epsilon$-neighborhoods, and these form the basis for a topology. In the case of $x \in \mathbb Q_p$, given $\epsilon > 0$, we define $$B_{\epsilon}(x) = \{ y \in \mathbb Q_p : |y - x|_p < \epsilon \}.$$ When $p = \infty$ these are the usual neighborhoods of $x \in \mathbb R$. The topology generated by all such sets is called the Borel topology. It is easy to see that the collection of open neighborhoods does not depend on which absolute value you use from a particular place. An $\epsilon$-neighborhood with respect to $| \cdot |^c$ is an $\epsilon^{1/c}$-neighborhood of $| \cdot |$ and thus the set of neighborhoods is the same. So from the topological point of view $\mathbb Q_p$ is more naturally associated to a place than to a specific absolute value.

When $p < \infty$ something interesting happens that does not happen in $\mathbb R$. First note that, unlike $| \cdot |_{\infty}$, the non-archimedean absolute values are discrete. Namely, on non-zero elements $| \cdot |_p$ takes values in $\{p^n : n \in \mathbb Z\}$. This means that any open ball $B_{\epsilon}(x)$ can also be described as a closed ball $\overline{B}_{\epsilon'}(x)$, where $\epsilon'$ is the largest power of $p$ strictly less than $\epsilon$. The language sometimes used is that the balls in $\mathbb Q_p$ are clopen.

The balls in $\mathbb Q_p$ are nested in a way that they are not for $\mathbb R$. Namely two balls in $\mathbb Q_p$ are either disjoint, or one is a subset of the other. That is, for $p < \infty$, $\mathbb Q_p$ is totally disconnected. This, like all other differences is driven by the strong triangle inequality. To see this, suppose $B_{\epsilon}(x)$ and $B_{\delta}(y)$ are balls in $\mathbb Q_p$ with $z \in B_{\epsilon}(x) \cap B_{\delta}(y)$. Without loss of generality we may assume $\epsilon \leq \delta$.

First note $x \in B_{\delta}(y)$: $|x - z|_p < \epsilon \leq \delta$ and $|z - y|_p < \delta$. It follows that $|x-y|_p \leq \max\{|x-z|_p, |z-y|_p\} < \delta$.

If $x \not \in B_{\delta}(y)$ then the strong triangle inequality is violated (dotted distance in purple).

Next, if $w \in B_{\epsilon}(x)$ then $|w-y|_p \leq \max\{|w - x|_p, |x - y|_p\} < \delta$ and hence $w \in B_{\delta}(y)$ as claimed. It follows that $B_{\epsilon}(x) \subset B_{\delta}(y)$.

If $w \not \in B_{\delta}(y)$ then the strong triangle inequality is violated (dotted distance in red).
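The nesting of balls can be observed experimentally on a finite sample of rational numbers, using the $p$-adic absolute value restricted to $\mathbb Q$. The sketch below is only an illustration, not a proof: the sample `SAMPLE`, the radii, and the function names are all assumptions made for this example.

```python
from fractions import Fraction

def abs_p(x, p):
    """p-adic absolute value of a rational x, with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    n, d, v = x.numerator, x.denominator, 0
    while n % p == 0:
        n //= p
        v += 1
    while d % p == 0:
        d //= p
        v -= 1
    return Fraction(p) ** (-v)

def ball(center, eps, p, sample):
    """The sample points lying in the open ball B_eps(center)."""
    return frozenset(y for y in sample if abs_p(y - center, p) < eps)

def nested_or_disjoint(p, radii, sample):
    """True if every pair of sampled balls is disjoint or one contains the other."""
    balls = {ball(c, r, p, sample) for c in sample for r in radii}
    return all(A <= B or B <= A or not (A & B) for A in balls for B in balls)

# A small hypothetical sample of rationals with denominators 1, 2, 3 and 9.
SAMPLE = sorted({Fraction(a, b) for a in range(-12, 13) for b in (1, 2, 3, 9)})
```

By contrast, with the usual absolute value the intervals $(-1, 1)$ and $(0, 2)$ overlap without either containing the other.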

Another property of $\mathbb Q_p$ (this time shared between $p$ finite and infinite) is local compactness. Recall the definition: a space is locally compact if every point $x$ has a neighborhood which is contained in a compact set. It turns out that $\mathbb Q_p$ has the Heine-Borel property (closed and bounded sets are compact), so for any $x$ and any $\epsilon > 0$ the inclusion $x \in B_{\epsilon}(x) \subset \overline B_{\epsilon}(x)$ does the job.

Haar measure on $\mathbb Q_p$

In the standard manner we set $\mathcal B$ to be the $\sigma$-algebra on $\mathbb Q_p$ generated by all balls. When $p < \infty$ the total disconnectivity of $\mathbb Q_p$ means that the generic sets in $\mathcal B$ are much easier to describe than in the $\mathbb R = \mathbb Q_{\infty}$ situation. Namely, because the countable intersection of balls is either another ball or a singleton (a set containing a single point), we see that a generic set in $\mathcal B$ looks like a countable union of balls and singletons. This is very tidy in comparison to the nightmarishness of a general Borel subset of $\mathbb R$.

$(\mathbb Q_p, +)$ and $(\mathbb Q_p^{\times}, \cdot)$ are locally compact abelian groups. Locally compact abelian groups are important kinds of topological and measurable spaces because they can be equipped with a translation invariant measure. Specifically, if $B \in \mathcal B$ is a Borel set and $x \in \mathbb Q_p$ then we call $$ x + B = \{ x + b : b \in B \} \qquad \mbox{and} \qquad x B = \{ x b : b \in B\}$$ the additive and multiplicative translations of $B$ by $x$. A measure $\mu$ on $(\mathbb Q_p, \mathcal B)$ is said to be translation invariant if $\mu(x + B) = \mu(B)$ for all $B \in \mathcal B$ and all $x \in \mathbb Q_p$. In the case of $(\mathbb Q_p^{\times}, \mathcal B^{\times})$ the corresponding condition for the multiplicative translation invariant measure $\mu^{\times}$ is $\mu^{\times}(x B) = \mu^{\times}(B)$. (I have not formally introduced the $\sigma$-algebra $\mathcal B^{\times}$ on $\mathbb Q_p^{\times}$, but as a set of points $\mathbb Q_p^{\times}$ is simply $\mathbb Q_p$ with 0 removed. We may take $\mathcal B^{\times}$ to be the $\sigma$-algebra generated by all balls except those containing 0.)

We may make $\mu$ and $\mu^{\times}$ unique by specifying the measure of a single clopen set (or in the case of $\mathbb Q_{\infty} = \mathbb R$ a single (non-singleton) closed interval). Thus we define the measures $\mu_p$ and $\mu^{\times}_p$ to be the unique translation invariant measures on $(\mathbb Q_p, \mathcal B)$ and $(\mathbb Q_p^{\times}, \mathcal B^{\times})$ normalized so that $$ \mu_p \{ x : |x|_p \leq 1 \} = 1 \qquad \mbox{and} \qquad \mu_p^{\times} \{x : |x|_p = 1 \} = \frac{p-1}{p}.$$ These measures are referred to as the (normalized) Haar measures for $\mathbb Q_p$ and $\mathbb Q_p^{\times}$.

The normalization on $\mu_p^{\times}$ may look a little strange. This choice was motivated by the fact that for any $x \in \mathbb Q_p$ and $B \in \mathcal B$, $$\mu_p(x B) = |x|_p \mu_p(B).$$ This implies that $$\mu_p^{\times}(dx) = \frac{\mu_p(dx)}{|x|_p}.$$ Moreover, if we set $C$ to be the closed unit ball, we have $p C = \{ x \in \mathbb Q_p : |x|_p < 1 \}$ and $\mu_p(pC) = \mu_p(C)/p$. It follows that $$\mu_p\{ x : |x|_p = 1 \} = \mu_p(C) - \mu_p(pC) = \frac{p-1}{p}.$$ The normalization for $\mu_p^{\times}$ makes it equal to $\mu_p$ on $\{ x : |x|_p = 1 \}$, a situation which can be advantageous when both measures come into play.


Let $B = \{ x : 0< |x|_p < 1\}$ and $\overline B = \{ x : 0< |x|_p \leq 1\}$, and let $U = \overline B \setminus B = \{x : |x|_p = 1\}$. Then $$ \overline B = U \sqcup B \quad \mbox{and} \quad B = p\overline{B}.$$ Induction then implies that $$\overline B = \bigsqcup_{n=0}^{\infty} p^n U; \qquad \mbox{Indeed} \qquad \mathbb Q^{\times}_p = \bigsqcup_{n \in \mathbb Z} p^n U.$$ This is the decomposition of $\mathbb Q^{\times}_p$ into sets of equal absolute value. That is, $p^n U$ is exactly the set on which $|x|_p = p^{-n}$.

Suppose $s > 0$ and consider $$ \int_{\overline B} |x|_p^s \, \mu_p^{\times}(dx) .$$ Using the decomposition, $$\int_{\overline B} |x|_p^s \, \mu_p^{\times}(dx) = \sum_{n=0}^{\infty} \int_{p^n U} |x|_p^s \, \mu_p^{\times}(dx).$$ The integrand is constant (and equal to $p^{-n s}$) on $p^n U$, and $\mu_p^{\times}(p^n U) = \mu_p^{\times}(U) = (p-1)/p$. Hence, $$\int_{\overline B} |x|_p^s \, \mu_p^{\times}(dx) = \sum_{n=0}^{\infty}p^{-ns} \left(\frac{p-1}{p}\right) = \left(\frac{p-1}{p}\right) \frac{1}{1 - p^{-s}}.$$ This is an important calculation in the theory of the Riemann $\zeta$-function.
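The geometric series in this calculation can be checked with exact arithmetic. The sketch below assumes, purely for convenience, that $s$ is a positive integer so that every term is rational; the function names are hypothetical. Note that the factor $1/(1 - p^{-s})$ is the Euler factor at $p$ of the Riemann $\zeta$-function.

```python
from fractions import Fraction

def closed_form(p, s):
    """((p-1)/p) * 1/(1 - p^(-s)) for a prime p and a positive integer s."""
    return Fraction(p - 1, p) / (1 - Fraction(1, p ** s))

def partial_sum(p, s, terms):
    """Sum over n < terms of p^(-n s) * mu(p^n U), where mu(p^n U) = (p-1)/p."""
    weight = Fraction(p - 1, p)  # multiplicative Haar measure of each shell p^n U
    return sum((Fraction(1, p ** (n * s)) * weight for n in range(terms)), Fraction(0))

def tail(p, s, terms):
    """Exact difference between the closed form and a partial sum."""
    return closed_form(p, s) - partial_sum(p, s, terms)
```

The tail after `terms` shells is the closed form scaled by $p^{-s \cdot \mathrm{terms}}$, so the partial sums converge geometrically.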


Field Extensions and Number Fields

Here I am storing various basic facts about Number Fields that are useful in other notes. I hope this becomes more complete as time goes on.

Number Fields

Recall that a number field $K$ is a finite extension of $\mathbb Q$. While we often think of number fields as $\mathbb Q(\alpha)$ for some algebraic number embedded in $\mathbb C$, it is useful to recall the general (unembedded) construction. $\mathbb Q[x]$ is the ring of polynomials with rational coefficients in the indeterminate $x$. If $f(x) \in \mathbb Q[x]$ is irreducible, then $f(x) \mathbb Q[x]$, the ideal formed from all rational polynomials divisible by $f(x)$, is a maximal ideal in $\mathbb Q[x]$. It follows that $K = \mathbb Q[x]/f(x) \mathbb Q[x]$ is a commutative ring with all non-zero elements invertible, that is, a field.

In this construction, the elements of $K$ are cosets of the form $g(x) + f(x) \mathbb Q[x]$. If $g(x)$ and $h(x)$ generate the same coset, then we will write $g(x) \equiv h(x)$ (or $g(x) \equiv h(x) \bmod f(x)$ if more clarity is necessary). In this situation $f(x) | (g(x) - h(x))$.

Given the coefficients of $f(x)$, the arithmetic in $K$ is easy to perform. Suppose $a_0, \ldots, a_{d-1}$ are the rational coefficients of $$f(x) = x^d + \sum_{n=0}^{d-1} a_n x^n,$$ then $$ x^d \equiv -a_0 - a_1 x - \cdots - a_{d-1} x^{d-1}.$$ Now suppose $g(x) + f(x) \mathbb Q[x]$ is an arbitrary coset. By replacing monomials $x^n$ in $g(x)$ when $n \geq d$ (serially, if necessary) using this congruence, we see that $g(x) \equiv h(x)$ for some $h(x) \in \mathbb Q[x]$ with $\deg(h) < d$. The polynomial $h(x)$ is the remainder produced by the Division Algorithm in $\mathbb Q[x]$ when $g(x)$ is divided by $f(x)$.

That is, as a group (in fact, as a vector space) $K$ is isomorphic to $\mathbb Q^d$ where the isomorphism is given by $$(b_0, \ldots, b_{d-1}) \mapsto \sum_{m=0}^{d-1} b_m x^m + f(x) \mathbb Q[x].$$ The only thing missing in this description is the multiplication. If we want to multiply two vectors $\mathbf b, \mathbf c \in \mathbb Q^d$, we set $g(x)$ to be the polynomial with coefficient vector $\mathbf b$ and $h(x)$ to be the polynomial with coefficient vector $\mathbf c$. We first multiply $g(x)$ and $h(x)$ as usual in $\mathbb Q[x]$, and then we use the equivalence $ x^d \equiv -a_0 - a_1 x - \cdots - a_{d-1} x^{d-1}$ to replace monomials in $g(x) h(x)$ (repeatedly if necessary) until we arrive at a polynomial $p(x)$ of degree $< d$. The coefficient vector of this polynomial in $\mathbb Q^d$ is the product of $\mathbf b$ and $\mathbf c$.
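The multiplication rule just described is easy to turn into a short program. The following is a minimal sketch, assuming $f$ is monic and given by the list `a` $= [a_0, \ldots, a_{d-1}]$, with elements of $K$ stored as coefficient lists in increasing degree; the names `reduce_mod_f` and `multiply` are hypothetical.

```python
from fractions import Fraction

def reduce_mod_f(g, a):
    """Reduce the coefficient list g modulo f = x^d + sum a[n] x^n (monic)."""
    d = len(a)
    g = [Fraction(c) for c in g]
    while len(g) > d:
        lead = g.pop()   # coefficient of x^n, where n = len(g) after the pop
        n = len(g)
        # x^n = x^(n-d) * x^d is congruent to -x^(n-d) * (a_0 + ... + a_(d-1) x^(d-1))
        for i in range(d):
            g[n - d + i] -= lead * a[i]
    return g + [Fraction(0)] * (d - len(g))

def multiply(b, c, a):
    """Product of the elements of K with coefficient vectors b and c."""
    product = [Fraction(0)] * (len(b) + len(c) - 1)
    for i, bi in enumerate(b):
        for j, cj in enumerate(c):
            product[i + j] += Fraction(bi) * cj   # ordinary multiplication in Q[x]
    return reduce_mod_f(product, a)
```

For example, with $f(x) = x^2 - 2$ (so `a = [-2, 0]`), `multiply([1, 1], [1, 1], [-2, 0])` computes $(1 + x)^2 \equiv 3 + 2x$.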

$K$, What is it Good for?

First, note that $\mathbb Q \hookrightarrow K$ by the map $r \mapsto r + f(x) \mathbb Q[x]$, and by definition (the fact that $K$ is a vector space of dimension $d$ over $\mathbb Q$) it is a number field of degree $d$ over $\mathbb Q$. This implies $\mathbb Q[x] \hookrightarrow K[x]$, and in particular, $f(x)$ has a life in $K[x]$. Because $f(x)$ is irreducible in $\mathbb Q[x]$ it has no zeroes in $\mathbb Q$ (when $d > 1$). However, we will show that this is no longer the case in $K[x]$. And that is what $K$ is good for: producing a number field where $f(x)$ has a zero.

The element $x + f(x) \mathbb Q[x]$ is the root of $f(x)$ in $K$. To see this, we need only calculate $$f(x + f(x) \mathbb Q[x]) = f(x) + f(x) \mathbb Q[x] = 0 + f(x) \mathbb Q[x].$$

The element $x + f(x) \mathbb Q[x]$ is important as well because if we know how to multiply by this element, then we know how to multiply by arbitrary elements (which are, after all, simply linear combinations of its powers).

Multiplication by $x$ is a linear operator on $\mathbb Q[x]$, indeed $x( a g(x) + h(x) ) = a x g(x) + x h(x)$, and multiplication by $x + f(x) \mathbb Q[x]$ is a linear operator on $K$. We know $K$ is a vector space with basis $( x^n + f(x) \mathbb Q[x] : n=0,\ldots, d-1)$, so it makes sense to talk of the matrix of the multiplication operator, call it $T$, with respect to this basis. Note that, if we denote the standard basis of $\mathbb Q^d$ (with coordinates indexed from 0 to $d-1$ for consistency) by $\mathbf e_0, \ldots, \mathbf e_{d-1}$, then for $n < d-1$, $T \mathbf e_{n} = \mathbf e_{n+1}$. This corresponds to the multiplication $x x^{n} = x^{n+1}$ which remains true in $K$ if $n < d-1$. The final calculation, using the same equivalence that has gotten us so far $ x^d \equiv -a_0 - a_1 x - \cdots - a_{d-1} x^{d-1}$, shows that $T \mathbf e_{d-1} = -a_0 \mathbf e_0 - a_1 \mathbf e_1 - \cdots - a_{d-1} \mathbf e_{d-1}$. It follows that the matrix of $T$ with respect to the basis $(\mathbf e_n)$ is $$ \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -a_{d-1}\end{pmatrix}.$$

If this matrix looks familiar it is because it is the (Frobenius) companion matrix to $f(x)$ and the characteristic polynomial of this matrix (and hence the operator $T$) is $f(x)$. Indeed, the irreducibility of $f(x)$ implies that the minimal polynomial of $T$ is $f(x)$ as well.
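A quick way to convince oneself that $f(x)$ annihilates $T$ is to build the companion matrix and evaluate $f(T)$ directly. This is a small sketch in exact arithmetic; `a` is the coefficient list $[a_0, \ldots, a_{d-1}]$ of the monic polynomial $f$, and the function names are hypothetical.

```python
from fractions import Fraction

def companion(a):
    """Companion matrix of f = x^d + sum a[n] x^n in the basis e_0, ..., e_(d-1)."""
    d = len(a)
    T = [[Fraction(0)] * d for _ in range(d)]
    for n in range(d - 1):
        T[n + 1][n] = Fraction(1)        # T e_n = e_(n+1)
    for i in range(d):
        T[i][d - 1] = Fraction(-a[i])    # T e_(d-1) = -a_0 e_0 - ... - a_(d-1) e_(d-1)
    return T

def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def evaluate_f_at(T, a):
    """Compute f(T) = T^d + a_(d-1) T^(d-1) + ... + a_0 I by Horner's rule."""
    d = len(a)
    identity = [[Fraction(int(i == j)) for j in range(d)] for i in range(d)]
    result = identity
    for coeff in reversed(a):
        result = mat_mul(result, T)
        result = [[result[i][j] + coeff * identity[i][j] for j in range(d)]
                  for i in range(d)]
    return result
```

For any choice of `a`, `evaluate_f_at(companion(a), a)` should be the zero matrix, in agreement with the Cayley-Hamilton theorem.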

But $f(x)$ has roots in $\mathbb C$. What about them?

The Fundamental Theorem of Algebra (ironically a theorem in analysis) guarantees that $f(x)$ has $d$ roots (counting multiplicity) in $\mathbb C$. How are they related to the root of $f(x)$ in $K$?

Let’s start with our favorite $\alpha \in \mathbb C$ such that $f(\alpha) = 0$. We know that $\alpha$ is either in $\mathbb R$ or it has a complex conjugate (more about that later). We can embed $K$ into $\mathbb C$ by sending $x + f(x) \mathbb Q[x] \mapsto \alpha$. That is, $$ b_{d-1} x^{d-1} + \cdots + b_1 x + b_0 + f(x) \mathbb Q[x] \quad \mapsto \quad b_{d-1} \alpha^{d-1} + \cdots + b_1 \alpha + b_0.$$ We denote this embedding by $\mathbb Q(\alpha) \subset \mathbb C$. Notice that if $\alpha \in \mathbb R$, then $\mathbb Q(\alpha) \subset \mathbb R$ and we call it a real embedding of $K$.

A count of the real and complex embeddings produces the first of the classical invariants of a number field.

Invariants: Number of real and complex embeddings $r_1$ and $r_2$

Let us distinguish the real and complex roots of $f(x)$ by setting $\alpha_1, \ldots, \alpha_{r_1}$ to be the real roots and $\beta_1, \overline{\beta_1}, \ldots, \beta_{r_2}, \overline{\beta_{r_2}}$ to be the non-real complex roots. Clearly $r_1 + 2 r_2 = d$. Then the embeddings $\mathbb Q(\alpha_1), \ldots, \mathbb Q(\alpha_{r_1}) , \mathbb Q(\beta_1) , \ldots, \mathbb Q(\beta_{r_2})$ are called the archimedean embeddings of $K$.

The Norm and Trace

Here we wish to work in some generality and consider a field extension $K | k$ where both are number fields. Little generality is lost by keeping the example $k = \mathbb Q$ at the front of your mind. However, as many properties of number fields ‘factor through’ intermediate fields (for instance $[ K : \mathbb Q] = [K : k] [k : \mathbb Q]$) it is useful to maintain some generality in notation etc.

We will also abandon our attempt to denote elements of $K | k$ as cosets in $k[x] / f(x) k[x]$, writing for instance $\alpha, \beta, \gamma, \ldots $ for generic field elements. Often we will implicitly identify $K$ with $k(\alpha)$ for some algebraic number $\alpha$ of degree $d$ over $k$. In this situation $\{1, \alpha, \ldots, \alpha^{d-1} \}$ is a basis for $K$ over $k$, and the matrix of multiplication by $\alpha$ with respect to this basis is exactly the matrix of $T$ as before (the Frobenius companion matrix of the minimal polynomial of $\alpha$ over $k$).

More generally, given any $\gamma \in K$, we can make the linear operator “multiplication by $\gamma$” $T_{\gamma}$. If $\gamma$ is given as a $k$-linear combination of $\{1, \alpha, \ldots, \alpha^{d-1}\}$ then it is relatively easy to compute the matrix of $T_{\gamma}$ with respect to this basis. Note that this matrix has entries in $k$.

Norm and trace of $K|k$

For $\gamma \in K$, the norm $N_{K|k}(\gamma)$ and trace $\mathrm{Tr}_{K|k}(\gamma)$ are respectively the determinant and trace of $T_{\gamma}$. This defines maps $N_{K|k} : K \rightarrow k$ and $\mathrm{Tr}_{K|k} : K \rightarrow k$.

This definition is independent of basis, but can be computed explicitly in the basis $\{1, \alpha, \ldots, \alpha^{d-1}\}$.

If $\beta, \gamma \in K$, then $T_{\beta \gamma} =T_{\beta} \circ T_{\gamma}$ and $T_{\beta + \gamma} = T_{\beta} + T_{\gamma}$. The multiplicativity of the determinant and the additivity of the trace imply that $$N_{K|k}(\beta \gamma) = N_{K|k}(\beta) N_{K|k}(\gamma)$$ and $$\mathrm{Tr}_{K|k}(\beta + \gamma) = \mathrm{Tr}_{K|k}(\beta) + \mathrm{Tr}_{K|k}(\gamma).$$

The norm is a natural homomorphism from $K^{\times}$ to $k^{\times}$ (not surjective in general) and the trace is a natural homomorphism from the additive group $(K, +)$ onto $(k,+)$.
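For $k = \mathbb Q$ and $K = \mathbb Q(\alpha)$ the norm and trace can be computed directly from the definition. The sketch below assumes $\gamma = \sum_m c_m \alpha^m$ is given by its coefficient list `c` and that the minimal polynomial of $\alpha$ is monic with lower coefficients `a` $= [a_0, \ldots, a_{d-1}]$; all function names are hypothetical. It builds $T_{\gamma} = \sum_m c_m T^m$ from the companion matrix $T$ and takes its determinant and trace.

```python
from fractions import Fraction

def companion(a):
    """Matrix of multiplication by alpha in the basis 1, alpha, ..., alpha^(d-1)."""
    d = len(a)
    T = [[Fraction(0)] * d for _ in range(d)]
    for n in range(d - 1):
        T[n + 1][n] = Fraction(1)
    for i in range(d):
        T[i][d - 1] = Fraction(-a[i])
    return T

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mult_matrix(c, a):
    """Matrix of T_gamma for gamma = sum c[m] alpha^m."""
    d = len(a)
    T = companion(a)
    power = [[Fraction(int(i == j)) for j in range(d)] for i in range(d)]  # T^0
    total = [[Fraction(0)] * d for _ in range(d)]
    for coeff in c:
        total = [[total[i][j] + coeff * power[i][j] for j in range(d)]
                 for i in range(d)]
        power = mat_mul(power, T)
    return total

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    n, result = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result              # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n):
                M[r][k] -= factor * M[col][k]
    return result

def norm(c, a):
    return det(mult_matrix(c, a))

def trace(c, a):
    M = mult_matrix(c, a)
    return sum(M[i][i] for i in range(len(M)))
```

In $\mathbb Q(\sqrt 2)$ (so `a = [-2, 0]`), the element $1 + \sqrt 2$ has norm $-1$ and trace $2$, and its square $3 + 2\sqrt 2$ has norm $1$, consistent with multiplicativity.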


Number fields
$f(x), g(x), h(x),$ etc.: Polynomials, often in $\mathbb Q[x]$ or $k[x]$.
$k, K, L$: Number fields.
$\alpha, \beta, \gamma,$ etc.: Generic field elements. The field depends on context.
$[K : k]$, $d$: The degree of a field extension. The fields depend on context.
$T_{\alpha}$: The linear transformation on $K|k$ (fields context dependent) given by multiplication by $\alpha$.
$r_1, r_2$: The number of real and complex embeddings (respectively) of $K$ (context dependent).
$N_{K|k}$, $\mathrm{Tr}_{K|k}$: The Norm and Trace maps $K \rightarrow k$ given by $\alpha \mapsto \det( T_{\alpha})$ and $\alpha \mapsto \mathrm{Tr}( T_{\alpha})$.
$\mathfrak o$, $\mathfrak O$: Rings of integers in $k$ and $K$.
$\mathfrak a, \mathfrak b, \mathfrak A, \mathfrak B,$ etc.: Ideals in rings of integers. We often use lower case fraktur letters for ideals in $\mathfrak o$ and capital fraktur letters for ideals in $\mathfrak O$.
$\mathfrak p, \mathfrak q, \mathfrak P, \mathfrak Q$: Prime ideals in $\mathfrak o$ and $\mathfrak O$.
$\mathbb N \mathfrak a$, etc.: The ideal norm $\mathbb N \mathfrak a = [\mathfrak o : \mathfrak a]$.

Recalling Galois Theory

This is a brief reminder of the main ideas of Galois theory. Any proofs purported here are meant to be suggestive. I learned Galois theory out of Dummit and Foote, which I thought was pretty good. I also have Classical Galois Theory by Gaal on my shelf. This book is in essence one giant worksheet. I have not completed many of the exercises, but I suspect anyone who did would gain a remarkable intuition as to how the theory hangs together.

At any rate, this is mostly for me, since I seem to need to be reminded of the basics of Galois theory every few years.

It may be useful to review Field Extensions and Number Fields before continuing.

Automorphisms of Field Extensions

We want to work in a bit of generality here, so we assume $K | k$ is an extension of number fields. Little generality is lost at this point if you take $k = \mathbb Q$.


An isomorphism of $K$ onto itself is called an automorphism. The set of automorphisms of $K$, $\mathrm{Aut}(K)$, forms a group under composition. An automorphism $\sigma$ is said to fix $k$ if $\sigma \gamma=\gamma$ for all $\gamma \in k$. The set of automorphisms of $K$ which fix $k$ is denoted $\mathrm{Aut}(K|k)$ and is a subgroup of $\mathrm{Aut}(K)$.

$\mathrm{Aut}(K|k)$ is a finite group of order at most $d = [K:k]$. We will use this fact, though the proof would take us too far afield.

Automorphisms of $K$ which fix $k$ permute roots of polynomials with coefficients in $k$ and roots in $K$.


Suppose $g(x) \in k[x]$ is irreducible and there exists $\beta \in K$ such that $g(\beta) = 0$. Then $g( \sigma \beta) = 0 \quad \mbox{for all} \quad \sigma \in \mathrm{Aut}(K|k)$.


Suppose $g(x) = \sum_m b_m x^m$, then $\sigma b_m = b_m$, and hence $$0= \sigma g(\beta) = \sum_m \sigma b_m (\sigma \beta)^m = \sum_m b_m (\sigma \beta)^m = g(\sigma \beta). \qquad \square$$

One particularly important automorphism is complex conjugation. Suppose that $K | \mathbb Q$ is a number field, and $K \cong \mathbb Q(\alpha)$ for some non-real $\alpha$ (that is, the minimal polynomial of $K$ has a non-real root in $\mathbb C$). Since complex conjugation is an automorphism of $\mathbb C | \mathbb R$, it restricts to an isomorphism from $\mathbb Q(\alpha)$ onto $\mathbb Q(\overline \alpha)$ fixing $\mathbb Q$. It follows that if $\beta \in \mathbb Q(\alpha)$ then $\overline \beta \in \mathbb Q(\overline \alpha)$, and algebraic operations in one of these embeddings can be found from the other by complex conjugation. When the two images coincide as sets, $\mathbb Q(\alpha)=\mathbb Q(\overline \alpha)$ (as happens, for instance, when $K | \mathbb Q$ is Galois), complex conjugation is an automorphism of $\mathbb Q(\alpha)$.

Let us distinguish the real and complex roots of $f(x)$ by setting $\alpha_1, \ldots, \alpha_{r_1}$ to be the real roots and $\beta_1, \overline{\beta_1}, \ldots, \beta_{r_2}, \overline{\beta_{r_2}}$ to be the non-real complex roots. Clearly $r_1 + 2 r_2 = d$. Then the embeddings $\mathbb Q(\alpha_1), \ldots, \mathbb Q(\alpha_{r_1}) , \mathbb Q(\beta_1) , \ldots, \mathbb Q(\beta_{r_2})$ are called the archimedean embeddings of $K$.

Splitting Fields

Galois theory is concerned with the zeros of rational polynomials and how their zeros are permuted by the automorphisms of certain extensions of $\mathbb Q$ (which will come to be called Galois extensions). We already noted that the automorphisms in $\mathrm{Aut}(K|k)$ preserve the set of zeros in $K$ of any given polynomial $g(x) \in k[x]$. However, the construction of $K$ makes no guarantee that a generic polynomial $g(x)$ will have a zero in $K$, and even for the minimal polynomial $f(x)$, the construction of $K$ only guarantees the existence of a single zero of $f(x)$.

In general, if we wanted an extension of $k$ that contains all the zeros of $f(x)$, we would first compute $K = k[x]/ f(x) k[x]$. $K$ contains at least one zero of $f(x)$, and if we factor $f(x)$ in $K[x]$ there will be a linear factor for each of those zeros. We can then sequentially extend $K$ by constructing field extensions from the remaining irreducible factors of $f(x)$. Each time we extend fields by another irreducible factor, we add another zero of $f(x)$ to the resulting field extension. The process terminates after at most $d$ steps to produce the splitting field of $f(x)$. The splitting field of $f(x)$ has degree bounded by $d!$.

It is possible that the degree of the splitting field is as small as $d$, since it is possible, depending on the nature of $f(x)$, that $K = k[x]/f(x) k[x]$ itself contains $d$ zeros of $f(x)$.


Suppose $p$ is prime and consider the $p$th cyclotomic polynomial $$\Phi_p(x) = x^{p-1} + x^{p-2} + \cdots + x + 1.$$ Suppose $\zeta$ is a zero of $\Phi_p(x)$ in $\mathbb Q[x]/\Phi_p(x) \mathbb Q[x]$, then it is easily verified that $\zeta^p = 1$. It follows that, if $\ell = 1, \ldots, p-1$, \begin{eqnarray}\Phi_p(\zeta^{\ell}) &=& \zeta^{\ell(p-1)} + \zeta^{\ell(p-2)} + \cdots + \zeta^{\ell} + 1 \\ &=& \zeta^{p-1} + \zeta^{p-2} + \cdots + \zeta + 1 \\ &=& 0.\end{eqnarray} (The second equality holds because $\zeta^p = 1$ and multiplication by $\ell$ permutes the non-zero residues modulo $p$.) It follows that $\zeta, \zeta^2, \ldots, \zeta^{p-1}$ are all $p-1$ roots of $\Phi_p(x)$ and hence $K = \mathbb Q[x]/\Phi_p(x)\mathbb Q[x]$ is the splitting field of $\Phi_p(x)$.
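The claim that every power $\zeta^{\ell}$ is again a root can be checked mechanically by working in $\mathbb Q[x]/\Phi_p(x)\mathbb Q[x]$ with $\zeta = x + \Phi_p(x)\mathbb Q[x]$. The sketch below, with hypothetical function names, reduces the polynomial $\Phi_p(x^{\ell})$ modulo $\Phi_p(x)$ using only integer arithmetic (all coefficients of $\Phi_p$ are 1).

```python
def reduce_mod_monic(g, a):
    """Remainder of the coefficient list g modulo the monic f = x^d + sum a[n] x^n."""
    d = len(a)
    g = list(g)
    while len(g) > d:
        lead = g.pop()   # coefficient of x^n, where n = len(g) after the pop
        n = len(g)
        for i in range(d):
            g[n - d + i] -= lead * a[i]   # replace x^n using x^d = -(a_0 + ... )
    return g + [0] * (d - len(g))

def phi_p_at_power(p, ell):
    """Coefficients of Phi_p(x^ell) reduced modulo Phi_p(x)."""
    a = [1] * (p - 1)                   # Phi_p = x^(p-1) + ... + x + 1
    g = [0] * (ell * (p - 1) + 1)
    for k in range(p):
        g[ell * k] += 1                 # the term (x^ell)^k
    return reduce_mod_monic(g, a)

def all_powers_are_roots(p):
    """True if zeta^ell is a root of Phi_p for every ell = 1, ..., p-1."""
    return all(phi_p_at_power(p, ell) == [0] * (p - 1) for ell in range(1, p))
```

When $\ell$ is a multiple of $p$ the reduction is not zero, which matches $\Phi_p(1) = p$.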

Galois Theory


If $K | k$ is a splitting field for a polynomial $g(x) \in k[x]$, then $K$ is said to be Galois over $k,$ and the group of automorphisms of $K$ which fix $k$ is called the Galois group and denoted $\mathrm{Gal}(K|k)$.


$K | k$ is Galois if and only if $\# \mathrm{Aut}(K|k) = [K : k]$.

We won’t prove this claim (though the only if direction is easy) because it is a bit fiddly with separability and involves a diversion into character theory. Some (most?) authors give this as the definition of Galois and prove that it implies the splitting field definition.

The main result in Galois Theory is a correspondence between intermediate fields of $K | k$ and subgroups of $\mathrm{Gal}(K|k)$. Let us write $G = \mathrm{Gal}(K|k)$ and suppose $H < G$ is a subgroup. Define $$K_H = \{ \gamma \in K : \sigma(\gamma) = \gamma \mbox{ for all } \sigma \in H \}.$$ It is easily verified that $K_H$ is a field, and $k \subset K_H \subset K$ (which we might abbreviate $K | K_H | k$). It will turn out that $H \leftrightarrow K_H$ will be a bijection (called the Galois correspondence) between subgroups of $G$ and intermediate fields of $K | k$.

This correspondence goes beyond a bijection, because there is an interpretation for $H$ and $G/H$ (as a group in the case where $H$ is normal, but to some extent even as a set of cosets in the non-normal case) in terms of the groups of automorphisms $\mathrm{Gal}(K|K_H)$ and $\mathrm{Aut}(K_H|k)$. I hope you objected to the notational switch between $\mathrm{Gal}$ and $\mathrm{Aut}$ in the previous sentence, but it is correct. The fact that $K$ is a splitting field for a polynomial in $k[x]$ means that it is also the splitting field for a polynomial in $K_H[x]$ (namely the original polynomial, viewed as an element of $K_H[x]$) and hence $K | K_H$ is Galois and we use the notation $\mathrm{Gal}(K | K_H)$ for the group of automorphisms of $K$ fixing $K_H$. This is, unsurprisingly, equal to $H$. On the other hand, just because $K$ is the splitting field of a polynomial in $k[x]$ doesn’t imply that an intermediate field, such as $K_H$, must be a splitting field for that or any other polynomial in $k[x]$. Thus, in general we need to refer to the automorphism group of $K_H | k$ by $\mathrm{Aut}(K_H | k)$. It will turn out that when $H$ is normal in $G$ then $K_H | k$ is Galois, and $\mathrm{Gal}(K_H | k) \cong G/H$. This will all be enumerated in the Fundamental Theorem of Galois Theory, but we need to develop a few results first.

Given $\gamma \in K$, we call $\sigma \gamma; \sigma \in G$ the Galois conjugates of $\gamma$. Moreover, if $L$ is any intermediate field extension, $K | L | k$, then $\sigma$ gives an isomorphism from $L$ onto $\sigma(L)$ (which fixes $k$). In particular $K_H$ is isomorphic to its image $\sigma K_H$. Notice then that if $\psi \in \mathrm{Aut}(K_H | k)$, then $\sigma \psi \sigma^{-1}$ is an element of $\mathrm{Aut}(\sigma K_H | k)$.

Similarly, conjugating the subgroup $H = \mathrm{Gal}(K | K_H)$ gives $\sigma H \sigma^{-1} = \mathrm{Gal}(K | \sigma K_H)$, since $\sigma \psi \sigma^{-1}$ with $\psi \in H$ fixes $\sigma \gamma$ for every $\gamma \in K_H$. We can make this more evocative by denoting the action by conjugation of $G$ on its subgroups as $\sigma \cdot H = \sigma H \sigma^{-1}$, in which case, $$\sigma \cdot \mathrm{Gal}(K | K_H) = \mathrm{Gal}(K | \sigma K_H).$$ If $\sigma K_H = K_H$ for all $\sigma \in G$, then $\sigma H \sigma^{-1} = H$ for all $\sigma \in G$, that is $H$ is normal in $G$. On the other hand, if $H$ is normal in $G$, then $\sigma \psi \sigma^{-1} \in H$ for all $\psi \in H$, and so $\sigma K_H = K_H$ for all $\sigma \in G$.

Now, suppose $H$ is normal and $g(x) \in k[x]$ is a polynomial so that $K_H \cong k[x]/g(x)k[x]$. From the previous discussion, $x + g(x) k[x]$ is a zero of $g(x)$ in $K_H$, as are $\sigma (x + g(x) k[x])$ for all $\sigma \in G$. To establish $K_H | k$ is Galois, we need to show that the size of the orbit of $x + g(x) k[x]$ under $G$ is equal to $[K_H : k]$. We know for $\sigma \in H$, $\sigma(x + g(x) k[x]) = x + g(x)k[x]$. On the other hand, if $\sigma(x + g(x) k[x]) = x + g(x)k[x]$ then $\sigma \in H$ because the action of $\sigma$ on $K_H$ is completely determined by its action on $x + g(x) k[x]$. Thus, the elements of $\mathrm{Aut}(K_H|k)$ are in correspondence with $G/H$. We thus have $[K : K_H] = \#H$, $[K : k] = \#G$ and $[K_H : k] = \#(G/H)$. It follows that $\# \mathrm{Aut}(K_H|k) = [K_H : k]$ and $K_H | k$ is thus Galois.

To be sure, we have glossed over many details. However, many important observations are captured in the Fundamental Theorem of Galois Theory.

Fundamental Theorem of Galois Theory

Suppose $K | k$ is Galois and $G = \mathrm{Gal}(K|k)$.


There is an inclusion reversing correspondence between intermediate fields of $K|k$ and subgroups of $G$.

Normality $\leftrightarrow$ Galois

Suppose the subgroup $H \leq G$ corresponds to the intermediate field $L$. Then $H$ is normal in $G$ if and only if $L|k$ is Galois. In this situation $\mathrm{Gal}(L|k) \cong G/H$.

The Correspondence Preserves Lattices

Suppose $H_1 \leftrightarrow L_1$ and $H_2 \leftrightarrow L_2$ for $H_1, H_2 \leq G$ and $L_1, L_2$ intermediate fields of $K|k$. Then $\langle H_1, H_2 \rangle \leftrightarrow L_1 \cap L_2$ and $H_1 \cap H_2 \leftrightarrow L_1 L_2$. (Here $\langle H_1, H_2 \rangle$ is the smallest subgroup of $G$ containing both $H_1$ and $H_2$ and $L_1 L_2$ is the smallest field containing both $L_1$ and $L_2$). Moreover the inclusions (e.g. $L_1 \cap L_2 \subset L_1 \subset L_1 L_2$) are reversed under the correspondence.

Here we write arrows for the inclusion map. The correspondence reverses inclusion.

The complete correspondence between subgroups of $\mathrm{Gal}(K|k)$ and subfields of $K|k$, illustrated by the subfields of $\mathbb{Q}(i, \sqrt[8]{2})$ and the subgroups of $G = \langle \sigma, \tau : \sigma^8 = \tau^2 = 1, \sigma \tau = \tau \sigma^3 \rangle$. This example was cribbed from Abstract Algebra, second edition, by Dummit and Foote.
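The lattice in this example is small enough to enumerate by machine. The following Python sketch models $G$ by writing each element as $\tau^a \sigma^b$ and using the relation $\sigma\tau = \tau\sigma^3$ to multiply; the pair encoding `(a, b)` and all helper names are choices made here for illustration. It relies on the assumption (true for this group, but not in general) that every subgroup can be generated by at most two elements.

```python
from collections import Counter
from itertools import product

N = 8  # order of sigma

def mul(g, h):
    """Multiply tau^a sigma^b by tau^c sigma^d, using sigma tau = tau sigma^3."""
    a, b = g
    c, d = h
    # sigma^b tau^c = tau^c sigma^(b * 3^c), so the sigma-exponent picks up a factor 3^c.
    return ((a + c) % 2, (b * pow(3, c, N) + d) % N)

G = [(a, b) for a in range(2) for b in range(N)]
E = (0, 0)  # identity element

def generated(gens):
    """Closure of the generators under multiplication (a subgroup, since G is finite)."""
    elems = {E} | set(gens)
    frontier = list(elems)
    while frontier:
        g = frontier.pop()
        for h in list(elems):
            for p in (mul(g, h), mul(h, g)):
                if p not in elems:
                    elems.add(p)
                    frontier.append(p)
    return frozenset(elems)

def inverse(g):
    return next(h for h in G if mul(g, h) == E)

def is_normal(H):
    return all(mul(mul(g, h), inverse(g)) in H for g in G for h in H)

# Assumption: every subgroup of this particular group is generated by two elements.
subgroups = {generated([g, h]) for g, h in product(G, repeat=2)}
normal = [H for H in subgroups if is_normal(H)]

print(sorted(Counter(len(H) for H in subgroups).items()))
print(len(subgroups), len(normal))
```

By the Fundamental Theorem, `len(subgroups)` counts the intermediate fields of $\mathbb{Q}(i, \sqrt[8]{2}) | \mathbb{Q}$ (both ends included), and `normal` picks out the subgroups whose fixed fields are Galois over $\mathbb{Q}$.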


From Measures to Metrics on Pro-finite Completions

The complete 3-nary tree represents the family tree where each individual in a generation spawns (asexually) exactly three progeny in the subsequent generation. The image to the left represents 7 generations beginning from a single ancestor (the root) at the center of the image.

If we imagine the generations continuing ad infinitum, then we arrive at an object called the pro-finite completion of the tree. Loosely speaking this is the topological space which consists of all infinite paths from the root down through the (infinitely many) generations.

The pro-finite completion of the complete 3-nary tree can be put in correspondence with sequences of the form $(m_n)$ where each $m_n \in \{0,1,2\}$ as follows: Each child of an individual is labelled either 0, 1 or 2 (this can be done consistently by ordering, say, counterclockwise in the embedding above, but it doesn’t really matter so long as the labels are fixed for all time). A sequence starting, say, $(1,0,2,1,\ldots)$ represents a descendant of (reading right to left) the first child of the second child of the zeroth child of the first child of the root. Admittedly ‘zeroth child’ sounds awkward, but we think of these as labels and not ordinals.

Visually, we may think of the pro-finite completion to be the boundary of the infinite graph, and the corresponding sequence $(m_n)$ as an address containing the information necessary to describe how to traverse the tree to get to that point on the boundary.

There are other embeddings of the complete 3-nary tree, including the ‘balloon embedding’ on the right. In this embedding the pro-finite completion is visualized as its (fractal) boundary. This embedding gives another construction of the fractal known as Sierpiński’s Triangle.

In this embedding you may think of an ‘address’ of a point on the boundary as given by a sequence of ‘Left’, ‘Right’ and ‘Forward’ directions were you to drive to that point from the root along the edges of the graph. A bijection between $\{0,1,2\}$ and {Left, Right, Forward} will produce the sequence $(m_n)$.

Of course there’s nothing special about the 3-nary tree. We could start with any number of descendants per individual per generation. Indeed, we could let the number of descendants vary either between generations, or within a generation. We will see some examples of this soon.

Balloon embeddings of the complete 2-nary (binary), 4-nary and 5-nary trees. In each case the pro-finite completion is the fractal boundary of these graphs, and not the depicted edges and vertices.

Random Trees

There are lots of ways to make random graphs and trees, but here we will concentrate on a sort of random tree that will arise in the study of prime splitting in towers of number fields. We will suppose we start with a single ancestor (the root), and that each individual in the $n$th generation has an independent, identically distributed, bounded number of children. Note that the bound on the number of children may grow with generations, but for each generation there is some upper bound on the number of children an individual may have.

A simple example of a random tree where each individual has an equal chance of having 1, 2 or 3 offspring. The images are different embeddings of the same random tree.

Suppose the largest number of children an individual in the $n$th generation may have is $b_n$ (for instance, the random tree above has $b_n=3$ for all $n$). We call the sequence $(b_n)$ the sequence of generation bounds, and we call the tree where each individual in the $n$th generation has exactly $b_n$ children the complete $(b_n)$-nary tree. Every random tree with generation bounds $(b_n)$ can be embedded as a subtree in the complete $(b_n)$-nary tree.
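A finite truncation of such a tree is easy to simulate. The sketch below is only an illustration under stated assumptions: vertices are encoded as tuples of child labels, the number of children of an individual in generation $n$ is drawn uniformly from $1, \ldots, b_n$ (one possible distribution among many), and the function names are made up for this example.

```python
import random

def random_tree(depth, bounds, rng):
    """Return the generations of a random tree, each vertex a tuple of child labels.

    A vertex in generation n receives between 1 and bounds[n] children,
    chosen uniformly (an assumption made for this sketch).
    """
    generations = [[()]]  # generation 0 is the root, with the empty address
    for n in range(depth):
        nxt = []
        for v in generations[-1]:
            k = rng.randint(1, bounds[n])
            nxt.extend(v + (label,) for label in range(k))
        generations.append(nxt)
    return generations

def in_complete_tree(v, bounds):
    """A vertex lies in the complete (b_n)-nary tree when every label is below its bound."""
    return all(0 <= label < bounds[n] for n, label in enumerate(v))

bounds = [3, 3, 3, 3]
tree = random_tree(4, bounds, random.Random(0))
print([len(gen) for gen in tree])
print(all(in_complete_tree(v, bounds) for gen in tree for v in gen))
```

Because children are always labelled $0, \ldots, k-1$ with $k \leq b_n$, the identity map on addresses is the embedding into the complete $(b_n)$-nary tree.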

As in the non-random case, a point in the pro-finite completion of a random tree is specified by its address, given as the directions necessary to traverse the tree from the root to a point on the ‘boundary’. Another way of representing the information given in the address is provided by the list of vertices $(v_n)$ one passes through on the voyage from the root. Here one assumes that the vertices are uniquely labelled. If $(v_n)$ is such a list of vertices we will write $v_m | v_n$ for all $m > n$. Loosely speaking, a vertex $v$ divides the vertex $w$ if $v$ is a vertex further down the tree from $w$. Put even more simply, $v | w$ if $v$ is descended from $w$. We will denote the root of the tree by $v_0$ and note that $v | v_0$ for all vertices $v$.
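The two descriptions of a point (address versus list of vertices) are easy to pass between if each vertex is labelled by the finite address that reaches it. The following sketch assumes that labelling; `path` and `divides` are hypothetical helper names, and `divides` is taken to be non-strict so that the root divides itself.

```python
def path(address):
    """List of vertices (v_0, v_1, ...) visited while following a finite address."""
    return [tuple(address[:n]) for n in range(len(address) + 1)]

def divides(v, w):
    """v | w: the vertex v lies at or below w, i.e. w's address is a prefix of v's."""
    return v[:len(w)] == w

vertices = path((1, 0, 2, 1))
print(vertices)  # [(), (1,), (1, 0), (1, 0, 2), (1, 0, 2, 1)]
print(all(divides(v, ()) for v in vertices))  # every vertex divides the root: True
```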

Let $B$ be the pro-finite completion of our (possibly random) tree as represented by sequences of vertices $(v_n)$, one per generation, with $v_m | v_n$ for all pairs $m > n$. For any vertex $w$ we define $$B(w) = \{ (v_n) \in B : w = v_m \mbox{ for some } m \}.$$ Loosely speaking $B(w)$ is the set of points in the pro-finite completion (boundary of the tree) that are downstream from vertex $w$.

$B(w)$ represents the part of the pro-finite completion that lies in the blue disk (left) or arc (right). Note that these are different representations of the same $B(w)$ on different embeddings of the same random tree.

Note that if $u$ and $w$ are different vertices, then either $B(u)$ and $B(w)$ are disjoint, or one is a subset of the other. It is worth supplying your own proof of this, or at least understanding why it is true from a picture.
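For readers who prefer an experiment to a picture, the dichotomy can be checked by brute force on a finite truncation. The sketch below assumes the complete 3-nary tree cut off at depth 3, with the depth-3 vertices standing in for boundary points; this is a sanity check on a finite model, not a proof.

```python
from itertools import product

DEPTH = 3
LABELS = (0, 1, 2)

# Truncated boundary points and all vertices, both encoded as tuples of labels.
leaves = list(product(LABELS, repeat=DEPTH))
vertices = [v for d in range(DEPTH + 1) for v in product(LABELS, repeat=d)]

def ball(w):
    """Finite stand-in for B(w): the truncated points whose address starts with w."""
    return frozenset(x for x in leaves if x[:len(w)] == w)

def nested_or_disjoint(u, w):
    Bu, Bw = ball(u), ball(w)
    return Bu.isdisjoint(Bw) or Bu <= Bw or Bw <= Bu

print(all(nested_or_disjoint(u, w) for u in vertices for w in vertices))
```

In this encoding the reason is visible: two addresses either disagree at some position (disjoint balls) or one is a prefix of the other (nested balls).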

$\sigma$-algebras and measures on $B$

We eventually want to talk about measures (and metrics) on the pro-finite completion of a random tree, but first we need a suitable $\sigma$-algebra. As usual, we actually define a nice collection of sets that we want to be in our $\sigma$-algebra and consider the smallest $\sigma$-algebra that does the trick. Dynkin’s $\pi$-$\lambda$ Theorem seems particularly salient here, and we define $\mathcal P$ to be the $\pi$-system given by all $B(w)$ for all vertices of our tree. That is $$\mathcal P = \{ B(v) : v \mbox{ is a vertex} \} \cup \{\emptyset\}.$$ We have to throw in the empty set, because a $\pi$-system is a collection of sets closed under intersection, and by our previous remarks, it is possible (common, in fact) for elements of $\mathcal P$ to be disjoint. We set $\mathcal D$ to be the $\sigma$-algebra on $B$ generated by $\mathcal P$. And we take $(B, \mathcal D)$ to be the measurable space in which all calculations occur.

Notice that, since the intersection of any two elements of $\mathcal P$ is again an element of $\mathcal P$ we see that elements of $\mathcal D$ are simply (possibly countable) disjoint unions of elements of $\mathcal P$. That is, for each set $A \in \mathcal D$ there is a (finite or) countable collection of vertices $V$, and a (finite or) countable $X \subset B$ such that $$A = \bigsqcup_{v \in V} B(v) \sqcup \bigsqcup_{x \in X} \{x \}.$$ The disjointness of this union implies we may do this in such a way that for any $u, v \in V$, $u \nmid v$, and for any $x \in X$ and $v \in V$, $x \notin B(v)$. We call $V$ a reduced set of vertices for $A$.

The $\pi$-$\lambda$ Theorem implies that a measure $\mu$ on $(B, \mathcal D)$ is determined completely by its values on $\mathcal P$. Note that if $w_1, \ldots, w_d$ are the child-vertices of vertex $w$, then $\mu(B(w)) = \mu(B(w_1)) + \cdots + \mu(B(w_d))$ (and conversely, any collection of $\{m_v \in [0, \infty] : v \mbox{ a vertex}\}$ satisfying all consistency conditions of the form $m_w = m_{w_1} + \cdots + m_{w_d}$ will determine a measure on $(B, \mathcal D)$). There may be special measures on $(B, \mathcal D)$ depending on the construction of your tree, but for now we maintain complete generality, and see how various aspects of the measure interact to potentially give a metric on $B$.
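One simple way to produce numbers $m_v$ satisfying the consistency conditions is to give the root total mass 1 and let every vertex split its mass equally among its children. The sketch below does this on a small hand-written tree; the tree, the equal-split rule and the name `equal_split` are all assumptions made for illustration, and exact fractions are used so that the consistency check is not clouded by rounding.

```python
from fractions import Fraction

# A small hypothetical tree: each vertex maps to the list of its children.
children = {
    (): [(0,), (1,)],
    (0,): [(0, 0), (0, 1), (0, 2)],
    (1,): [(1, 0)],
}

def equal_split(children, root=(), total=Fraction(1)):
    """Set m_root = total, then split each vertex's mass equally among its children."""
    m = {root: total}
    stack = [root]
    while stack:
        w = stack.pop()
        kids = children.get(w, [])
        for c in kids:
            m[c] = m[w] / len(kids)
            stack.append(c)
    return m

m = equal_split(children)

# Consistency condition: m_w equals the sum of m over the children of w.
consistent = all(m[w] == sum(m[c] for c in kids) for w, kids in children.items())
print(m[(0, 1)], m[(1, 0)], consistent)  # 1/6 1/2 True
```

Any other rule for dividing a vertex's mass among its children (including giving some children mass zero) yields another admissible family $\{m_v\}$.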

Recall that an atom of the measure $\mu$ is a set $A \in \mathcal D$ such that $\mu(A) > 0$ and if $C \in \mathcal D$ is a proper subset of $A$ then $\mu(C) = 0$. By our construction of elements of $\mathcal D$ we see that if $A$ is an atom of $\mu$ then either $A = \{x \}$ for some $x \in B$, or $A = B(v)$ for some vertex $v.$ In fact, we will see that this latter situation is impossible unless $B(v)$ is itself a singleton. To see why, suppose the vertices $v_1, \ldots, v_d$ are the immediate descendants of $v$. Then, $$\mu(B(v)) = \mu(B(v_1)) + \cdots + \mu(B(v_d)).$$ If $d > 1$, it is not possible for $\mu(B(v)) > 0$ and $\mu(B(v_n)) = 0$ for all $n=1,\ldots, d$. Hence, if $v$ has more than one immediate descendant, $B(v)$ cannot be an atom of $\mu$. If, on the other hand $d = 1$ then $B(v) =B(v_1)$ and we can repeat our argument to show that either $B(v_1)$ is not an atom, or $v_1$ only has one immediate descendant. It follows that if $B(v)$ is an atom then each descendant of $v$ has only one immediate descendant. That is, $B(v)$ contains only one $x \in B$, and hence if $B(v)$ is an atom, then in fact $B(v) = \{x \}$.

$B(v)$ can be a singleton only when $v$ and all of its descendants have only one immediate descendant. The only $B(v)$ that can be atoms of $\mu$ are singletons.

If $\mu$ has no atoms, then it is said to be diffuse. If $\mu(B(v)) > 0$ for all $B(v)$ that are not singletons, then we say $\mu$ is a full measure.

A pseudo-ultrametric formed from $\mu$

A metric on $B$ is a function $\delta: B \times B \rightarrow [0, \infty]$ such that for all $x,y,z \in B$,

  1. $\delta(x,x) = 0$
  2. $\delta(x, y) = 0$ implies $x = y$
  3. $\delta(x,y) = \delta(y,x)$
  4. $\delta(x,z) \leq \delta(x,y) + \delta(y,z)$

Note that we allow the possibility that $\delta$ is infinite. This is a slight generalization of the usual notion of a metric, but it disturbs very little. If we enforce the stronger requirement $$4'. \quad \delta(x,z) \leq \max\{\delta(x,y), \delta(y,z)\}$$ then we say $\delta$ is an ultrametric. If we instead drop requirement (2), then we say $\delta$ is a pseudometric. Thus a pseudo-ultrametric $\delta$ satisfies for all $x,y,z \in B$,

  • $\delta(x,x) = 0$
  • $\delta(x,y) = \delta(y,x)$
  • $\delta(x,z) \leq \max\{ \delta(x,y), \delta(y,z) \}$

The third condition is called the ultrametric inequality or the strong triangle inequality.


Given a measure $\mu$ on $(B, \mathcal D)$, define $\delta : B \times B \rightarrow [0,\infty]$ by $\delta(x,x) = 0$ for all $x \in B$, and for $x \neq y$ , $$\delta(x,y) = \inf\{ \mu(A) : A \in \mathcal P \mbox{ with } x, y \in A\}.$$ Then $\delta$ is a pseudo-ultrametric. Moreover, if $\mu$ is a full measure, then $\delta$ is an ultrametric.

Another way of defining $\delta$ is to first define the least common ancestor vertex of any two $x, y \in B$ by $a(x,y)$ in which case $\delta(x,y) = \mu(B(a(x,y)))$. This definition only makes sense when $x \neq y$.
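The common-ancestor description translates directly into a computation. The following sketch makes two assumptions that are not part of the theorem: $\mu$ is the uniform measure on the complete 3-nary tree, so that $\mu(B(v)) = 3^{-n}$ for a vertex $v$ in generation $n$, and points of $B$ are replaced by finite address prefixes long enough to tell them apart.

```python
from fractions import Fraction

def common_ancestor(x, y):
    """The vertex a(x, y): the longest common prefix of two addresses."""
    prefix = []
    for s, t in zip(x, y):
        if s != t:
            break
        prefix.append(s)
    return tuple(prefix)

def mu_ball(v, branching=3):
    """Uniform measure of B(v) on the complete 3-nary tree (an assumed choice of mu)."""
    return Fraction(1, branching ** len(v))

def delta(x, y):
    """delta(x, x) = 0, and otherwise the measure of B(a(x, y))."""
    return Fraction(0) if x == y else mu_ball(common_ancestor(x, y))

x = (1, 0, 2, 1)
y = (1, 0, 1, 1)
z = (2, 0, 2, 1)
print(delta(x, y))  # common ancestor (1, 0), so 1/9
print(delta(x, z))  # common ancestor is the root, so 1
```

Points that stay together for more generations are closer, which is the sense in which $\delta$ measures distance along the boundary.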


To show that $\delta$ is a pseudo-ultrametric, the only nontrivial condition to check is the ultrametric inequality. That is, for any $x, y, z \in B$, $\delta(x,z) \leq \max\{ \delta(x,y), \delta(y,z) \}$.

There are two cases. In the first (slide up) we have $y \notin B(a(x,z))$. In this case $B(a(x,z)) \subset B(a(y,z))$ and hence $\delta(x,z) \leq \delta(y,z)$. In the second case (slide down) $y \in B(a(x,z))$ and without loss of generality $B(a(x,z)) = B(a(y,z))$.

In the first case we get $B(a(x,z)) \subset B(a(y,z)) = B(a(x,y))$ and hence $\delta(x,z) \leq \max\{ \delta(x,y), \delta(y,z)\}$. (Note that equality is still possible in this case, because it is possible that $\mu(B(a(x,z))) = \mu(B(a(y,z)))$ even when $B(a(x,z))$ is a proper subset of $B(a(y,z))$.)

The second of these cases is a bit more delicate, but we see that, changing the labels if necessary, $a(x,z) = a(y,z)$. It follows that $\delta(x,z) = \delta(y,z)$ and $\delta(x,y) \leq \delta(y,z)$ which together yield $\delta(x,z) \leq \max\{\delta(x,y), \delta(y,z)\}$ as desired.

Notice that if $\mu$ is full and $x \neq y$, then $B(a(x,y))$ is not a singleton, so $\delta(x,y) = \mu(B(a(x,y))) > 0$ and hence $\delta$ is in fact an ultrametric. $\square$
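The strong triangle inequality can also be confirmed numerically on a finite model with a measure that is not uniform. In the sketch below the measure is assumed to split each vertex's mass among its three children in the fixed proportions $1/2$, $1/3$, $1/6$ (a hypothetical choice), and boundary points are truncated to depth 3; the loop then tests every triple.

```python
from fractions import Fraction
from itertools import product

# Hypothetical proportions in which each vertex passes its mass to children 0, 1, 2.
WEIGHTS = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

def mu_ball(v):
    """Measure of B(v): the product of the weights along the address of v."""
    m = Fraction(1)
    for label in v:
        m *= WEIGHTS[label]
    return m

def delta(x, y):
    """Zero on the diagonal, otherwise the measure of B(a(x, y))."""
    if x == y:
        return Fraction(0)
    k = 0
    while x[k] == y[k]:  # terminates because x and y differ somewhere
        k += 1
    return mu_ball(x[:k])

points = list(product(range(3), repeat=3))
violations = [
    (x, y, z)
    for x, y, z in product(points, repeat=3)
    if delta(x, z) > max(delta(x, y), delta(y, z))
]
print(len(violations))
```

Since every weight is positive, this measure is full on the truncated tree, so `delta` also separates distinct points.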

We say that $x \in B$ is isolated with respect to $\delta$ if there exists $\epsilon > 0$ such that $\delta(x,y) > \epsilon$ for all $y \neq x$. As the next result shows, isolated points come from atoms of $\mu$.


Suppose $x \in B$ is such that $\{x\}$ is an atom of $\mu$. Then $x$ is isolated with respect to $\delta$. In particular, if $y \neq x$ then $\delta(x,y) \geq \mu\{x\}$.


By definition $x \in B(a(x,y))$. It follows that $\mu\{x\} \leq \mu(B(a(x,y))) = \delta(x,y)$. $\square$