1: \documentclass{article}
2: %[a4paper]
3:
4: \usepackage{amsmath,amssymb,mylogic,isabelle,isabellesym}
5:
6: % \isabellestyle{sl}
7:
8: \title{A formally verified proof of the \\prime number
9: theorem\footnote{To appear in \emph{ACM Transcations on
10: Computational Logic.} Work by the first author has been
11: supported by NSF grant DMS-0401042.}}
12:
13: \author{Jeremy Avigad, Kevin Donnelly, David Gray, and Paul Raff}
14:
15: \date{January 19, 2006}
16:
17: \begin{document}
18:
19: \maketitle
20:
21: \begin{abstract}
22: The prime number theorem, established by Hadamard and de la Vall\'ee
23: Poussin independently in 1896, asserts that the density of primes in
24: the positive integers is asymptotic to $1 / \ln x$. Whereas their
25: proofs made serious use of the methods of complex analysis,
26: elementary proofs were provided by Selberg and Erd\"os in 1948. We
27: describe a formally verified version of Selberg's proof, obtained
28: using the Isabelle proof assistant.
29: \end{abstract}
30:
31: \section{Introduction}
32:
33: For each positive integer $x$, let $\pi(x)$ denote the number of
34: primes less than or equal to $x$. The prime number theorem asserts
35: that the density of primes $\pi(x)/x$ in the positive integers is
36: asymptotic to $1 / \ln x$, i.e.~that $\lim_{x \to \infty} \pi(x) \ln x
37: / x = 1$. This was conjectured by Gauss and Legendre around the turn
38: of the nineteenth century, and posed a challenge to the mathematical
39: community for almost a hundred years, until Hadamard and de la
40: Vall\'ee Poussin proved it independently in 1896.
41:
42: On September 6, 2004, the first author of this article verified the
43: following statement, using the Isabelle proof assistant:
44: \begin{center}
45: \begin{isabellebody}
46: \isamarkupfalse%
47: {\isacharparenleft}{\isasymlambda}x{\isachardot}\ pi\ x\ {\isacharasterisk}\ ln\ {\isacharparenleft}real\ x{\isacharparenright}\ {\isacharslash}\ {\isacharparenleft}real\ x{\isacharparenright}{\isacharparenright}\ {\isacharminus}{\isacharminus}{\isacharminus}{\isacharminus}{\isachargreater}\ {\isadigit{1}}
48: \end{isabellebody}
49: \end{center}
50: %% \begin{center}
51: %% \begin{isabellebody}
52: %% \isamarkupfalse%
53: %% \isacommand{theorem}\ PrimeNumberTheorem{\isacharcolon}\ {\isachardoublequote}{\isacharparenleft}{\isasymlambda}x{\isachardot}\ pi\ x\ {\isacharasterisk}\ ln\ {\isacharparenleft}real\ x{\isacharparenright}\ {\isacharslash}\ {\isacharparenleft}real\ x{\isacharparenright}{\isacharparenright}\ {\isacharminus}{\isacharminus}{\isacharminus}{\isacharminus}{\isachargreater}\ {\isadigit{1}}{\isachardoublequote}
54: %% \end{isabellebody}
55: %% \end{center}
56: % \begin{quote}
57: % \tt{(\%x.\ pi x * ln (real x) / (real x)) ----> 1}
58: % \end{quote}
59: The system thereby confirmed that the prime number theorem is a
60: consequence of the axioms of higher-order logic, together with an axiom
61: asserting the existence of an infinite set.
62:
63: One reason the formalization is interesting is simply that it is a
64: landmark, showing that today's proof assistants have achieved a level
65: of usability that makes it possible to formalize substantial theorems
66: of mathematics. Similar achievements in the past year include George
67: Gonthier's verification of the four color theorem using Coq, and
68: Thomas Hales's verification of the Jordan curve theorem using
69: HOL-light (see the introduction to Wiedijk \cite{wiedijk:05}). As
70: contemporary mathematical proofs become increasingly complex, the need
71: for formal verification becomes pressing. Formal verification can also
72: help guarantee correctness when, as is becoming increasingly common,
73: proofs rely on computations that are too long to check by hand.
74: Hales's ambitious Flyspeck project \cite{hales:unp}, which aims for a
75: fully verified form of his proof of the Kepler conjecture, is a
76: response to both of these concerns. Here, we will provide some
77: information as to the time and effort that went into our
78: formalization, which should help gauge the feasibility of such
79: verification efforts.
80:
81: More interesting, of course, are the lessons that can be learned.
82: This, however, puts us on less certain terrain. Our efforts certainly
83: provide some indications as to how to improve libraries and systems
84: for verifying mathematics, but the data still need to be processed
85: and better understood. Here, therefore, we simply offer some initial
86: thoughts and observations.
87:
88: The outline of this paper is as follows. In
89: Section~\ref{background:section}, we provide some background on the
90: prime number theorem and the Isabelle proof assistant. In
91: Section~\ref{overview:section}, we provide an overview of Selberg's
92: proof, our formalization, and the effort involved. In
93: Section~\ref{reflection:section}, we discuss some interesting aspects
94: of the formalization: the use of asymptotic reasoning; calculations
95: with real numbers; casts between natural numbers, integers, and real
96: numbers; combinatorial reasoning in number theory; and the use of
97: elementary methods. Finally, in Section~\ref{conclusions:section} we
98: offer some brief conclusions.
99:
100: Our formalization of the prime number theorem was a collaborative
101: effort on the part of Avigad, Donnelly, Gray, and Raff, building, of
102: course, on the efforts of the entire Isabelle development team. This
103: article was, however, written by Avigad, so opinions and speculation
104: contained herein should be attributed to him.
105:
106: %===================================================================
107:
108: \section{Background}
109: \label{background:section}
110:
111: \subsection{The prime number theorem}
112:
113: The statement of the prime number theorem was conjectured by both
114: Gauss and Legendre, on the basis of computation, around the turn of
115: the nineteenth century. In a pair of papers published in 1851 and
116: 1852, Chebyshev made significant advances towards proving it. Note
117: that we can write
118: \[
119: \pi(x) = \sum_{p \leq x} 1,
120: \]
121: where $p$ ranges over the prime numbers. Contrary to our notation
122: above, $x$ is usually treated as a real variable, making $\pi$ a step
123: function on the reals. Chebyshev defined, in addition, the functions
124: \[
125: \theta(x) = \sum_{p \leq x} \ln p
126: \]
127: and
128: \[
129: \psi(x) = \sum_{p^a \leq x} \ln p = \sum_{n \leq x} \Lambda(n),
130: \]
131: where
132: \[
133: \Lambda(n) =
134: \left\{ \begin{array}{ll}
135: \ln p & \mbox{if $n = p^a$, for some $a \geq 1$} \\
136: 0 & \mbox{otherwise.}
137: \end{array}\right.
138: \]
139: The functions $\theta$ and $\psi$ are more sensitive to the presence
140: of primes than $\pi$, and have nicer analytic properties. Chebyshev
141: showed that the prime number theorem is equivalent to the assertion
142: $\lim_{x \to \infty} \theta(x) / x = 1$, as well as to the assertion
143: $\lim_{x \to \infty} \psi(x) / x = 1$. He also provided bounds
144: \[
145: B < \pi(x) \ln x / x < 6 B / 5
146: \]
147: for sufficiently large $x$, where
148: \[
149: B = \ln 2 / 2 + \ln 3 / 3 + \ln 5 / 5 - \ln 30 / 30 > 0.92
150: \]
151: and $6 B / 5 < 1.11$. So, as $x$ approaches infinity, $\pi(x) \ln
152: x / x$, at worst, oscillates between these two values.
153:
154: In a landmark work of 1859, Riemann introduced the complex-valued
155: function $\zeta$ into the study of number theory. It was not until
156: 1894, however, that von Mangoldt provided an expression for $\psi$
157: that reduced the prime number theorem, essentially, to showing that
158: $\zeta$ has no roots with real part equal to 1. This last step was
159: achieved by Hadamard and de la Vall\'ee Poussin, independently, in
160: 1896. The resulting proofs make strong use of the theory of complex
161: functions. In 1921, Hardy expressed strong doubts as to whether a
162: proof of the theorem was possible which did not depend, fundamentally,
163: on these ideas. In 1948, however, Selberg and Erd\"os found elementary
164: proofs based on a ``symmetry formula'' due to Selberg. (The nature of
165: the interactions between Selberg and Erd\"os at the time and the
166: influence of ideas is a subtle one, and was the source of tensions
167: between the two for years to come.) Since the libraries we had to work
168: with had only a minimal theory of the complex numbers and a limited
169: real analysis library, we chose to formalize the Selberg proof.
170:
171: There are a number of good introductions to analytic number theory
172: (for example, \cite{apostol:76,jameson:04}). Edwards's \emph{Riemann's
173: zeta function} \cite{edwards:01} is an excellent source of both
174: historical and mathematical information. A number of textbooks present
175: Selberg's proof, including those by Nathanson \cite{nathanson:00},
176: Shapiro \cite{shapiro:83}, and Hardy and Wright
177: \cite{hardy:wright:79}. We followed Shapiro's excellent presentation
178: quite closely, though we made good use of Nathanson's book as well.
179:
180: We also had help from another source. Cornaros and Dimitracopoulos
181: \cite{cornaros:dimitracopoulos:94} have shown that the prime number
182: theorem is provable in a weak fragment of arithmetic, by showing how
183: to formalize Selberg's proof (based on Shapiro's presentation) in that
184: fragment.\footnote{For issues relating to the formalization of
185: mathematics, and number theory in particular, in weak theories of
186: arithmetic, see Avigad \cite{avigad:03b}.} Their concerns were
187: different from ours: by relying on a formalization of higher-order
188: logic, we were allowing ourselves a logically stronger theory; on the
189: other hand, Cornaros and Dimitracopoulos were concerned solely with
190: axiomatic provability and not ease of formalization. Their work was,
191: however, quite helpful in stripping the proof down to its bare
192: essentials. Also, since our libraries did not have a good theory of
193: integration, we had to take some care to avoid the mild uses of
194: analysis in the textbook presentations. Cornaros and Dimitracopoulos's
195: work was again often helpful in that respect.
196:
197: \subsection{Isabelle}
198:
199: Isabelle \cite{isabelle} is a generic proof assistant developed under
200: the direction of Larry Paulson at Cambridge University and Tobias
201: Nipkow at TU Munich. The HOL instantiation \cite{nipkow:et:al:02}
202: provides a formal framework that is a conservative extension of
203: Church's simple type theory with an infinite type (from which the
204: natural numbers are constructed), extensionality, and the axiom of
205: choice. Specifically, HOL extends ordinary type theory with set types,
206: and a schema for polymorphic axiomatic type classes designed by
207: Nipkow and implemented by Marcus Wenzel \cite{wenzel:97}. It also
208: includes a definite description operator (``THE''), and an indefinite
209: description operator (``SOME'').\footnote{The extension by set types
210: is mild, since they are easily interpretable in terms of predicate
211: types $\sigma \to \fn{bool}$. Similarly, the definite description
212: operator can be eliminated, at least in principle, using Russell's
213: well-known interpretation. It is the indefinite description
214: operator, essentially a version of Hilbert's epsilon operator, that
215: gives rise to the axiom of choice. Though we occasionally used the
216: indefinite description operator for convenience, these uses could
217: easily be replaced by the definition description operator, and it is
218: likely that uses of the axiom of choice can be dispensed with in the
219: libraries as well. In any event, it is a folklore result that
220: G\"odel's methods transfer to higher-order logic to show that the
221: axiom of choice is a conservative extension for a fragment the
222: includes the prime number theorem.}
223:
224: Isabelle offers good automated support, including a term simplifier,
225: an automated reasoner (which combines tableau search with rewriting),
226: and decision procedures for linear and Presburger arithmetic. It is an
227: LCF-style theorem prover, which is to say, correctness is guaranteed
228: by the use of a small number of constructors, in an underlying typed
229: programming language, to build proofs. Using the Proof General
230: interface \cite{proof:general}, one can construct proofs interactively
231: by repeatedly applying ``tactics'' that reduce a current subgoal to
232: simpler ones. But Isabelle also allows one to take advantage of a
233: higher-level proof language, called Isar, implemented by Wenzel
234: \cite{wenzel:02}. These two styles of interaction can, furthermore, be
235: combined within a proof. We found Isar to be extremely helpful in
236: structuring complex proofs, whereas we typically resorted to
237: tactic-application for filling in low-level inferences. Occasionally,
238: we also made mild use of Isabelle's support for locales
239: \cite{ballarin:05}. For more information on Isabelle, one should
240: consult the tutorial \cite{nipkow:et:al:02} and other online
241: documentation \cite{isabelle}.
242:
243: Our formalization made use of the basic HOL library, as well as those
244: parts of the HOL-Complex library, developed primarily by Jacques
245: Fleuriot, that deal with the real numbers. Some of our earlier
246: definitions, lemmas, and theorems made their way into the 2004 release
247: of Isabelle, in which the formalization described here took
248: place. Some additional theorems in our basic libraries will be part of
249: the 2005 release.
250:
251: %===================================================================
252:
253: \section{Overview}
254: \label{overview:section}
255:
256: \subsection{The Selberg proof}
257: \label{selberg:subsection}
258:
259: The prime number theorem describes the asymptotic behavior of a
260: function from the natural numbers to the reals. Analytic number theory
261: works by extending the domain of such functions to the real numbers,
262: and then providing a toolbox for reasoning about such functions. One
263: is typically concerned with rough characterizations of a function's
264: rate of growth; thus $f = O(g)$ expresses the fact that
265: for some constant $C$, $| f(x) | \leq C | g(x) |$ for every $x$.
266: (Sometimes, when writing $f = O(g)$, one really means that the
267: inequality holds except for some initial values of $x$, where $g$ is
268: $0$ or one of the functions is undefined; or that the inequality holds
269: when $x$ is large enough.)
270:
271: For example, all of the following identities can be obtained using
272: elementary calculus:
273: \begin{equation*}
274: \begin{split}
275: \ln (1 + 1 / n) & = 1 / n + O(1/n^2) \\
276: \sum_{n \leq x} 1 / n & = \ln x + O(1) \\
277: \sum_{n \leq x} \ln n & = x \ln x - x + O(\ln x)\\
278: \sum_{n \leq x} \ln n / n & = \ln^2 x / 2 + O(1)
279: \end{split}
280: \end{equation*}
281: In all of these, $n$ ranges over positive integers. The last three
282: inequalities hold whether one takes $x$ to be an integer or a real
283: number greater than or equal to $1$. The second identity reflects the
284: fact that the integral of $1 / x$ is $\ln x$, and the third reflects
285: the fact that the integral of $\ln x$ is $x \ln x - x$. A list of
286: identities like these form one part of the requisite background to the
287: Selberg proof.
288:
289: Some of Chebyshev's results form another part. Rate-of-growth
290: comparisons between $\theta$, $\psi$, and $\pi$ sufficient to show the
291: equivalence of the various statements of the prime number theorem can
292: be obtained by fairly direct calculations. Obtaining any of the upper
293: bounds equivalent to $\psi(x) = O(x)$ requires more work. A nice way
294: of doing this, using binomial coefficients, can be found in
295: Nathanson \cite{nathanson:00}.
296:
297: Number theory depends crucially on having different ways of counting
298: things, and rudimentary combinatorial methods form a third
299: prerequisite to the Selberg proof. For example, consider the set of
300: (positive) divisors $d$ of a positive natural number $n$. Since the
301: function $d \mapsto n / d$ is a permutation of that set, we have the
302: following identity:
303: \[
304: \sum_{d | n} f(d) = \sum_{d | n} f(n / d).
305: \]
306: For a more complicated example, suppose $n$ is a positive integer, and
307: consider the set of pairs $d, d'$ of positive integers such that $d d'
308: \leq n$. There are two ways to enumerate these pairs: for each
309: value of $d$ between $1$ and $n$, we can enumerate all the values $d'$
310: such that $d' \leq n / d$; or for each product $c$ less than $n$, we
311: can enumerate all pairs $d, c / d$ whose product is $c$. Thus we have
312: \begin{equation}
313: \begin{split}
314: \sum_{d \leq n} \sum_{d' \leq n / d} f(d, d') & = \sum_{d d' \leq n}
315: f(d, d') \\
316: & = \sum_{c \leq n} \sum_{d | c} f(d, c/d).
317: \end{split}
318: \end{equation}
319: A similar argument yields
320: \begin{equation}
321: \label{counting:eq}
322: \begin{split}
323: \sum_{d | n} \sum_{d' | (n / d)} f(d, d') & = \sum_{d d' | n}
324: f(d, d') \\
325: & = \sum_{c | n} \sum_{d | c} f(d, c/d).
326: \end{split}
327: \end{equation}
328: Yet another important combinatorial identity is given by the partial
329: summation formula, which, in one formulation, is as follows: if $a
330: \leq b$, $F(n)
331: = \sum_{i = 1}^n f(i)$, and $G$ is any function, then
332: \begin{multline*}
333: \sum_{n = a}^b f(n + 1) G(n + 1) = F(b + 1) G(b + 1) - F(a) G(a + 1) -
334: \\
335: \sum_{n = a}^{b-1} F(n + 1) (G(n + 2) - G(n + 1)).
336: \end{multline*}
337: This can be viewed as a discrete analogue of integration by parts, and
338: can be verified by induction.
339:
340: An important use of (\ref{counting:eq}) occurs in the proof of the
341: M\"obius inversion formula, which we now describe. A positive natural
342: number $n$ is said to be \emph{square free} if no prime in its
343: factorization occurs with multiplicity greater than 1; in other words,
344: $n = p_1 p_2 \cdots p_s$ where the $p_i$'s are distinct primes (and
345: $s$ may be $0$). Euler's function $\mu$ is defined by
346: \[
347: \mu(n) = \left\{
348: \begin{array}{ll}
349: (-1)^s & \mbox{if $n$ is squarefree and $s$ is as above} \\
350: 0 & \mbox{otherwise.} \\
351: \end{array}
352: \right.
353: \]
354: A remarkably useful fact regarding $\mu$ is that for $n > 0$,
355: \begin{eqnarray}
356: \label{mu:eq}
357: \sum_{d | n} \mu(d) = \left\{
358: \begin{array}{ll}
359: 1 & \mbox{if $n = 1$} \\
360: 0 & \mbox{otherwise.}
361: \end{array}
362: \right.
363: \end{eqnarray}
364: To see this, define the \emph{radical} of a number $n$, denoted
365: $\fn{rad}(n)$, to be the greatest squarefree number dividing $n$. It
366: is not hard to see that if $n$ has prime factorization $p_1^{j_1}
367: p_2^{j_2} \cdots p_s^{j_s}$, then $\fn{rad}(n)$ is given by $p_1 p_2
368: \cdots p_s$. Then $\sum_{d | n} \mu(d) = \sum_{d | \fn{rad}(n)}
369: \mu(d)$, since divisors of $n$ that are not divisors of $\fn{rad}(n)$
370: are not squarefree and hence contribute $0$ to the sum. If $n = 1$,
371: equation (\ref{mu:eq}) is clear. Otherwise, write $\fn{rad}(n) = p_1
372: p_2 \cdots p_s$, write
373: \[
374: \sum_{d|\fn{rad}(n)} \mu(d) = \sum_{d | \fn{rad}(n), p_1 | d} \mu(d) +
375: \sum_{d | \fn{rad}(n), p_1 \nmid d} \mu(d),
376: \]
377: and note that each term in the first sum is canceled by a
378: corresponding one in the second.
379:
380: Now, suppose $g$ is any function from $\N$ to $\mathbb{R}$, and define $f$
381: by $f(n) = \sum_{d |n} g(d)$. The M\"obius inversion formula provides
382: a way of ``inverting'' the definition to obtain an expression for $g$
383: in terms of $f$. Using (\ref{counting:eq}) for the third equality
384: below and (\ref{mu:eq}) for the last, we have, somewhat miraculously,
385: \begin{eqnarray*}
386: \sum_{d | n} \mu(d) f(n / d) & = &
387: \sum_{d | n} \mu(d) \sum_{d' | (n / d)} g((n / d) / d') \\
388: & = & \sum_{d | n} \sum_{d' | (n / d)} \mu(d) g((n / d) / d') \\
389: & = & \sum_{c | n} \sum_{d | c} \mu(d) g(n / c) \\
390: & = & \sum_{c | n} g(n / c) \sum_{d | c} \mu(d) \\
391: & = & g(n),
392: \end{eqnarray*}
393: since the inner sum on the second-to-last line is $0$ except when $c$
394: is equal to $1$.
395:
396: All the pieces just described come together to yield additional
397: identities involving sums, $\ln$, and $\mu$, as well as Mertens's
398: theorem:
399: \[
400: \sum_{n \leq x} \Lambda(n) / n = \ln x + O(1).
401: \]
402: These, in turn, are used to derive Selberg's elegant ``symmetry
403: formula,'' which is the central component in the proof. One
404: formulation of the symmetry formula is as follows:
405: \[
406: \sum_{n \leq x} \Lambda (n) \ln n + \sum_{n \leq x} \sum_{d | n}
407: \Lambda(d) \Lambda(n / d) = 2 x \ln x + O(x).
408: \]
409: There are, however, many variants of this identity, involving
410: $\Lambda$, $\psi$, and $\theta$. These crop up in profusion because
411: one can always unpack definitions of the various functions, apply the
412: types of combinatorial manipulations described above, and use
413: identities and approximations to simplify expressions.
414:
415: What makes the Selberg symmetry formula so powerful is that there are
416: two terms in the sum on the left, each sensitive to the presence of
417: primes in different ways. The formula above implies there have to be
418: some primes --- to make left-hand side nonzero --- but there can't be
419: too many. Selberg's proof involves cleverly balancing the two terms
420: off each other, to show that in the long run, the density of the
421: primes has the appropriate asymptotic behavior.
422:
423: Specifically, let $R(x) = \psi(x) - x$ denote the ``error term,'' and
424: note that by Chebyshev's equivalences the prime number theorem amounts
425: to the assertion $\lim_{x \to \infty} R(x) / x = 0$. With some
426: delicate calculation, one can use the symmetry formula to obtain a
427: bound on $|R(x)|$:
428: \begin{equation}
429: \label{r:eq}
430: |R(x)| \ln^2 x \leq 2 \sum_{n \leq x} | R(x / n)| \ln n + O(x \ln x).
431: \end{equation}
432: Now, suppose we have a bound $| R(x) | \leq a x$ for sufficiently large
433: $x$. Substituting this into the right side of (\ref{r:eq}) and using
434: an approximation for $\sum_{n \leq x} \ln n / n$ we get
435: \[
436: |R(x)| \leq a x + O(x / \ln x),
437: \]
438: which is not an improvement on the bound $|R(x)| \leq a x$ with which
439: we began. Selberg's method involves showing that in fact there are
440: always enough sufficiently large intervals on which one can obtain a
441: stronger bound on $R(x)$, so that for some positive constant $k$,
442: assuming we have a bound $|R(x)| \leq a x$ that valid for $x \geq
443: c_1$, we can obtain a $c_2$ and a better bound $|R(x)| \leq (a - k
444: a^3)$, valid for $x \geq c_2$. The constant $k$ depends on $a$, but
445: the same constant also works for any $a' < a$.
446:
447: By Chebyshev's theorem, we know that there is a constant $a_1$ such
448: that $|R(x)| \leq a_1 x$ for every $x$. Choosing $k$ appropriate for
449: $a_1$ and then setting $a_{n+1} = a_n - k a_n^3$, we have that for
450: every $n$, there is a $c$ large enough so that $|R(x)|/x \leq a_n$ for
451: every $x \geq c$. But it is not hard to verify that the sequence $a_1,
452: a_2, \ldots$ approaches $0$, which implies that $R(x) / x$ approaches
453: $0$ as $x$ approaches infinity, as required.
454:
455: \subsection{Our formalization}
456: \label{formalization:subsection}
457:
458: All told, our number theory session, including the proof of the prime
459: number theorem and supporting libraries, constitutes 673 pages of
460: proof scripts, or roughly 30,000 lines. This count includes about 65
461: pages of elementary number theory that we had at the outset, developed
462: by Larry Paulson and others; also about 50 pages devoted to a proof of
463: the law of quadratic reciprocity and properties of Euler's $\ph$
464: function, neither of which are used in the proof of the prime number
465: theorem. The page count does not include the basic HOL library, or
466: properties of the real numbers that we obtained from the HOL-Complex
467: library.
468:
469: The overview provided in the last section should provide a general
470: sense of the components that are needed for the formalization. To
471: start with, one needs good supporting libraries:
472: \begin{itemize}
473: \item a theory of the natural numbers and integers, including
474: properties of primes and divisibility, and the fundamental theorem
475: of arithmetic
476: \item a library for reasoning about finite sets, sums, and products
477: \item a library for the real numbers, including properties of $\ln$
478: \end{itemize}
479: The basic Isabelle libraries provided a good starting point, though we
480: had to augment these considerably as we went along. More specific
481: supporting libraries include:
482: \begin{itemize}
483: \item properties of the $\mu$ function, combinatorial identities, and
484: the M\"obius inversion formula
485: \item a library for asymptotic ``big O'' calculations
486: \item a number of basic identities involving sums and $\ln$
487: \item Chebyshev's theorems
488: \end{itemize}
489: Finally, the specific components of the Selberg proof are:
490: \begin{itemize}
491: \item the Selberg symmetry formula
492: \item the inequality involving $R(n)$
493: \item a long calculation to show $R(n)$ approaches $0$
494: \end{itemize}
495:
496: This general outline is clearly discernible in the list of theory
497: files, which can be viewed online \cite{avigad:isabelle}. Keep in
498: mind that the files described here have not been modified since the
499: original proof was completed, and many of the proofs were written
500: while various participants in the project were still learning how to
501: use Isabelle. Since then, some of the basic libraries have been
502: revised and incorporated into Isabelle, but Avigad intends to revise
503: the number theory libraries substantially before cleaning up the rest
504: of the proof.
505:
506: Once the basic libraries are in place, our formal proof follows
507: Shapiro's presentation quite closely, though for some parts we
508: followed Nathanson instead. A detailed description of
509: our proof would amount to little more than a step-by-step narrative
510: of (one of the various paths through) Selberg's proof, with page
511: correspondences in texts we followed. For example, one of our
512: formulations of the M\"obius inversion is as follows:
513:
514: \medskip
515:
516: \begin{isabellebody}
517: \isamarkupfalse%
518: \isacommand{lemma}\ mu{\isacharunderscore}inversion{\isacharunderscore}nat{\isadigit{1}}a{\isacharcolon}\ {\isachardoublequote}ALL\ n{\isachardot}\ {\isacharparenleft}{\isadigit{0}}\ {\isacharless}\ n\ {\isasymlongrightarrow}\isanewline
519: \ \ f\ n\ {\isacharequal}\ {\isacharparenleft}{\isasymSum}\ d\ {\isacharbar}\ d\ dvd\ n{\isachardot}\ g{\isacharparenleft}n\ div\ d{\isacharparenright}{\isacharparenright}{\isacharparenright}\ {\isasymLongrightarrow}\ {\isadigit{0}}\ {\isacharless}\ {\isacharparenleft}n{\isacharcolon}{\isacharcolon}nat{\isacharparenright}\ {\isasymLongrightarrow}\isanewline
520: \ \ g\ n\ {\isacharequal}\ {\isacharparenleft}{\isasymSum}\ d\ {\isacharbar}\ d\ dvd\ n{\isachardot}\ of{\isacharunderscore}int{\isacharparenleft}mu{\isacharparenleft}int{\isacharparenleft}d{\isacharparenright}{\isacharparenright}{\isacharparenright}\ {\isacharasterisk}\ f\ {\isacharparenleft}n\ div\ d{\isacharparenright}{\isacharparenright}{\isachardoublequote}
521: \end{isabellebody}
522:
523: \medskip
524:
525: \noindent This appears on page 64 of Shapiro's book, and on page 218
526: of Nathanson's book. We formalized a version of the fourth identity
527: listed in Section~\ref{formalization:subsection} as follows:
528:
529: \medskip
530:
531: \begin{isabellebody}
532: \isamarkupfalse%
533: \isacommand{lemma}\ identity{\isacharunderscore}four{\isacharunderscore}real{\isacharunderscore}b{\isacharcolon}\ {\isachardoublequote}{\isacharparenleft}{\isasymlambda}x{\isachardot}\ {\isasymSum}i{\isacharequal}{\isadigit{1}}{\isachardot}{\isachardot}natfloor{\isacharparenleft}abs\ x{\isacharparenright}{\isachardot}\isanewline
534: \ \ \ \ ln\ {\isacharparenleft}real\ i{\isacharparenright}\ {\isacharslash}\ {\isacharparenleft}real\ i{\isacharparenright}{\isacharparenright}\ {\isacharequal}o\ \isanewline
535: \ \ \ \ \ \ {\isacharparenleft}{\isasymlambda}x{\isachardot}\ ln{\isacharparenleft}abs\ x\ {\isacharplus}\ {\isadigit{1}}{\isacharparenright}{\isacharcircum}{\isadigit{2}}\ {\isacharslash}\ {\isadigit{2}}{\isacharparenright}\ {\isacharplus}o\ O{\isacharparenleft}{\isasymlambda}x{\isachardot}\ {\isadigit{1}}{\isacharparenright}{\isachardoublequote}
536: \end{isabellebody}
537:
538: \medskip
539:
540: \noindent In fact, stronger assertions can be found on page 93 of Shapiro's
541: book, and on page 209 of Nathanson's book. Here is one of our
542: formulations of the Selberg symmetry principle:
543:
544: \medskip
545:
546: \begin{isabellebody}
547: \isamarkupfalse%
548: \isacommand{lemma}\ Selberg{\isadigit{3}}{\isacharcolon}\ {\isachardoublequote}{\isacharparenleft}{\isasymlambda}x{\isachardot}\ {\isasymSum}n\ {\isacharequal}\ {\isadigit{1}}{\isachardot}{\isachardot}natfloor\ {\isacharparenleft}abs\ x{\isacharparenright}\ {\isacharplus}\ {\isadigit{1}}{\isachardot}\isanewline
549: \ \ \ \ Lambda\ n\ {\isacharasterisk}\ ln\ {\isacharparenleft}real\
550: n{\isacharparenright}{\isacharparenright}\ {\isacharplus}\ {\isacharparenleft}{\isasymlambda}x{\isachardot}\ {\isasymSum}n{\isacharequal}{\isadigit{1}}{\isachardot}{\isachardot}natfloor\ {\isacharparenleft}abs\ x{\isacharparenright}\ {\isacharplus}\ {\isadigit{1}}{\isachardot}\ \isanewline
551: \ \ \ \ \ {\isacharparenleft}{\isasymSum}\ u\ {\isacharbar}\ u\ dvd\ n{\isachardot}\ Lambda\ u\ {\isacharasterisk}\ Lambda\ {\isacharparenleft}n\ div\ u{\isacharparenright}{\isacharparenright}{\isacharparenright}\isanewline
552: \ \ \ \ \ \ {\isacharequal}o\ {\isacharparenleft}{\isasymlambda}x{\isachardot}\ {\isadigit{2}}\ {\isacharasterisk}\ {\isacharparenleft}abs\ x\ {\isacharplus}\ {\isadigit{1}}{\isacharparenright}\ {\isacharasterisk}\ ln\ {\isacharparenleft}abs\ x\ {\isacharplus}\ {\isadigit{1}}{\isacharparenright}{\isacharparenright}\ {\isacharplus}o\ O{\isacharparenleft}{\isasymlambda}x{\isachardot}\ abs\ x\ {\isacharplus}\ {\isadigit{1}}{\isacharparenright}{\isachardoublequote}
553: \end{isabellebody}
554:
555: \medskip
556:
557: \noindent This is given on page 419 of Shapiro's book, and on page 293 of
558: Nathanson's book. The error estimate given in the previous section,
559: taken from 431 of Shapiro's book, takes the following form:
560:
561: \medskip
562:
563: \begin{isabellebody}
564: \isamarkupfalse%
565: \isacommand{lemma}\ error{\isadigit{7}}{\isacharcolon}\ {\isachardoublequote}{\isacharparenleft}{\isasymlambda}x{\isachardot}\ abs\ {\isacharparenleft}R\ {\isacharparenleft}abs\ x\ {\isacharplus}\ {\isadigit{1}}{\isacharparenright}{\isacharparenright}\ {\isacharasterisk}\ ln\ {\isacharparenleft}abs\ x\ {\isacharplus}\ {\isadigit{1}}{\isacharparenright}\ {\isacharcircum}\ {\isadigit{2}}{\isacharparenright}\ {\isacharless}o\isanewline
566: \ \ {\isacharparenleft}{\isasymlambda}x{\isachardot}\ {\isadigit{2}}\ {\isacharasterisk}\ {\isacharparenleft}{\isasymSum}\ n\ {\isacharequal}\ {\isadigit{1}}{\isachardot}{\isachardot}natfloor\ {\isacharparenleft}abs\ x{\isacharparenright}\ {\isacharplus}\ {\isadigit{1}}{\isachardot}\isanewline
567: \ \ \ \ \ \ abs\ {\isacharparenleft}R\ {\isacharparenleft}{\isacharparenleft}abs\ x\ {\isacharplus}\ {\isadigit{1}}{\isacharparenright}\ {\isacharslash}\ real\ n{\isacharparenright}{\isacharparenright}\ {\isacharasterisk}\ ln\ {\isacharparenleft}real\ n{\isacharparenright}{\isacharparenright}{\isacharparenright}\ {\isacharequal}o\ \isanewline
568: \ \ \ \ O{\isacharparenleft}{\isasymlambda}x{\isachardot}\ {\isacharparenleft}abs\ x\ {\isacharplus}\ {\isadigit{1}}{\isacharparenright}\ {\isacharasterisk}\ {\isacharparenleft}{\isadigit{1}}\ {\isacharplus}\ ln\ {\isacharparenleft}abs\ x\ {\isacharplus}\ {\isadigit{1}}{\isacharparenright}{\isacharparenright}{\isacharparenright}{\isachardoublequote}
569: \end{isabellebody}
570:
571: \medskip
572:
573: \noindent We will have more to say, below, about handling of
574: asymptotic notation, the type casts, and the various occurrences of
575: $\fn{abs}$ and $+1$ that make the formal presentation differ from
576: ordinary mathematical notation. But aside from calling attention to
577: differences like these, a more detailed outline would not be very
578: interesting.
579:
580: There are additional reasons that it does not pay to describe the
581: formal proofs in great detail. For one thing, they are not
582: particularly nice: our efforts were designed to get us to the prime
583: number theorem as quickly as possible, and so the proofs could use a
584: good deal of cleaning and polishing. Second, and more to the point, we
585: know that our formalization is not optimal. It hardly makes sense for
586: us to describe exactly how we went about proving the M\"obius
587: inversion formula, for example, until we are convinced that we have
588: done it right; that is, until we are convinced the we have made the
589: supporting libraries as generally useful as possible, and configured
590: the automated tools in such a way to make the formalization as smooth
591: as possible. We therefore intend to invest more time in improving the
592: various parts of the formalization and report on these when it is
593: clear what we have learned from the efforts.
594:
595: In the meanwhile, we will devote the rest of this report to conveying
596: two types of information. First, to help gauge the usability of the
597: current technology, we will try to provide a sense of the amount of
598: time required to seeing the project through to its completion. Second,
599: we will provide some initial reflections on the project, and on the
600: strengths and weaknesses of contemporary proof assistants. In
601: particular, we will discuss what we take to be some of the novel
602: aspects of the formalization, and indicate where we believe better
603: automated support would have been especially helpful.
604:
605: \subsection{The effort involved}
606:
607: As we have noted in the introduction, one of the most interesting
608: features of our formalization of the prime number theorem is simply
609: its existence, which shows that current technology makes it possible
610: to treat a proof of this complexity. The question naturally arises as
611: to how long the formalization took.
612:
613: This is a question that it hard to answer with any precision. Avigad
614: first decided to undertake the project in March of 2003, having
615: learned how to use Isabelle and proved Gauss's law of quadratic
616: reciprocity with Gray and Adam Kramer the preceding summer and fall.
617: But this was a side project for everyone involved, and time associated
618: it includes time spent learning to use Isabelle, time spent learning
619: the requisite number theory, and so on. Gray developed a substantial
620: part of the number theory library, including basic facts about primes
621: and multiplicity, the $\mu$ function, and the identity
622: (\ref{counting:eq}), working a few hours per week in the summer of
623: 2003, before his thesis work in ethics took over. Donnelly and Avigad
624: developed the library to support big O calculations
625: \cite{avigad:donnelly:04} while Donnelly worked half-time during the
626: summer of 2003, just after he completed his junior year at Carnegie
627: Mellon. During that summer, and working part time the following year,
628: Donnelly also derived some of the basic identities involving $\ln$.
629: Raff started working on the project in the 2003-2004 academic year,
630: but most of his contributions came working roughly half-time in the
631: summer of 2004, just after he obtained his undergraduate degree.
632: During that time, he proved Chebyshev's theorem to the effect that
633: $\psi(x) = O(x)$, and also did most of the work needed to prove the
634: equivalence of statements of the prime number theorem in terms of the
635: functions $\pi$, $\theta$, and $\psi$. Though Avigad's involvement was
636: more constant, he rarely put in more than a few hours per week before
637: the summer of 2004, and set the project aside for long stretches of
638: time. The bulk of his proof scripts were written during the summer of
639: 2004, when he worked roughly half-time on the project from the middle
640: of June to the end of August.
641:
642: Some specific benchmarks may be more informative. Proving most of the
643: inversion theorems we needed, starting from (\ref{counting:eq}) and
644: the relevant properties of $\mu$, took Avigad about a day. (For a
645: ``day'' read eight hours of dedicated formalization. Though he could
646: put in work-days like that for small stretches, in some of the
647: estimates below, the work was spread out over longer periods of time.)
648: Proving the first version of the Selberg symmetry formula using the
649: requisite identities took another day. Along the way, he was often
650: sidetracked by the need to prove elementary facts about things like
651: primes and divisibility, or the floor function on the real numbers.
652: This process stabilized, however, and towards the end he found that he
653: could formalize about a page of Shapiro's text per day. Thus, the
654: derivation of the error estimate described above, taken from pages
655: 428--431 in Shapiro's book, took about three-and-a-half days to
656: formalize; and the remainder of the proof, corresponding to 432--437
657: in Shapiro's book, took about five days.
658:
659: In many cases, the increase in length is dramatic: the
660: three-and-a-half pages of text associated with the proof of the error
661: estimate translate to about 1,600 lines, or 37 pages, of proof
662: scripts, and the five pages of text associated with the final part of
663: the proof translate to about 4,000 lines, or 89 pages, of proof
664: scripts. These ratios are abnormally high, however, for reasons
665: discussed in Section~\ref{calculation:subsection}. The five-line
666: derivation of the M\"obius inversion formula in
667: Section~\ref{selberg:subsection} translates to about 40 lines, and the
668: proof of the form of the Selberg symmetry formula discussed there,
669: carried out in about two-and-a-half pages in Shapiro's book, takes up
670: about 600 lines, or 13 pages. These ratios are more typical.
671:
672: We suspect that over the coming years both the time it takes to carry
673: out such formalizations, as well as the lengths of the formal proof
674: scripts, will drop significantly. Much of the effort involved in the
675: project was spent on the following:
676: \begin{itemize}
677: \item Defining fundamental concepts and gathering basic libraries of
678: easy facts.
679: \item Proving trivial lemmas and spelling out ``straightforward''
680: inferences.
681: \item Finding the right lemmas and theorems to apply.
682: \item Entering long formulas and expressions correctly, and adapting
683: ordinary mathematical notation to the formal notation in Isabelle.
684: \end{itemize}
685: Gradually, all these requirements will be ameliorated, as better
686: libraries, automated tools, and interfaces are developed. On a
687: personal note, we are entirely convinced that, although there is a
688: long road ahead, formal verification of mathematics will inevitably
689: become commonplace. Getting to that point will require both
690: theoretical and practical ingenuity, but we do not see any conceptual
691: hurdles.\footnote{For further speculation along these lines, see the
692: preliminary notes \cite{avigad:unp:pnt}.}
693:
694: %===================================================================
695:
696: \section{Thoughts on the formalization}
697: \label{reflection:section}
698:
699: In this section, we will discuss features of the formalization that we
700: feel are worthy of discussion, either because they represent novel and
701: successful solutions to general problems, or (more commonly) because
702: they indicate aspects of formal mathematical verification where better
703: support is possible.
704:
705: \subsection{Asymptotics}
706:
707: One of our earliest tasks in the formalization was to develop a
708: library to support the requisite calculations with big O expressions.
709: To that end, we gave the expression $f = O(g)$ the strict reading $\ex
710: C \fa x (|f(x)| \leq C |g(x)|)$, and followed the common practice of
711: taking $O(g)$ to be the set of all functions with the requisite rate
712: of growth, i.e.~
713: \[
714: O(g) = \{ f \st \ex C \fa x (|f(x)| \leq C |g(x)|) \}.
715: \]
716: We then read the ``equality'' in $f = O(g)$ as the element-of
717: relation, $\in$.
718:
719: Note that these expressions make sense for any function type for which
720: the codomain is an ordered ring. Isabelle's axiomatic type classes
721: made it possible to develop the library fully generally. We were able
722: to lift operations like addition and multiplication to such types,
723: defining $f + g$ to denote the pointwise sum, $\lambda x. (f(x) +
724: g(x))$. Similarly, given a set $B$ of elements of a type that supports
725: addition, we defined
726: \[
727: a +_o B = \{ c \st \ex {b \in B} (c = a + b) \}.
728: \]
729: We also defined $a =_o B$ to be alternative input syntax for $a \in
730: B$. This gave expressions like $f =_o g +_o O(h)$ the intended
731: meaning. In mathematical texts, convention dictates that in an
732: expression like $x^2 + 3 x = x^2 + O(x)$, the terms are to be
733: interpreted as functions of $x$; in Isabelle we had to use lambda
734: notation to make this explicit. Thus, the expression above would be
735: entered
736: \begin{center}
737: {\isastyle {\isacharparenleft}{\isasymlambda}x{\isachardot}\
738: x{\isacharcircum}{\isadigit{2}}\ {\isacharplus}\ {\isadigit{3}}\
739: {\isacharasterisk}\ x{\isacharparenright}\ {\isacharequal}o\
740: {\isacharparenleft}{\isasymlambda}x{\isachardot}\
741: x{\isacharcircum}{\isadigit{2}}{\isacharparenright}\ {\isacharplus}o\
742: O{\isacharparenleft}{\isasymlambda}x{\isachardot}\
743: x{\isacharparenright}}
744: \end{center}
745: This should help the reader make sense of the formalizations presented
746: in Section~\ref{formalization:subsection}.
747:
748: An early version of our big O library was presented at IJCAR
749: \cite{avigad:donnelly:04}. That version is nonetheless fairly close to
750: the version used in the proof of the prime number theorem described
751: here, as well as a version that is scheduled for the 2005 release of
752: Isabelle.\footnote{Improvements in the more recent versions include
753: better and more general theorems involving summations, theorems to
754: handle composition of big O equations, and support for reasoning
755: about asymptotic inequalities. Also, in the most recent version, we
756: have dispensed with expressions of the form $O(S)$, where $S$ is a
757: set of functions. It seems that uses of these are easily eliminable,
758: and having $O$ notation for both functions and sets of functions led
759: to annoying type ambiguities.}
760:
761: There is one feature of our library that seems to be less than
762: optimal, and resulted in a good deal of tedium. With our definition, a
763: statement like $\lam {x.} x + 1 = O(\lam {x.} x^2)$ is false when the
764: variables range over the natural numbers, since $x^2$ is equal to $0$
765: when $x$ is $0$. Often one wants to restrict one's attention to
766: strictly positive natural numbers, or nonnegative real numbers. There
767: are four ways one can do this:
768: \begin{itemize}
769: \item Define new types for the strictly positive natural numbers, or
770: nonnegative real numbers, and state the identities for those types.
771: \item Formalize the notion ``$f = O(g)$ on $S$.''
772: \item Formalize the notion ``$f = O(g)$ eventually.''
773: \item Replace $x$ by $x + 1$ in the first case, and by $|x|$ in the
774: second case, to make the identities correct. For example, ``$f(|x|)
775: = O(|x|^3)$'' expresses that $f(x) = O(x^3)$ on the nonnegative
776: reals. Various similar tinkerings are effective; for example, the
777: relationship intended in the example above is probably best
778: expressed as $\lam {x.} x + 1 = O(\lam{x.} x^2 + 1)$.
779: \end{itemize}
780: These various options are discussed in the IJCAR paper
781: \cite{avigad:donnelly:04}, and all come at a cost. For example, the
782: first requires annoying casts, say, between positive natural numbers,
783: and natural numbers. The second requires carrying around a set $S$ in
784: every formula, and both the second and third require additional work
785: when composing expressions or reasoning about sums (roughly, one has
786: to make sure that the range of a function lies in the domain where an
787: asymptotic estimate is valid).
788:
789: In our formalization, we chose the fourth route, which explains the
790: numerous occurrences of $+1$ and $\fn{abs}$ in the statements in
791: Section~\ref{formalization:subsection}. This often made some of the
792: more complex calculations painfully tedious, forcing us, for example,
793: the following ``helper'' lemma in Selberg:
794:
795: \medskip
796:
797: \begin{isabellebody}
798: \isamarkupfalse%
799: \isacommand{lemma}\ aux{\isacharcolon}\ {\isachardoublequote}{\isadigit{1}}\ {\isacharless}{\isacharequal}\ z\ {\isasymLongrightarrow}\ natfloor{\isacharparenleft}abs{\isacharparenleft}z\ {\isacharminus}\ {\isadigit{1}}{\isacharparenright}{\isacharparenright}\ {\isacharplus}\ {\isadigit{1}}\ {\isacharequal}\ natfloor\ z{\isachardoublequote}
800: \end{isabellebody}
801:
802: \medskip
803:
804: \noindent On the general principle that formalization goes most
805: smoothly when the formalization is as close as possible to the
806: informal text, it is probably worth extending the library in the ways
807: described above. We do not have a good sense, however, as to how much
808: this would have simplified our task.
809:
810: Donnelly and Avigad have designed a decision procedure for entailments
811: between linear big O equations, and have obtained a prototype
812: implementation (though we have not incorporated it into the Isabelle
813: framework). This would eliminate the need for helper lemmas like the
814: following:
815:
816: \medskip
817:
818: \begin{isabellebody}
819: \isamarkupfalse%
820: \isacommand{lemma}\ aux{\isadigit{5}}{\isacharcolon}\ {\isachardoublequote}f\ {\isacharplus}\ g\ {\isacharequal}o\ h\ {\isacharplus}o\ O{\isacharparenleft}k{\isacharcolon}{\isacharcolon}{\isacharprime}a{\isacharequal}{\isachargreater}{\isacharparenleft}{\isacharprime}b{\isacharcolon}{\isacharcolon}ordered{\isacharunderscore}ring{\isacharparenright}{\isacharparenright}\ {\isasymLongrightarrow}\isanewline
821: \ \ g\ {\isacharplus}\ l\ {\isacharequal}o\ h\ {\isacharplus}o\ O{\isacharparenleft}k{\isacharparenright}\ {\isasymLongrightarrow}\ f\ {\isacharequal}o\ l\ {\isacharplus}o\ O{\isacharparenleft}k{\isacharparenright}{\isachardoublequote}
822: \end{isabellebody}
823:
824: \medskip
825:
826: \noindent We believe calculations going beyond the linear fragment
827: would also benefit from a better handling of monotonicity, just as is
828: needed to support ordinary calculations with inequalities,
829: as described in the next section.
830:
831: \subsection{Calculations with real numbers}
832: \label{calculation:subsection}
833:
834: One salient feature of the Selberg proof is the amount of calculation
835: involved. The dramatic increase in the length of the formalization of
836: the final part of the proof (5 pages in Shapiro, compared to 89 or so
837: in the formal version) is directly attributable to the need to spell
838: out calculations involving field operations, logarithms and
839: exponentiation, the greatest and least integer functions (``ceiling''
840: and ``floor''), and so on. The textbook calculations themselves were
841: complex; but then each textbook inference had to be expanded, by hand,
842: to what was often a long sequence of entirely straightforward
843: inferences.
844:
845: Of course, Isabelle does provide some automated support. For example,
846: the simplifier employs a form of ordered rewriting for operations,
847: like addition and multiplication, that are associative and
848: commutative. This puts terms involving these operations into canonical
849: normal forms, thereby making it easy to verify equality of terms that
850: differ up to such rewriting. More complex equalities can similarly be
851: obtained by simplifying with appropriate rewrite rules, such as
852: various forms of distributivity in a ring or identities for logarithms
853: and exponents.
854:
855: Much of the work in the final stages of the proof, however, involved
856: verifying \emph{inequalities} between expressions. Isabelle's linear
857: arithmetic package is complete for reasoning about inequalities
858: between linear expressions in the integers and reals, i.e.~validities
859: that depend only on the linear fragment of these theories. But,
860: many of the calculations went just beyond that,
861: at which point we were stuck manipulating expressions by hand and
862: applying low-level inferences.
863:
864: As a simple example, part of one of the long proofs in
865: PrimeNumberTheorem required verifying that
866: \[
867: (1 + \frac{\varepsilon}{3(C + 3)}) \cdot n < K x
868: \]
869: using the following hypotheses:
870: \[
871: \begin{split}
872: & n \leq (K / 2) x \\
873: & 0 < C \\
874: & 0 < \varepsilon < 1
875: \end{split}
876: \]
877: The conclusion is easily obtained by noting that $1 +
878: \frac{\varepsilon}{3(C + 3)}$ is strictly less than $2$, and so the
879: product with $n$ is strictly less than $2 (K / 2) x = K x$.
880: But spelling out the details requires, for one thing, invoking the
881: relevant monotonicity rules for addition, multiplication, and
882: division. The last two, in turn, require verifying that the relevant
883: terms are positive. Furthermore, getting the calculation to go through
884: can require explicitly specifying terms like $2 (K / 2) x$ (which can
885: be simplified to $K x$), or, in other contexts, using rules like
886: associativity or commutativity to manipulate terms into the the forms
887: required by the rules.
888:
889: The file PrimeNumberTheorem consists of a litany of such calculations.
890: This required us to have names like ``mult-left-mono''
891: ``add-pos-nonneg,'' ``order-le-less-trans,'' ``exp-less-cancel-iff,''
892: ``pos-divide-le-eq'' at our fingertips, or to search for them when
893: they were needed. Furthermore, sign calculations had a way of coming
894: back to haunt us. For example, verifying an inequality like $1 / (1 +
895: st) < 1 / (1 + su)$ might require showing that the denominators are
896: positive, which, in turns, might require verifying that $s$, $t$, and
897: $u$ are nonnegative; but then showing $s t > s u$ may again require
898: verifying that $s$ is positive. Since $s$ can be carried along in a
899: chain of inequalities, such queries for sign information can keep
900: coming back. Isar made it easy to break out such facts, name them, and
901: reuse them as needed. But since we were usually working in a context
902: where obtaining the sign information was entirely straightforward,
903: these concerns always felt like an annoying distraction from the
904: interesting and truly difficult parts of the calculations.
905:
906: In short, inferences like the ones we have just described are commonly
907: treated as ``obvious'' in ordinary mathematical texts, and it would be
908: nice if mechanized proof assistants could recognize them as such.
909: Decision procedures that are stronger than linear arithmetic are
910: available; for example, a proof-producing decision procedure for
911: real-closed fields has recently been implemented in HOL-light
912: \cite{mclaughlin:harrison:05}. But for calculations like the one
913: above, computing sequences of partial derivatives, as decision
914: procedures for the real closed fields are required to do, is arguably
915: unnecessary and inefficient. Furthermore, decision procedures for real
916: closed fields cannot be extended, say, to handle exponentiation and
917: logarithms; and adding a generic monotone function, or trigonometric
918: functions, or the floor function, renders the full theory undecidable.
919:
920: Thus, in contexts similar to ours, we expect that principled heuristic
921: procedures will be most effective. Roughly, one simply needs to chain
922: backwards through the obvious rules in a sensible way. There are
923: stumbling blocks, however. For one thing, excessive case splits can
924: lead to exponential blowup; e.g.~one can show $s t > 0$ by
925: showing that $s$ and $t$ are either both strictly positive or strictly
926: negative. Other inferences are similarly nondeterministic: one can
927: show $r + s + t > 0$ by showing that two of the terms are nonnegative
928: and the third is strictly positive, and one can show $r + s < t + u +
929: v + w$, say, by showing $r < u$, $s \leq t + v$, and $0 \leq w$.
930:
931: As far as case splits are concerned, we suspect that they are rarely
932: needed to establish ``obvious'' facts; for example, in
933: straightforward calculations, the necessary sign information is
934: typically available. As far as the second
935: sort of nondeterminism is concerned, notice that the procedures for
936: linear arithmetic are effective in drawing the requisite conclusions
937: from available hypotheses; this is a reflection that of the fact that
938: the theory of the real numbers with addition (and, say, multiplication
939: by rational constants) is decidable.
940:
941: The analogous theory of the reals with multiplication is also
942: decidable. To see this, observe that the structure consisting of the
943: strictly positive real numbers with multiplication is isomorphic to
944: the structure of the real numbers with addition, and so the usual
945: procedures for linear arithmetic carry over. More generally, by
946: introducing case splits on the signs of the basic terms, one can
947: reduce the multiplicative fragment of the reals to the previous case.
948:
949: In short, when the signs of the relevant terms are known, there are
950: straightforward and effective methods of deriving inequalities in the
951: additive and multiplicative fragments. This suggests that what is
952: really needed is a principled method of amalgamating such ``local''
953: procedures, together with, say, procedures that make use of
954: monotonicity and sign properties of logarithms and exponentiation. The
955: well-known Nelson-Oppen procedure provides a method of amalgamating
956: decision procedures for disjoint theories that share only the equality
957: symbol in their common language; but these methods fail for theories
958: that share an inequality symbol when one adds, say, rational constants
959: to the language, which is necessary to render such combinations
960: nontrivial. We believe that there are principled ways, however, of
961: extending the Nelson-Oppen framework to obtain useful heuristic
962: procedures. This possibility is explored in Avigad and Friedman
963: \cite{avigad:friedman:unp}.
964:
965: \subsection{Casting between domains}
966: \label{casting:subsection}
967:
968: In our formalization, we found that the most natural way to establish
969: basic properties of the functions $\theta$, $\psi$, and $\pi$, as well
970: as Chebyshev's theorems, was to treat them as functions from the
971: natural numbers to the reals, rather them as functions from the reals
972: to the reals. Either way, however, it is clear that the relevant
973: proofs have to use the embedding of the natural numbers into the reals
974: in an essential way. Since the $\mu$ function takes positive and
975: negative values, we were also forced to deal with integers as soon
976: as $\mu$ came into play. In short, our proof of the prime number
977: theorem inevitably involved combining reasoning about the natural
978: numbers, integers, and real numbers effectively; and this, in turn,
979: involved frequent casting between the various domains.
980:
981: We tended to address such needs as they arose, in an ad-hoc way. For
982: example, the version of the fundamental theorem of arithmetic that we
983: inherited from prior Isabelle distributions asserts that every
984: positive natural number can be written uniquely as the product of an
985: increasing list of primes. Developing properties of the radical
986: function required being able to express the unique factorization
987: theorem in the more natural form that every positive number is the
988: product of the primes that divide it, raised to the appropriate
989: multiplicity; i.e.~the fact that for every $n > 0$,
990: \[
991: n = \prod_{p | n} p^{\fn{mult}_p(n)},
992: \]
993: where $\fn{mult}_p(n)$ denotes the multiplicity of $p$ in $n$. We also
994: needed, at our disposal, things like the fact that $n$ divides $m$ if
995: and only if for every prime number $p$, the multiplicity of $p$ in $n$
996: is less than or equal to the multiplicity of $p$ in $m$. Thus, early
997: on, we faced the dual tasks of translating the unique factorization
998: theorem from a statement about positive natural numbers to positive
999: integers, and developing a good theory of multiplicity in that
1000: setting. Later, when proving Chebyshev's theorems, we found that we
1001: needed to recast some of the facts about multiplicity to statements
1002: about natural numbers.
1003:
1004: We faced similar headaches when we began serious calculations
1005: involving natural numbers and the reals. In particular, as we
1006: proceeded we were forced to develop a substantial theory of the floor
1007: and ceiling functions, including a theory of their behavior vis-a-vis
1008: the various field operations. In calculations, expressions sometimes
1009: involved objects of all three types, and we often had to explicitly
1010: transport operations in or out of casts in order to apply a relevant
1011: lemma.
1012:
1013: When one extends a domain like the natural numbers to the integers, or
1014: the integers to the real numbers, some operations are simply extended.
1015: For example, properties of addition and multiplication of natural
1016: numbers carry all the way through to the reals. On the other hand, one
1017: has new operations, like subtraction on the integers and division in
1018: the real numbers, that are mirrored imperfectly in the smaller
1019: domains. For example, subtraction on the integers extends truncated
1020: subtraction $x \tsub y$ on the natural numbers only when $x \geq y$,
1021: and division in the reals extends the function $x \mathop{\fn{div}} y$
1022: on the integers or natural numbers only when $y$ divides $x$. Finally,
1023: there are facts that depend on the choice of a left inverse to the
1024: embedding: for example, if $n$ is an integer, $x$ is a real number,
1025: $\fn{real}$ is the embedding of the integers into the reals, and
1026: $\lfloor \cdot \rfloor$ denotes the floor function from the reals to
1027: the integers, we have
1028: \[
1029: (n \leq \lfloor x \rfloor) \equiv (\fn{real}(n) \leq x).
1030: \]
1031: This is an example of what mathematicians call a
1032: Galois correspondence, and category theorists call an adjunction,
1033: between the integers and the real numbers with the ordering relation.
1034:
1035: Our formalization of the prime number theorem involved a good deal of
1036: manipulation of expressions, by hand, using the three types of facts
1037: just described. Many of these inferences should be handled
1038: automatically. After all, such issues are transparent in mathematical
1039: texts; we carry out the necessary inferences smoothly and
1040: unconsciously whenever we read an ordinary proof. The guiding
1041: principle should be that anything that is transparent to us can be
1042: made transparent to a mechanized proof assistant: we simply need to
1043: reflect on \emph{why} we are effectively able to combine domains in
1044: ordinary mathematical reasoning, and codify that knowledge
1045: appropriately.
1046:
1047: \subsection{Combinatorial reasoning with sums}
1048:
1049: As described in Section~\ref{formalization:subsection}, formalizing
1050: the prime number theorem involved a good deal of combinatorial
1051: reasoning with sums and products. Thus, we had to develop some basic
1052: theorems to support such reasoning, many of which have since been
1053: moved into Isabelle's HOL library. These include, for example,
1054:
1055: \medskip
1056:
1057: \begin{isabellebody}
1058: \isamarkupfalse%
1059: \isacommand{lemma}\ setsum{\isacharunderscore}cartesian{\isacharunderscore}product{\isacharcolon}\ \isanewline
1060: \ \ \ {\isachardoublequote}{\isacharparenleft}{\isasymSum}x{\isasymin}A{\isachardot}\ {\isacharparenleft}{\isasymSum}y{\isasymin}B{\isachardot}\ f\ x\ y{\isacharparenright}{\isacharparenright}\ {\isacharequal}\ {\isacharparenleft}{\isasymSum}z{\isasymin}A\ {\isacharless}{\isacharasterisk}{\isachargreater}\ B{\isachardot}\ f\ {\isacharparenleft}fst\ z{\isacharparenright}\ {\isacharparenleft}snd\ z{\isacharparenright}{\isacharparenright}{\isachardoublequote}
1061: \end{isabellebody}
1062:
1063: \medskip
1064:
1065: \noindent which allows one to view a double summation as a sum over a
1066: cartesian product. A more interesting example is
1067:
1068: \medskip
1069:
1070: \begin{isabellebody}
1071: \isamarkupfalse%
1072: \isacommand{lemma}\ setsum{\isacharunderscore}reindex{\isacharcolon}\isanewline
1073: \ \ \ \ \ {\isachardoublequote}inj{\isacharunderscore}on\ f\ B\ {\isasymLongrightarrow}\ {\isacharparenleft}{\isasymSum}x{\isasymin}f{\isacharbackquote}B{\isachardot}\ h\ x{\isacharparenright}\ {\isacharequal}\ {\isacharparenleft}{\isasymSum}x{\isasymin}B{\isachardot}\ {\isacharparenleft}h\ {\isasymcirc}\ f{\isacharparenright}{\isacharparenleft}x{\isacharparenright}{\isacharparenright}{\isachardoublequote}
1074: \end{isabellebody}
1075:
1076: \medskip
1077:
1078: \noindent which expresses that if $f$ is an injective function on a
1079: set $B$, then summing $h$ over the image of $B$ under $f$ is
1080: the same as summing $h \circ f$ over $B$. In particular, if $f$ is a
1081: bijection from $B$ to $A$, the second identity implies that summing
1082: $h$ over $A$ is the same as summing $h \circ f$ over $B$. This type of
1083: ``reindexing'' is often so transparent in mathematical arguments that
1084: when we first came across an instance where we needed it (long ago,
1085: when proving quadratic reciprocity), it took some thought to identify
1086: the relevant principle. It is needed, for example, to show
1087: \[
1088: \sum_{d | n} h(n) = \sum_{d | n} h(n / d),
1089: \]
1090: using the fact that $f(d) = n / d$ is a bijection from the set of
1091: divisors of $n$ to itself; or, for example, to show
1092: \[
1093: \sum_{d d' = c} h(d, d') = \sum_{d | c} h(d, c/d),
1094: \]
1095: using the fact that $f(d) = \la d, c /d \ra$ is a bijection from the
1096: set of divisors of $c$ to $\{ \la d, d' \ra \st d d' = c \}$. The
1097: reindexing lemma is a discrete analogue of integration by
1098: substitution, so it is likely that methods developed to
1099: support such inferences will be more generally useful.
1100:
1101: In Isabelle, if $\sigma$ is any type, then $\fn{\sigma} \; set$ denotes
1102: the type of all subsets of $\sigma$. The predicate ``finite'' is
1103: defined inductively for these subset types. Isabelle's summation
1104: operator takes a subset $A$ of $\sigma$ and a function $f$ from
1105: $\sigma$ to any type with an appropriate notion of addition, and
1106: returns $\sum_{x \in A} f(x)$. This summation operator really only
1107: makes sense when $A$ is a finite subset, so many identities have to be
1108: restricted accordingly. (An alternative would be to define a type of
1109: finite subsets of $\sigma$, with appropriate closure operations; but
1110: then work would be required to translate properties of arbitrary
1111: subsets to properties of finite subsets, or to mediate relationships
1112: between finite subsets and arbitrary subsets.) This has the net effect
1113: that applying an identity involving a sum or product often requires
1114: one to verify that the relevant sets are finite. This difficulty is
1115: ameliorated by defining $\sum_{x \in A} f(x)$ to be $0$ when $A$ is
1116: infinite, since it then turns out that a number of identities hold in
1117: the unrestricted form. But this fix is not universal, and so
1118: finiteness issues tend to pop up repeatedly when one carries out a
1119: long calculation.
1120:
1121: In short, at present, carrying out combinatorial calculations often
1122: requires a number of straightforward verifications involving
1123: reindexing and finiteness. Once again, these are inferences that are
1124: nearly transparent in ordinary mathematical texts, and so, by our
1125: general principle, we should expect mechanized proof assistants to
1126: take care of them. As before, there are stumbling blocks; for example,
1127: when reindexing is needed, the appropriate injection $f$ has to be
1128: pulled from the air. We expect, however, that in the types of
1129: inferences that are commonly viewed as obvious, there are natural
1130: candidates for $f$. So this is yet another domain where reflection and
1131: empirical work should allow us to make proof assistants more usable.
1132:
1133: \subsection{Devising elementary proofs}
1134:
1135: Anyone who has undertaken serious work in formal mathematical
1136: verification has faced the task of adapting an ordinary mathematical
1137: proof so that it can be carried out using the libraries and resources
1138: available. When a proof uses mathematical ``machinery'' that is
1139: unavailable, one is faced with the choice of expanding the background
1140: libraries to the point where one can take the original proof at face
1141: value, or finding workarounds, say, by replacing the original
1142: arguments with ones that are more elementary. The need to rewrite
1143: proofs in such a way can be frustrating, but the task can also be
1144: oddly enjoyable: it poses interesting puzzles, and enables one to
1145: better understand the relationship of the advanced mathematical
1146: methods to the elementary substitutes. As more powerful mathematical
1147: libraries are developed, the need for elementary workarounds will
1148: gradually fade, and with it, alas, one good reason for investing time
1149: in such exercises.
1150:
1151: Our decision to use Selberg's proof rather than a complex-analytic one
1152: is an instance of this phenomenon. To this day, we do not have a sense
1153: of how long it would have taken to build up a complex-analysis library
1154: sufficient to formalize one of the more common proofs of the prime
1155: number theorem, nor how much easier a formal verification of the prime
1156: number theorem would have been in the presence of such a library.
1157:
1158: But similar issues arose even with respect to the mild uses of
1159: analysis required by the Selberg proof. Isabelle's real library gave
1160: us a good theory of limits, series, derivatives, and the basic
1161: transcendental functions, but it had almost no theory of integration
1162: to speak of. Rather than develop such a theory, we found that we were
1163: able to work around the mild uses of integration needed in the Selberg
1164: proof.\footnote{Since the project began, Sebastian Skalberg managed to
1165: import the more extensive analysis library from the HOL theorem
1166: prover to Isabelle. By the time that happened though, we had already
1167: worked around most of the applications of analysis needed for the
1168: proof.} Often, we also had to search for quick patches to other gaps
1169: in the underlying library. For the reader's edification and
1170: entertainment, we describe a few such workarounds here.
1171:
1172: Recall that one of the fundamental identities we needed asserts
1173: \[
1174: \ln(1 + 1/n) = 1/n + O(1/n^2).
1175: \]
1176: This follows from the fact that $\ln(1 + x)$ is well approximated by
1177: $x$ when $x$ is small, which, in turn, can be seen from the
1178: Maclaurin series for $\ln(1 + x)$, or even the fact that the
1179: derivative of $\ln(1 + x)$ is equal to $1$ at $0$. But these were
1180: among the few elementary properties of transcendental functions that
1181: were missing from the real library. How could we work around this?
1182:
1183: To be more specific: Fleuriot's real library defined $e^x$ by the
1184: power series $e^x = \sum_{n = 0}^{\infty} x^n / n!$, and showed that
1185: $e^x$ is strictly increasing, $e^0 = 1$, $e^{x + y} = e^x e^y$ for
1186: every $x$ and $y$, and the range of $e^x$ is exactly the set of
1187: positive reals. The library then defines $\ln$ to be a left inverse to
1188: $e^x$. The puzzle was to use these facts to show that $| \ln(1 + x) -
1189: x| \leq x^2$ when $x$ is positive and small enough.
1190:
1191: Here is the solution we hit upon. First, note that when $x \geq 0$,
1192: $e^x \geq 1 + x$, and so, $x \geq \ln(1 + x)$. Replacing $x$ by $x^2$,
1193: we also have
1194: \begin{equation}
1195: \label{e:eq:one}
1196: e^{x^2} \geq 1 + x^2.
1197: \end{equation}
1198: On the other hand, the definition of $e^x$ can be used to show
1199: \begin{equation}
1200: \label{e:eq:two}
1201: e^x \leq 1 + x + x^2
1202: \end{equation}
1203: when $0 \leq x \leq 1/2$. From (\ref{e:eq:one}) and (\ref{e:eq:two})
1204: we have
1205: \[
1206: \begin{split}
1207: e^{x - x^2} & = e^x / e^{x^2} \\
1208: & \leq (1 + x + x^2) / (1 + x^2) \\
1209: & \leq 1 + x,
1210: \end{split}
1211: \]
1212: where the last inequality is easily obtained by multiplying
1213: through. Taking logarithms of both sides, we have
1214: \[
1215: x - x^2 \leq \ln(1 + x) \leq x
1216: \]
1217: when $0 \leq x \leq 1/2$, as required. In fact, a similar calculation
1218: yields bounds on $\ln(1 + x)$ when $x$ is negative and close to
1219: $0$. This can be used to show that the derivative of $\ln x$ is $1 /
1220: x$; the details are left to the reader.
1221:
1222: For another example, consider the problem of showing that $\sum_{n =
1223: 1}^\infty 1 / n^2$ converges. This follows immediately from the
1224: integral test: $\sum_{n = 1}^\infty 1 / n^2 \leq \int_1^\infty 1 / x^2
1225: = 1$. How can it be obtained otherwise? Answer: simply write
1226: \begin{eqnarray*}
1227: \sum_{n = 1}^M 1 / n^2 & \leq & 1 + \sum_{n = 2}^M 1 / n (n - 1) \\
1228: & = & 1 + \sum_{n = 2}^M (1 / (n -1 ) - 1 / n) \\
1229: & = & 1 + 1 - 1/M \\
1230: & \leq & 2,
1231: \end{eqnarray*}
1232: where the second equality relies on the fact that the preceding
1233: expression involves a telescoping sum. Having to stop frequently to
1234: work out puzzles like these helped us appreciate the immense power of
1235: the Newton-Leibniz calculus, which provides uniform and mechanical
1236: methods for solving such problems. The reader may wish to consider
1237: what can be done to show that the sum $\sum_{n = 1}^\infty 1 / x^a$ is
1238: convergent for general values of $a > 1$, or even for the special case
1239: $a = 3 / 2$. Fortunately, we did not need these facts.
1240:
1241: Now consider the identity
1242: \[
1243: \sum_{n \leq x} 1 / n = \ln x +
1244: O(1).
1245: \]
1246: To obtain this, note that when $x$ is positive integer we can write
1247: $\ln x$ as a telescoping sum,
1248: \[
1249: \begin{split}
1250: \ln x & = \sum_{n \leq x - 1} (\ln (n + 1) - \ln n) \\
1251: & = \sum_{n \leq x - 1} \ln(1 + 1 / n) \\
1252: & = \sum_{n \leq x - 1} 1 / n + O(\sum_{n \leq x} 1 / n^2) \\
1253: & = \sum_{n \leq x} 1 / n + O(1).
1254: \end{split}
1255: \]
1256: We learned this trick from Cornaros and Dimitracopoulos
1257: \cite{cornaros:dimitracopoulos:94}. In fact, a slight refinement of
1258: the argument shows
1259: \[
1260: \sum_{n \leq x} 1 /n = \ln x + C + O(1/x)
1261: \]
1262: for some constant, $C$. This constant is commonly known as Euler's
1263: constant, denoted by $\gamma$.
1264:
1265: One last puzzle: how can one show that $\ln x / x^a$ approaches $0$,
1266: for any $a > 0$? Here is our solution. First, note that we have $\ln x
1267: \leq \ln(1 + x) \leq x$ for every positive $x$. Thus we have
1268: \[
1269: a \ln x = \ln x^a \leq x^a,
1270: \]
1271: for every positive $x$ and $a$. Replacing $a$ by $a / 2$ and dividing
1272: both sides by $a x^a / 2$, we obtain $\ln x / x^a \leq 2 /
1273: (ax^{a/2})$. It is then easy to show that the right-hand-side
1274: approaches $0$ as $x$ approaches infinity.
1275:
1276: \section{Conclusions}
1277: \label{conclusions:section}
1278:
1279: Our efforts show that formal verification of significant mathematical
1280: theorems is possible, although more work is needed before the practice
1281: is likely to become widespread. In an ideal situation, it would be
1282: possible to enter mathematical text almost exactly as it appears in a
1283: careful and precise informal presentation, and interactive proof
1284: systems would be able to verify inferences at that level. Our
1285: formalization of the prime number theorem provides a case study that
1286: clarifies some of the ways in which the current technology
1287: falls short of the ideal.
1288:
1289: The formal statements of theorems in
1290: Section~\ref{formalization:subsection} are notably less attractice
1291: than their informal counterparts in Section~\ref{selberg:subsection}.
1292: The difference is not merely cosmetic; notation is an integral part of
1293: mathematics, and it is unreasonable to expect the mathematical
1294: community to make notational sacrafices for mechanical convenience.
1295: Integrating formal verification into mathematical practice will
1296: therefore require us to take ordinary mathematical notation extremely
1297: seriously.
1298:
1299: The biggest obstacle at present is the gap between those inferences
1300: that ordinary mathematicians recognize as obvious, and those that can
1301: be verified automatically by conventional proof assistants. We have
1302: suggested one strategy for improvement, namely, to reflect on the
1303: capacities that enable us, in specific domains, to verify textbook
1304: inferences, and then to formalize that understanding. In particular,
1305: it seems that fairly straightforward support for reasoning about
1306: inequalities between real numbers and casts between integers and real
1307: numbers would have simplified our task substantially.
1308:
1309: Progress in formal verification will require a broad but focused
1310: philosophical reflection on ordinary mathematical practice, together
1311: with robust formal characterizations of that practice and sound
1312: engineering. As such, the field represents an auspicious combination
1313: of theory and practice.
1314:
1315: \bigskip
1316:
1317: \noindent {\bf Acknowledgements.} We are especially grateful to Tobias
1318: Nipkow and Larry Paulson for continued support on this project. We are
1319: also grateful to them, and to Freek Wiedijk and two anonymous
1320: referees, for comments on this paper.
1321:
1322: \bibliographystyle{plain}
1323: \bibliography{proofthry,numbertheory,automated}
1324:
1325:
1326: \end{document}
1327:
1328: %%% Local Variables:
1329: %%% mode: latex
1330: %%% TeX-master: t
1331: %%% End:
1332: