This is a rough transcription of a talk I gave to a class of algebraic number theory students at UC Berkeley with the goal of trying to understand how one might bring to bear modern techniques in number theory/geometry on some classical questions. I have essentially kept the format the same, while adding a bit of extra material (and adding in their responses to questions I asked).

A warning

This talk was difficult to write for multiple reasons. Most relevant to the reader, though, was the unknown, and likely highly variable, background of the audience members. For this reason I chose to keep things as simple as possible and consequently, in the process, sacrificed quite a bit of rigor. I think that these mistakes are not so bad since it seems, to me, that any reader capable of identifying mistakes is likely able to see how to fix them. But, again, there are some informalities/inaccuracies.

What is the point?

So, everyone learning algebraic number theory for the first time is, ostensibly, interested in studying, well, number theory. An innocent enough statement, but it begs a much more complicated question: what is number theory? Namely, if a well-educated student of analysis (or even an algebraic geometer!) asked a number theorist what their subject was all about, what should the response be?

This is a surprisingly non-obvious question. The answer largely lies in the way in which number theory distinguishes itself from many other subjects of mathematics. I think Hida has said it best (paraphrasing): number theory is a subject not determined by its methods, but by its desiderata, the problems it wants to solve. As an example, one can think of analysis as being the study of objects amenable to the method of limiting processes. Number theory has no such method from which its problems spring. Instead, its methods (as wide and diverse as they are) are determined by the type of problems that it wants to solve.

So, let us rephrase the question: what things does number theory seek to solve/understand?

Remark: At this point I opened up this question to the audience—legitimately interested in what they might say. Unfortunately, I didn’t really get any real response.

The fact that none of you responded is, in my opinion, somewhat typical. Number theory is plagued by a serious discrepancy between its historical perspectives and its modern ones. For example, here are some ‘classical’ answers to the above question:

Number theory is the study of primes and their distribution.
Number theory is the study of cryptographical systems.
Number theory is the study of Diophantine equations.

All of these perspectives are emphasized, for example, in a first undergraduate course on number theory and are what the lay mathematical student (someone that is not interested particularly in number theory) might say.

That said, while these are the classical perspectives on the goals of the subject, things have changed. One more modern perspective on what the ‘goal’ of number theory might be is the following:

Number theory is the study of the absolute Galois group $G_{\mathbb{Q}}=\text{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$ as a (topological) group and the study of its representations.

For example, the all-consuming web of conjectures known as the Langlands program (what both your teacher, my adviser, and I study!) is concerned with relating the representations of $G_{\mathbb{Q}}$ to algebraic geometry and (harmonic) analysis.

Moreover, the methods and techniques of the course you are currently taking are focused (implicitly) on this more modern perspective. Indeed, the usual big reveal of a first course in algebraic number theory is the collection of results umbrellaed under the name Class Field Theory which, in all actuality, are just a summary of the character theory of $G_{\mathbb{Q}}$ and (implicitly) relate such characters to objects of an algebro-geometric and (harmonic) analytic nature.

This all being said, it is somewhat perverse (as well as a bit jarring) to wholeheartedly embrace this modern perspective without understanding its connections to the more elementary, naive questions that gave birth to it. Less pompously, it’s a bit of a shock to see how much the seeming goals and methods of an undergraduate course in number theory differ from those of a graduate course (let alone the cutting-edge research in the field).

Thus, the goal of today is to try and give some indication as to how this modern perspective of algebraic geometry and the study of $G_{\mathbb{Q}}$ and its actions (representations are just linearized actions!) actually aids the study of the most fundamental of the above ‘classical’ subjects: Diophantine equations.

Why Diophantine equations?

Before we undertake our journey in earnest, let us begin by explaining briefly why Diophantine equations are of interest to a modern number theorist.

We begin, as we should, by recalling the definition of a Diophantine equation. Namely, a Diophantine equation is an equation of the form $f(x_1,\ldots,x_n)=0$ with $f(T_1,\ldots,T_n)\in\mathbb{Z}[T_1,\ldots,T_n]$ and where we require that $(x_1,\ldots,x_n)\in\mathbb{Z}^n$ . In words, Diophantine equations are the study of the integer roots of integer polynomials. Of course, even though we’ve chosen to focus on a single equation, one should consider simultaneous integer solutions to a family of integral polynomials as Diophantine equations as well.

Let us now give some classic examples of Diophantine equations, in increasing level of difficulty, and roughly how these equations are tackled:

Pell’s equations: $x^2-ny^2=1$ where $n\in\mathbb{Z}$ is some fixed square-free integer with $n>1$ . Such equations are taken care of, quite neatly, by the study of continued fractions and the study of the units of the ring $\mathbb{Z}[\sqrt{n}]$ (which is the ring of integers of $\mathbb{Q}(\sqrt{n})$ if $n\equiv 2,3\pmod 4$ ).
Catalan’s equation: $x^a-y^b=1$ where the exponents $a,b\in\mathbb{N}$ are greater than $1$ and we look for positive integers $x,y$ . The only solution is $(a,b,x,y)=(2,3,3,2)$ , that is, $3^2-2^3=1$ (incredible!). This was solved by Mihăilescu in 2002 using an incredibly clever argument which does not use much more than the number theory learned in this course.
Fermat’s equation: $x^n+y^n=z^n$ with $n\in\mathbb{N}$ fixed. The solutions for $n=1$ and $n=2$ are explicitly parameterizable (see below!), and for $n>2$ there are no non-trivial solutions (i.e. solutions where none of $x,y,$ or $z$ is zero). This was finally proven in the mid ’90s thanks to a huge number of people, most notably Wiles, Ribet, Frey-Hellegouarch, and Mazur. The solution was a tour de force of modern arithmetic geometry, which relied most pivotally on proving a small case of the aforementioned Langlands program.

All of these are incredibly interesting—Catalan says that no two consecutive integers (save $8$ and $9$ ) are perfect powers, and Fermat’s equation says that (except for $n=1$ and $n=2$ ) the sum of two $n^\text{th}$ -powers cannot be an $n^\text{th}$ -power—but why are Diophantine equations interesting in general? Why are they worth trying to study systematically, and not focusing on particular equations of interest?

Well, to begin with let’s drop all pretense and give what might be the most obvious answer to ‘what is the point of number theory?’: it’s the study of the integers $\mathbb{Z}$ . But, the study of $\mathbb{Z}$ with what structure? Depending on what your goal is, the answer might be as an ordered ring but, for most purposes, the real goal is just to study $\mathbb{Z}$ as a plain ring. How then do Diophantine equations help us in this goal? The (soft) answer lies in the classic theorem of Yoneda. Namely, recall that the Yoneda philosophy tells us that if we want to study $\mathbb{Z}$ as a ring, we should study the sets $\text{Hom}_{\mathsf{Ring}}(R,\mathbb{Z})$ for all rings $R$ . Well, any ring $R$ looks like $\mathbb{Z}[x_i]/(f_j)$ for some (possibly gigantic) set of variables $x_i$ and equations $f_j$ . The set $\text{Hom}_{\mathsf{Ring}}(R,\mathbb{Z})$ is nothing more than the set of solutions to the Diophantine equations $f_j=0$ . Thus, the study of Diophantine equations, if you buy into the Yoneda philosophy, is equivalent to the study of $\mathbb{Z}$ as a ring.

Remark: Of course, the above is patently imprecise. Namely, the key aspect of Yoneda’s lemma is that you don’t only know $\text{Hom}_{\mathsf{Ring}}(R,\mathbb{Z})$ as sets for all $R$ but that you actually know the functor $\text{Hom}_{\mathsf{Ring}}(-,\mathbb{Z})$ or, in other words, how the solutions to these Diophantine equations all relate. Of course, this talk was not meant to be that precise in the first place.

What would a systematic study look like?

Now that we have (hopefully) convinced ourselves that Diophantine equations are worth our study, we need to decide how to systematically study Diophantine equations. Indeed, what the last part of the previous section told us is that to really understand $\mathbb{Z}$ we can’t just study specific Diophantine equations (like the three listed above); we must study them all. But, as you’ll notice, all the specific Diophantine equations above had very specific means of attacking them. If we hope to say anything general we thus need to develop a systematic way of studying Diophantine equations. But, what would this look like?

Let us begin by introducing a little bit of notation. Namely, if $f(T_1,\ldots,T_n)\in\mathbb{Z}[T_1,\ldots,T_n]$ and $R$ is any ring, let us denote by $X_f(R)$ the following set:

$X_f(R):=\{(x_1,\ldots,x_n)\in R^n:f(x_1,\ldots,x_n)=0\}$
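To make the notation concrete, here is a minimal Python sketch that lists $X_f(R)$ by brute force when $R=\mathbb{Z}/m\mathbb{Z}$ is a finite ring. The helper name `points_mod` and the example polynomial `circle` are illustrative choices, not something from the talk.

```python
from itertools import product

def points_mod(f, n_vars, m):
    """Return X_f(Z/mZ): all tuples in (Z/mZ)^n_vars on which f vanishes mod m."""
    return [xs for xs in product(range(m), repeat=n_vars) if f(*xs) % m == 0]

# Illustrative polynomial: f(T1, T2) = T1^2 + T2^2 - 1.
def circle(x, y):
    return x * x + y * y - 1

# The same polynomial gives a different solution set for each ring.
for m in (2, 3, 5, 7):
    print(m, len(points_mod(circle, 2, m)))
```

The point of the sketch is only that one fixed $f$ produces a whole family of sets, one for each ring we feed it.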

This notation seems, at the start, to be nothing more than a convenient tool to discuss polynomial solutions. That said, just like the innocuousness of the Legendre symbol, this simplicity is deceptive: the shift from thinking of the polynomial solutions for one ring (e.g. $R=\mathbb{Z}$ ) to all rings is an epoch-changing maneuver, the surface of which we’ll only just scratch.

But, let’s back off from that highfalutin nonsense for a second. Namely, we said that we are interested in studying Diophantine equations and thus, really, we’re interested in studying sets of the form $X_f(\mathbb{Z})$ . The issue, of course, is that this is hard. The reason this is hard is that sets are so unstructured. A hallmark of mathematics is to exploit the extra structure of an object. Sets, unfortunately, do not have much structure. So, our first order of business will be to replace this highly unstructured set $X_f(\mathbb{Z})$ with an object for which we will have much more structure to twiddle around with.

To this end, let us begin by replacing $X_f(\mathbb{Z})$ with something slightly larger. Namely, we replace $X_f(\mathbb{Z})$ by $X_f(\mathbb{Q})$ . We do this mostly for matters of simplification (related to the fact that $\mathbb{Q}$ is a ‘simpler ring’ than $\mathbb{Z}$ ) but for the types of $f$ we care about we’ll see that the containment $X_f(\mathbb{Z})\subseteq X_f(\mathbb{Q})$ is essentially an equality. That said, while $X_f(\mathbb{Q})$ might be ostensibly nicer, it’s still just a set, and therefore we still need to make a leap to give us something more amenable to study.

To this end, we replace $X_f(\mathbb{Q})$ by an even larger set: the set $X_f(\overline{\mathbb{Q}})$ . Now, again, this seems like we’re in the same sort of unstructured territory we’re so desperately trying to escape but, in fact, we’re not. Indeed, the set $X_f(\overline{\mathbb{Q}})$ comes with something fairly sophisticated: a Galois action. Indeed, since $f$ has rational (integral) coefficients if $(x_1,\ldots,x_n)\in X_f(\overline{\mathbb{Q}})$ then

$\sigma\cdot(x_1,\ldots,x_n):=(\sigma(x_1),\ldots,\sigma(x_n))$

is in $X_f(\overline{\mathbb{Q}})$ for any $\sigma\in G_{\mathbb{Q}}$ . In this way we obtain a $G_{\mathbb{Q}}$ -action on $X_f(\overline{\mathbb{Q}})$ . Moreover, the topological structure of $G_{\mathbb{Q}}$ is not ignored, in the sense that the action of $G_{\mathbb{Q}}$ on $X_f(\overline{\mathbb{Q}})$ is continuous (when $X_f(\overline{\mathbb{Q}})$ is given the discrete topology). And while the passage from $X_f(\mathbb{Z})$ to $X_f(\mathbb{Q})$ might, in general, be ‘lossy’ (one can’t necessarily recover the former from the latter) the passage from $X_f(\mathbb{Q})$ to the continuous $G_{\mathbb{Q}}$ -set $X_f(\overline{\mathbb{Q}})$ is not: $X_f(\mathbb{Q})=X_f(\overline{\mathbb{Q}})^{G_{\mathbb{Q}}}$ (where the superscript denotes fixed points).

Remark: In a way that one can make pretty precise, the above step is like studying a topological space $X$ by studying the space $\widetilde{X}$ (its universal cover) with its associated $\pi_1(X)$ action. The claim about fixed points becomes the claim that a $\pi_1(X)$ -equivariant map $\widetilde{X}\to Y$ descends uniquely to a map $X\to Y$ .

Thus, we see that we’ve already passed from something incredibly unstructured (the set $X_f(\mathbb{Q})$ ) to something with an immense amount of structure (the continuous $G_{\mathbb{Q}}$ -set $X_f(\overline{\mathbb{Q}})$ ). But, before we continue, let’s pause to consider what is, perhaps, the simplest example.

Namely, let’s suppose that $f(T)\in\mathbb{Z}[T]$ , so that $f$ is a univariate polynomial. What then does the $G_{\mathbb{Q}}$ -set $X_f(\overline{\mathbb{Q}})$ look like? Well, it’s clear that $X_f(\overline{\mathbb{Q}})$ is a finite, discrete set with a $G_{\mathbb{Q}}$ -action. Moreover, we can describe precisely the orbit structure of this action. Namely, if $f$ factors over $\mathbb{Q}[T]$ as $f_1(T)^{e_1}\cdots f_m(T)^{e_m}$ with $f_i$ distinct irreducibles of degree $d_i$ , then $\# X_f(\overline{\mathbb{Q}})$ will be $N:=d_1+\cdots+d_m$ and the orbits of the $G_{\mathbb{Q}}$ -action on $X_f(\overline{\mathbb{Q}})$ will be precisely the sets $X_{f_i}(\overline{\mathbb{Q}})$ as $i$ varies.

We can soup this picture up even more. Namely, the way that $G_{\mathbb{Q}}$ acts on this finite set gives a continuous homomorphism

$G_{\mathbb{Q}}\to \text{Sym}(X_f(\overline{\mathbb{Q}}))\cong S_N$

which evidently factors through an embedded copy of $S_{d_1}\times\cdots \times S_{d_m}$ corresponding to the orbit decomposition of the $G_{\mathbb{Q}}$ -set described above. We can soup this up even further. Namely, we can compose with the standard/tautological permutation representation $S_N\to\text{GL}_N(\mathbb{C})$ to obtain a representation

$\rho_f:G_{\mathbb{Q}}\to \text{GL}_N(\mathbb{C})$

Less cryptically, letting $\mathbb{C}^N$ have basis $\{e_x\}_{x\in X_f(\overline{\mathbb{Q}})}$ we get the representation $\rho_f$ by declaring that $\rho_f(\sigma)(e_x)=e_{\sigma(x)}$ .
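As a small sanity check on the rule $e_x\mapsto e_{\sigma(x)}$ , the following Python sketch builds permutation matrices for permutations of three labelled roots and verifies that the assignment is multiplicative. Labelling the roots $0,1,2$ and the helper names are assumptions made purely for illustration.

```python
def perm_matrix(perm):
    """Matrix of e_x -> e_{perm[x]} in the basis e_0, ..., e_{N-1}.

    perm is a list with perm[x] = sigma(x); entry [i][j] is 1 iff sigma(j) = i.
    """
    n = len(perm)
    return [[1 if perm[j] == i else 0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def compose(s, t):
    """The permutation x -> s(t(x))."""
    return [s[t[x]] for x in range(len(t))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

sigma = [1, 0, 2]  # swaps roots 0 and 1, fixes root 2
tau = [0, 2, 1]    # swaps roots 1 and 2, fixes root 0

# Homomorphism property: rho(sigma tau) = rho(sigma) rho(tau).
assert perm_matrix(compose(sigma, tau)) == mat_mul(perm_matrix(sigma), perm_matrix(tau))

# The trace of a permutation matrix counts the fixed points of the permutation.
print(trace(perm_matrix(sigma)))
```

Note that the trace equals $N$ exactly when every basis vector is fixed, that is, when the permutation is the identity.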

Remark: For those that know what this means, the above representation $\rho_f$ might have a more familiar form. Namely, consider $X_f$ as the scheme $\text{Spec}(\mathbb{Q}[T]/(f(T)))$ and choose an isomorphism $\overline{\mathbb{Q}_\ell}\cong\mathbb{C}$ . Then, under this isomorphism the above representation $\rho_f$ is nothing more than $H^0(X_f,\overline{\mathbb{Q}_\ell})$ (the zeroth $\ell$ -adic cohomology of $X_f$ ).

This representation contains all the essential information about $f$ : its irreducible factors and so, in particular, its rational roots (what we’re really after!). But, moreover, the representation $\rho_f$ provides an incredible tool to study the extension $\mathbb{Q}_f:=\mathbb{Q}(X_f(\overline{\mathbb{Q}}))$ . Indeed, it’s clear that $\rho_f$ actually factors through $\text{Gal}(\mathbb{Q}_f/\mathbb{Q})$ and, in fact, gives a faithful (injective) representation $\rho_f:\text{Gal}(\mathbb{Q}_f/\mathbb{Q})\to\text{GL}_N(\mathbb{C})$ . One can then try to completely understand properties of the extension $\mathbb{Q}_f/\mathbb{Q}$ by studying this representation $\rho_f$ . For example, we can entirely characterize the set $\text{Spl}(\mathbb{Q}_f/\mathbb{Q})$ of split primes (which completely characterizes $\mathbb{Q}_f$ by Chebotarev density) as follows:

$\text{Spl}(\mathbb{Q}_f/\mathbb{Q})=\{p:\rho_f(\text{Frob}_p)=I_N\}=\{p:\text{tr}(\rho_f(\text{Frob}_p))=N\}$

where, of course, the above only really makes sense for unramified $p$ .
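The trace condition can be tested numerically. The sketch below leans on a standard fact that was not stated in the talk, so treat it as an assumption: for a square-free $f$ and a prime $p$ dividing neither its discriminant nor its leading coefficient, $\text{tr}(\rho_f(\text{Frob}_p))$ is the number of fixed points of $\text{Frob}_p$ on the roots, which equals the number of roots of $f$ in $\mathbb{Z}/p\mathbb{Z}$ . The function names and the `bad_primes` parameter are hypothetical helpers.

```python
def roots_mod_p(coeffs, p):
    """Number of roots in Z/pZ of the polynomial with coefficient list
    coeffs = [a_0, a_1, ..., a_d] (constant term first)."""
    def value(x):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule
            acc = (acc * x + a) % p
        return acc
    return sum(1 for x in range(p) if value(x) == 0)

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def split_primes(coeffs, bound, bad_primes=()):
    """Primes p < bound, outside bad_primes, at which f has deg(f) roots mod p."""
    degree = len(coeffs) - 1
    return [p for p in range(2, bound)
            if is_prime(p) and p not in bad_primes
            and roots_mod_p(coeffs, p) == degree]

# f(T) = T^2 + 1, so Q_f = Q(i); the only ramified prime is 2.
print(split_primes([1, 0, 1], 50, bad_primes=(2,)))
```

For $f(T)=T^2+1$ the primes found are exactly those congruent to $1$ modulo $4$ in the tested range, as one expects for $\mathbb{Q}(i)$ .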

This is great. Namely, it allows us to study the $G_{\mathbb{Q}}$ -set $X_f(\overline{\mathbb{Q}})$ and the extension $\mathbb{Q}_f/\mathbb{Q}$ using the incredibly rich theory of representations of finite groups (namely representations of the group $\text{Gal}(\mathbb{Q}_f/\mathbb{Q})$ ). Thus, at least in this case, we’ve shed the bonds of unstructured sets and donned the incredibly powerful garb of finite group representation theory: quite the upgrade!

Unfortunately, the above doesn’t quite work for general $f$ (i.e. not univariate $f$ ). Namely, we can still pass from the set $X_f(\mathbb{Q})$ to the continuous $G_{\mathbb{Q}}$ -set $X_f(\overline{\mathbb{Q}})$ , but the next step is bound for failure in general. Namely, we built this representation $\rho_f$ from the tautological representation of $S_N\cong\text{Sym}(X_f(\overline{\mathbb{Q}}))$ , and for more general $f$ this set $X_f(\overline{\mathbb{Q}})$ is going to be (countably) infinite. And, as one knows, the study of infinite-dimensional representations isn’t very useful/tractable without more structure. We need to develop a more sophisticated technique if we hope to proceed further.

Remark: One of the students in the class asked “Why not try to study these infinite dimensional representations using techniques of functional analysis?” To this I responded as follows. This is not a terrible idea, but the naive approach is not going to work. Namely, if we want to study representations into things like Hilbert spaces, we generally need our group to be sufficiently ‘analytic’. So, for example, one can study such Hilbert space representations for things like Lie groups since, after all, both are objects of a complex analytic nature, and thus are likely to have something interesting to say to one another. Trying this for our group $G_{\mathbb{Q}}$ is bound to fail in a literal sense since $G_{\mathbb{Q}}$ is ‘anti-analytic’. That said, trying to study $G_{\mathbb{Q}}$ by studying Hilbert space (with a small caveat) representations of a related group is the entire premise of the Langlands program!

So, if we’re not going to be able to use the naive definition of representations $\rho_f$ for general $f$ , then how shall we proceed? We begin by making an observation about one of the benefits that having very complicated $X_f(\overline{\mathbb{Q}})$ gives us. As an example, let’s consider the polynomial $f(T_1,T_2)=T_1^2+T_2^2-1\in\mathbb{Z}[T_1,T_2]$ . Then, as said above, $X_f(\overline{\mathbb{Q}})$ is infinite. But something spectacular happens if we enlarge even further and replace $X_f(\overline{\mathbb{Q}})$ with $X_f(\mathbb{C})$ : a clear-cut advantage over the univariate case appears. In the univariate case the $\mathbb{C}$ -points are still just a discrete set of points. But, for our current $f$ , the space $X_f(\mathbb{C})$ is $\mathbb{CP}^1-\{p,q\}$ , the twice-punctured sphere (equivalently, the punctured plane $\mathbb{C}^\times$ , via $(x,y)\mapsto x+iy$ ). A gloriously rich topological object.

To this end, for any $f(T_1,\ldots,T_n)\in\mathbb{Z}[T_1,\ldots,T_n]$ , let us give $X_f(\mathbb{C})$ the structure of a topological space by considering it as a subspace of $\mathbb{C}^n$ with the obvious embedding $X_f(\mathbb{C})\subseteq\mathbb{C}^n$ . Then, we see that what non-univariate polynomials lack in the naive amenability to representation theory, they make up for in having rich, intrinsic geometric structure.

With all of this structure uncovered, we can say (very broadly) how the systematic study of Diophantine equations goes. Namely, given $f$ we study $X_f(\mathbb{Z})$ through the continuous $G_{\mathbb{Q}}$ -set $X_f(\overline{\mathbb{Q}})$ , the geometry of $X_f(\mathbb{C})$ , and their interaction. This then creates a subject where one studies Diophantine equations by a mix of number theory and algebraic geometry which, in modern parlance, would be called arithmetic geometry.

We will come back to how this all relates to studying $G_{\mathbb{Q}}$ and its representations. But, before we do, we’d like to give a concrete example of using a mix of geometry and number theory to study a particular family of Diophantine equations.

An extended example

The example

So, as we said at the end of the last section, we seek now to exploit our newfound methodology to study Diophantine equations to at least indicate how a natural class of such equations can be studied. What class of equations? Well, let us remark that in basic number theory the number of honest Diophantine equations one solves is remarkably slim. To wit, this past summer I taught an elementary number theory course at UC Berkeley, and the only class of Diophantine equations I was able to solve was linear Diophantine equations in any number of variables. This is exceedingly simple, and essentially comes down to the Euclidean algorithm. The rest of the term was then, in fact, not studying Diophantine equations but ‘models’ for Diophantine equations: equations over finite fields.
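For completeness, here is a short Python sketch of how the two-variable linear case $ax+by=c$ comes down to the (extended) Euclidean algorithm. The function names are illustrative, and only one particular solution is produced rather than the full family.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) >= 0 and a*s + b*t = g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:  # normalize the gcd to be non-negative
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t

def solve_linear(a, b, c):
    """One integer solution (x, y) of a*x + b*y = c, or None if none exists."""
    g, s, t = extended_gcd(a, b)
    if g == 0:  # a = b = 0
        return (0, 0) if c == 0 else None
    if c % g != 0:  # solvable exactly when gcd(a, b) divides c
        return None
    k = c // g
    return s * k, t * k

x, y = solve_linear(12, 42, 30)
assert 12 * x + 42 * y == 30
assert solve_linear(12, 42, 5) is None
```

Every other solution differs from the one returned by an integer multiple of $(b/g,-a/g)$ , where $g=\gcd(a,b)$ .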

Thus, any non-linear family of Diophantine equations is of current interest to us. So, with that being said, our goal is to use our broad methodology to prove the following:

Theorem (Legendre): Let $a,b,c\in\mathbb{Z}$ be non-zero, square-free, and pairwise coprime. Let $f_{a,b,c}(x,y,z)\in\mathbb{Z}[x,y,z]$ be defined by $f_{a,b,c}(x,y,z)=ax^2+by^2+cz^2$ . Then, the Diophantine equation

$f_{a,b,c}(x,y,z)=0$

has a non-zero solution if and only if:

1. Not all of $a,b,$ and $c$ are the same sign.

2. The number $-ab$ is a square modulo $|c|$ , the number $-ac$ is a square modulo $|b|$ , and the number $-bc$ is a square modulo $|a|$ .

Moreover, given one non-zero solution there is a natural way to parameterize all the solutions.

Note that this theorem is actually incredibly useful, because the theory of Quadratic Reciprocity gives us an incredibly powerful, quick means of checking 2. above.
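The sign condition and the three congruence conditions are easy to test by machine. The Python sketch below checks them naively (no Quadratic Reciprocity, just trying every residue) and compares against a brute-force search for a solution. It assumes the inputs are non-zero, square-free and pairwise coprime, which is the usual hypothesis for Legendre's theorem; all function names are illustrative.

```python
from itertools import product

def is_square_mod(n, m):
    """True if n is congruent to a square modulo m (m >= 1)."""
    return any((x * x - n) % m == 0 for x in range(m))

def legendre_conditions(a, b, c):
    """Check the sign and congruence conditions.

    Assumes a, b, c are non-zero, square-free and pairwise coprime.
    """
    same_sign = (a > 0 and b > 0 and c > 0) or (a < 0 and b < 0 and c < 0)
    return (not same_sign
            and is_square_mod(-a * b, abs(c))
            and is_square_mod(-a * c, abs(b))
            and is_square_mod(-b * c, abs(a)))

def search_solution(a, b, c, bound):
    """Brute-force search for a non-zero integer solution of
    a*x^2 + b*y^2 + c*z^2 = 0 with |x|, |y|, |z| <= bound."""
    rng = range(-bound, bound + 1)
    for x, y, z in product(rng, repeat=3):
        if (x, y, z) != (0, 0, 0) and a * x * x + b * y * y + c * z * z == 0:
            return (x, y, z)
    return None

for coeffs in [(1, 1, -1), (2, 3, -5), (1, 1, -3)]:
    print(coeffs, legendre_conditions(*coeffs), search_solution(*coeffs, 10))
```

A failed search within a finite box proves nothing by itself, of course; the content of the theorem is that the conditions decide the question without any search at all.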

So, before we dive into how geometry and number theory unite to prove Legendre’s Theorem, let us remark why this is, in some sense, the ‘next simplest example’ of equations after the univariate case. Of course, this may seem strange—how is this the next obvious case after the univariate case?—but as we’ll see, our lens by which to view Diophantine equations makes this precise.

So, towards this end, let us note that the set $X_{a,b,c}(\mathbb{Q}):=X_{f_{a,b,c}}(\mathbb{Q})$ has a lot of redundancy. Namely, given any $(x,y,z)\in X_{a,b,c}(\mathbb{Q})$ and any $\lambda\in\mathbb{Q}^\times$ one obtains another solution $(\lambda x,\lambda y,\lambda z)$ just by virtue of the fact that $f_{a,b,c}$ is homogeneous. Note that this ‘line’ of solutions really shouldn’t be thought of as different solutions and the more fundamental object should be something like $X_{a,b,c}(\mathbb{Q})/\mathbb{Q}^\times$ . So, to this end, if $f(T_1,\ldots,T_n)$ is a homogeneous polynomial then we define $P_f(R):=(X_f(R)-\{0\})/R^\times$ . Note then that one immediately has $P_f(\mathbb{Z})=P_f(\mathbb{Q})$ , at least once one restricts to primitive integer tuples (those whose entries have no common factor), since any non-zero rational solution can be scaled to exactly two such tuples, differing by sign.

Remark: For those that know what this means, note that this is technically the wrong definition of $P_f$ for general $R$ . Namely, we’d like to imagine that $P_f$ is just the projective scheme $\text{Proj}(\mathbb{Z}[T_1,\ldots,T_n]/(f(T_1,\ldots,T_n)))$ but the $R$ -points of this are not, in general, just $(X_f(R)-\{0\})/R^\times$ . The issue is that this projective scheme is essentially the moduli space for the quotient $(X_f-\{0\})/\mathbf{G}_m$ and, of course, in general we don’t expect quotient sheaves to have $R$ -points the quotient of the individual $R$ -points. The issue, as one can quickly deduce, lies in $H^1_{\text{fppf}}(\text{Spec}(R),\mathbf{G}_m)=\text{Pic}(R)$ . But, this is not really going to be of interest to us since we’ll be dealing primarily with local rings $R$ .

Note that, just as in the case of $X_f$ one has that $P_f(\overline{\mathbb{Q}})$ has a continuous $G_{\mathbb{Q}}$ -action and $P_f(\mathbb{C})$ is a topological space (namely one takes the quotient space $(X_f(\mathbb{C})-\{0\})/\mathbb{C}^\times$ ).

So, with this in mind, we can explain why the polynomials $f_{a,b,c}$ are the next obvious choices after univariate polynomials. Namely, note that if $f$ is univariate then $X_f(\mathbb{C})$ is a finite discrete topological space or, equivalently, a $0$ -dimensional compact complex manifold. There is an obvious parameter to tick up here: the dimension. So, the next obvious polynomials to look at are those such that $X_f(\mathbb{C})$ is a $1$ -dimensional compact complex manifold. But, even amongst $1$ -dimensional compact complex manifolds there is a simplest. Namely, by the classification of compact orientable surfaces (of which every connected $1$ -dimensional compact complex manifold is an example) there is a parameter given by the number of holes, the genus $g$ . The simplest one is then the surface of genus $0$ : the Riemann sphere $\mathbb{CP}^1$ .

Now, it seems clear that the next class of polynomials we should study are those such that $X_f(\mathbb{C})$ is the Riemann sphere. Unfortunately, no such $f$ exist. That said, there are homogeneous polynomials $f$ such that $P_f(\mathbb{C})$ is $\mathbb{CP}^1$ . In fact, one can show that if $P_f(\mathbb{C})$ is $\mathbb{CP}^1$ then (up to isomorphism) our polynomial $f$ is one of the $f_{a,b,c}$ as in the preamble to Legendre’s theorem. Thus, Legendre’s theorem really is the next logical step in our study of Diophantine equations.

Remark: Two remarks are in order for those that know what this means. First, the above statement really concerns not $P_f$ as a finite-type $\mathbb{Z}$ -scheme but its generic fiber over $\mathbb{Q}$ . Moreover, to make the above claim correct, we can’t have that $P_f(\mathbb{C})$ is just topologically the Riemann sphere, we also need that it’s the Riemann sphere as a complex analytic space: that $P_f^\text{an}$ is $\mathbb{CP}^1$ . If we assume that $P_f$ is smooth then homeomorphism is enough, but the cuspidal cubic is not one of the $f_{a,b,c}$ and has analytification homeomorphic to $\mathbb{CP}^1$ .

The proof of the above claim is pretty simple. Namely, any such $f$ must have that $P_f$ (defined as its associated projective scheme over $\mathbb{Q}$ ) is a smooth geometrically integral genus $0$ curve. One can then show that for any such curve $X$ one obtains an embedding $X\hookrightarrow\mathbb{P}^2_\mathbb{Q}$ by considering $\omega_X^{-1}$ (the inverse of the canonical bundle), which follows easily from Riemann-Roch. One then realizes $X$ as a hypersurface in $\mathbb{P}^2_{\mathbb{Q}}$ which, by genus considerations, must be of degree $2$ . The explicit form then follows from the elementary diagonalization of quadratic forms.

The fact that $X_f(\mathbb{C})$ can never be $\mathbb{CP}^1$ follows from the standard fact about analytifications: that the analytification is compact if and only if the scheme is proper. By design $X_f$ is affine, so if it is also proper, then it’s finite. But, then $X_f(\mathbb{C})$ is a finite set of points.

The geometric part

So, now that we see why the Diophantine equations addressed by Legendre’s theorem are the next logical step after univariate polynomials, we can begin to try to understand how one might prove the theorem itself. Let us first address the geometric side of the picture: the claim that a single non-trivial solution allows one to parameterize all the solutions of the Diophantine equation. Indeed, to show this we will exploit the geometry of $P_f(\mathbb{C})$ , at least morally; really we’ll be exploiting the geometry of the scheme which is, essentially, the same thing.

So, to make this claim precise, let us assume $f$ is one of the equations from Legendre’s theorem and that $x_0\in P_f(\mathbb{Q})$ is given. We then define a bijection

$P_f(\mathbb{Q})\to \mathbb{P}^1(\mathbb{Q})=\mathbb{Q}\cup\{\infty\}$

as follows. We can imagine $P_f(\mathbb{Q})$ as being a subset of $(\mathbb{Q}^3-\{0\})/\mathbb{Q}^\times=\mathbb{P}^2(\mathbb{Q})$ . In particular, it looks like a subset cut out by a degree $2$ equation. For any point $x\in P_f(\mathbb{Q})$ there is a unique line $L_x$ in $\mathbb{P}^2(\mathbb{Q})$ passing through $x$ and $x_0$ . Map $x$ to the point in $\mathbb{P}^1(\mathbb{Q})$ which is the slope of the line $L_x$ . Of course, we think of $L_{x_0}$ as having slope $\infty$ and so $x_0$ maps to $\infty$ . One can easily check that this map is surjective, and a basic version of Bezout’s Theorem (which can be proven topologically/geometrically) says that if $x\ne y$ then $L_x\ne L_y$ . Indeed, if $L_x=L_y$ then $L_x$ would intersect $P_f(\mathbb{Q})$ in at least three places (namely $x,y,x_0$ ) and this is impossible since the total number of intersection points of a subset cut out by a degree $2$ equation and a line is $2$ . This gives the desired bijection.

Remark: This is much nicer if one can draw a picture illustrating this point.

For those that know what this means, this is just the isomorphism $P_f\to\mathbb{P}^1_{\mathbb{Q}}$ given by the line bundle $\mathcal{O}(x_0)$ .

As an example of this, one can check that for $f_{1,1,-1}$ and point $x_0=(1,0,1)$ the bijection from above shows that the rational points of $P_f$ (besides $x_0$ ) are essentially of the form $\displaystyle \left(\frac{p^2-q^2}{p^2+q^2},\frac{-2pq}{p^2+q^2},1\right)$ where $\displaystyle \frac{p}{q}$ denotes the slope in reduced form. The integer solution in $X_{1,1,-1}(\mathbb{Z})$ in the class of this point is $(p^2-q^2,-2pq,p^2+q^2)$ . This should look familiar and, indeed, $X_{1,1,-1}(\mathbb{Z})$ is just the set of Pythagorean triples, and this geometric procedure produces the usual parameterization of such triples.
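The displayed formulas can be checked with exact rational arithmetic. In this Python sketch the base point is taken to be the class of $(1,0,1)$ , viewed in the affine chart $z=1$ as the point $(1,0)$ on $x^2+y^2=1$ ; that choice is an assumption made so that the output matches the formulas displayed above. The function names are illustrative.

```python
from fractions import Fraction

def point_from_slope(t):
    """Second intersection of x^2 + y^2 = 1 with the line y = t*(x - 1),
    i.e. the line of slope t through the base point (1, 0)."""
    t = Fraction(t)
    x = (t * t - 1) / (t * t + 1)
    y = -2 * t / (t * t + 1)
    return x, y

def triple_from_slope(p, q):
    """Integer solution (p^2 - q^2, -2pq, p^2 + q^2) of x^2 + y^2 = z^2
    attached to the slope p/q."""
    return p * p - q * q, -2 * p * q, p * p + q * q

t = Fraction(2, 1)
x, y = point_from_slope(t)
assert x * x + y * y == 1   # the point lies on the conic
assert y == t * (x - 1)     # and on the line through (1, 0) of slope t

a, b, c = triple_from_slope(2, 1)
assert a * a + b * b == c * c
print((x, y), (a, b, c))
```

Slope $2$ gives the point $(3/5,-4/5)$ and hence, up to sign, the familiar triple $(3,4,5)$ .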

The arithmetic part

Now that we have taken care of the geometric part of Legendre’s theorem, we need to explain the arithmetic part: the condition under which $f_{a,b,c}$ has a non-zero solution or, equivalently, under which $P_{a,b,c}(\mathbb{Q}):=P_{f_{a,b,c}}(\mathbb{Q})\ne\varnothing$ .

The motivation

The beginning is, in fact, also geometrically minded. For the next two minutes I will assume that people in the audience have basic familiarity with algebraic geometry. For those that do not, don’t fret, the conclusion will not have a lick of algebraic geometry in it—we’re just talking motivation.

So, it’s clear that from $P_{a,b,c}$ (thought about as the projective scheme $\text{Proj}(\mathbb{Z}[x,y,z]/(f_{a,b,c}))$ ) we get a map $f:P_{a,b,c}\to\text{Spec}(\mathbb{Z})$ . We can then think of $P_{a,b,c}(\mathbb{Q})=P_{a,b,c}(\mathbb{Z})$ as sections $s$ of this map $f$ . Now, if one is trying to build a section of a map $g:X\to Y$ of topological spaces, a common technique might be to try and build, for each $y\in Y$ , a section $s_y$ of $g^{-1}(U_y)\to U_y$ where $U_y$ is a ‘very small’ neighborhood around $y$ . One then tries to glue these sections $s_y$ together to get a global section $s$ . So, if we try to analogize this property for $P_{a,b,c}\to\text{Spec}(\mathbb{Z})$ we want to, for each prime $p$ , build a section from a ‘very small’ neighborhood of $p$ in $\text{Spec}(\mathbb{Z})$ .

The question then becomes what a ‘very small’ neighborhood of $p$ in $\text{Spec}(\mathbb{Z})$ might look like. The Zariski neighborhoods are too coarse, and so one needs to opt for something else. Namely, we think of a sufficiently small neighborhood of a point $y$ as being a neighborhood so small that properties of something at $y$ should extend to properties of that entire neighborhood. In other words, we want a neighborhood of $p$ such that the only obstruction to something in that neighborhood is an obstruction at $p$ . Since we’re thinking about polynomial equations, we think of an obstruction as being the lack of a solution to a polynomial equation $P(T_1,\ldots,T_n)\in\mathbb{Z}[T_1,\ldots,T_n]$ . So, what does it mean for an equation to be unobstructed ‘at $p$ ’?

Well, the naive guess is that it means the polynomial $P$ has a solution modulo $p$ . But, this isn’t good enough. Namely, the prime $p$ gives more chance for obstruction than just modulo $p$ . What about modulo $p^2$ ? What about modulo $p^3$ ? It then seems reasonable to guess that unobstructed ‘at $p$ ’ means that the polynomial has a solution in $\mathbb{Z}/p^n\mathbb{Z}$ for all $n$ .

So, this ‘very small’ neighborhood should have the property that being unobstructed at $p$ is the same thing as being unobstructed in this neighborhood. Since we’re dealing with algebraic geometry, the neighborhood should be the spectrum of some ring $R$ , and reinterpreting this last sentence means that a polynomial equation $P$ should have a solution in $R$ if and only if it has a solution in $\mathbb{Z}/p^n\mathbb{Z}$ for all $n$ . This is exactly describing the ring $\mathbb{Z}_p$ of $p$ -adic integers.
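To see how a compatible family of solutions modulo $p^n$ arises in practice, here is a minimal sketch of Hensel-style lifting. It assumes the one-variable polynomial $T^2-c$ with $p$ odd and $p\nmid c$, so that any root modulo $p$ is a simple root. The function name `lift_sqrt` is hypothetical, and `pow(x, -1, m)` needs Python 3.8 or later.

```python
def lift_sqrt(c, p, r, n):
    """Lift r with r^2 ≡ c (mod p) to a square root of c modulo p^n.

    Assumes p is an odd prime not dividing c, so r is a simple root.
    Each pass is one Newton step r <- r - f(r)/f'(r) for f(T) = T^2 - c,
    which improves a root modulo p^k to a root modulo p^(k+1).
    """
    assert (r * r - c) % p == 0, "r must be a square root of c modulo p"
    modulus = p
    for _ in range(n - 1):
        modulus *= p
        inverse = pow(2 * r, -1, modulus)  # f'(r) = 2r is a unit modulo p^k
        r = (r - (r * r - c) * inverse) % modulus
    return r

# 3^2 = 9 ≡ 2 (mod 7), so 3 is a starting approximation to sqrt(2) in Z_7.
root = lift_sqrt(2, 7, 3, 6)
print(root, (root * root - 2) % 7**6)
```

The successive outputs for growing $n$ are the truncations of a single element of $\mathbb{Z}_7$ squaring to $2$.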

So, a ‘very small’ neighborhood of $p$ in $\text{Spec}(\mathbb{Z})$ is $\text{Spec}(\mathbb{Z}_p)$ . Thus, a section of $P_{a,b,c}\to\text{Spec}(\mathbb{Z})$ ‘very close’ to $p$ should mean a section of $(P_{a,b,c})_{\text{Spec}(\mathbb{Z}_p)}\to\text{Spec}(\mathbb{Z}_p)$ or, in other words, an element of $P_{a,b,c}(\mathbb{Z}_p)=P_{a,b,c}(\mathbb{Q}_p)$ .

Thus, the rephrasing of the geometric question of whether we can glue local sections to global sections is whether or not we can glue the elements of $P(\mathbb{Q}_p)$ together for all $p$ to obtain a point of $P(\mathbb{Q})$ . In fact, one should also include $P(\mathbb{R})$ in this setup. Indeed, given the function field $K(X)$ of a curve $X$ , the points of $X$ correspond to the valuations on $K(X)$ and the valuations of $\mathbb{Q}$ are $v_p$ , which corresponds to $\mathbb{Q}_p$ , for all $p$ and $v_\infty$ which corresponds to $\mathbb{R}$ .

So, concretely, the question becomes whether or not $P_{a,b,c}(\mathbb{Q}_v)\ne\varnothing$ for all valuations $v$ (define $\mathbb{Q}_\infty=\mathbb{R}$) implies that $P_{a,b,c}(\mathbb{Q})$ is non-empty. If $P_f$ satisfies this for a homogeneous $f$ we say that $f$ satisfies the local-to-global principle (or Hasse principle). Of course, there is no reason, a priori, to even expect this. Indeed, even for the sections of a map, one at least expects to impose some compatibility conditions on the overlaps of these local sections, and the local-to-global principle doesn’t require the analogue of such compatibilities (what would that even mean?).

That said, let us see that the claim that the polynomials $f_{a,b,c}$ satisfy the local-to-global principle completely solves the first part (the criterion for non-zero solutions) of Legendre’s theorem. Indeed, we claim that the conditions stated are precisely the same as requiring $P_{a,b,c}(\mathbb{Q}_v)\ne\varnothing$ for all $v$ . For example, 1. is easily seen to be equivalent to the claim that $P_{a,b,c}(\mathbb{R})\ne\varnothing$ .

Suppose now that $p$ is a prime that does not divide any of $a,b$ or $c$ . We then claim that in this case $P_{a,b,c}(\mathbb{Q}_p)\ne\varnothing$ without any effort whatsoever. Indeed, note that $P_{a,b,c}(\mathbb{Q}_p)=P_{a,b,c}(\mathbb{Z}_p)$ and the multi-variable version of Hensel’s lemma (that does exist!) tells us that the reduction map $P_{a,b,c}(\mathbb{Z}_p)\to P_{a,b,c}(\mathbb{F}_p)$ is a surjection. Thus, it suffices to show that $P_{a,b,c}(\mathbb{F}_p)\ne\varnothing$ . There are several ways to do this, but one very clear one is the following. Since $a,b,c$ are non-zero modulo $p$ we may assume (by dividing through) that $c=1$ . We are then trying to find the number of non-zero solutions to $ax^2+by^2+z^2=0$ in $\mathbb{F}_p$ and, in particular, show it’s non-zero. But, it suffices to show that there’s a solution to $ax^2+by^2+1=0$ . But, this is equivalent to showing that the elements of $\mathbb{F}_p$ of the form $ax^2$ and the elements of the form $-1-by^2$ have to have an element in common. But, note that since $a,b$ are non-zero each has precisely $\displaystyle \frac{p-1}{2}+1$ elements expressible in that form, and so if they had none in common, there’d be $\displaystyle 2\left(\frac{p-1}{2}+1\right)=p+1$ elements of $\mathbb{F}_p$ , which is ridiculous. Thus, $ax^2+by^2+cz^2=0$ has a non-zero solution in $\mathbb{F}_p$ and thus $P_{a,b,c}(\mathbb{Q}_p)\ne\varnothing$ as desired.
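The counting argument above is easy to test by machine. The following is a minimal sketch, assuming $p$ is an odd prime not dividing $ab$ (the function name `affine_point` is made up for this illustration). It records one $x$ for each value of $ax^2$ and then looks for a $y$ whose value $-1-by^2$ collides with one of them.

```python
def affine_point(a, b, p):
    """Find (x, y) with a*x^2 + b*y^2 + 1 ≡ 0 (mod p).

    Assumes p is an odd prime not dividing a*b.
    """
    # Remember one x for each value of a*x^2; there are (p - 1)/2 + 1 values.
    seen = {}
    for x in range(p):
        seen.setdefault(a * x * x % p, x)
    # The values -1 - b*y^2 also number (p - 1)/2 + 1, so by the
    # pigeonhole argument one of them must already be in `seen`.
    for y in range(p):
        target = (-1 - b * y * y) % p
        if target in seen:
            return seen[target], y
    return None  # not reached under the stated assumptions

# Check every pair of non-zero coefficients for a few small odd primes.
for p in (3, 5, 7, 11, 13):
    for a in range(1, p):
        for b in range(1, p):
            x, y = affine_point(a, b, p)
            assert (a * x * x + b * y * y + 1) % p == 0
```

A pair $(x,y)$ returned here gives the non-zero solution $(x,y,1)$ of $ax^2+by^2+z^2=0$ modulo $p$ that the argument needs.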

Finally, let $p$ be a prime that divides one of $a,b,c$. Note that by our assumption on the greatest common divisors of the $a,b,c$ (which is really no condition, by dividing out any common divisors: if two of the coefficients are divisible by $p$, so must be the third) we know that $p$ will divide precisely one of the $a,b,c$. Let’s assume, without loss of generality, that $p\mid c$. Now, we would like to begin the argument as in the previous case by using Hensel’s lemma. But, unfortunately, $P_{a,b,c}$ is not smooth over $\mathbb{Z}_p$. So, to make things work, we instead work with $U\subseteq P_{a,b,c}$ defined as those tuples in $P_{a,b,c}$ with non-vanishing $y$-coordinate (or, in algebraic geometry language, $D_+(y)\cap P_{a,b,c}$ where the intersection is happening in $\mathbb{P}^2_{\mathbb{Z}_p}$). Note then that $U$ is, in fact, smooth over $\mathbb{Z}_p$ (over $\mathbb{F}_p$ the equation becomes $ax^2+by^2=0$, whose only non-smooth point $[0:0:1]$ lies in the complement of $D_+(y)$!). Thus, by Hensel’s lemma it suffices to show that $U(\mathbb{F}_p)$ is non-empty. To see this we need to show that there is a solution to $ax^2+by^2=0$ with $y\ne 0$. But, note that this is possible since $-ab$ is a square modulo $p$ and thus, by the multiplicativity of the Legendre symbol, $\displaystyle t^2=\frac{-b}{a}=\frac{-ab}{a^2}$ is solvable modulo $p$. So, if $t_0$ is such a solution then evidently $x=t_0$, $y=1$, and $z=1$ work. Thus, $U(\mathbb{Z}_p)\subseteq P_{a,b,c}(\mathbb{Z}_p)=P_{a,b,c}(\mathbb{Q}_p)$ is non-empty as desired.

The local-to-global principle

So, now that we’ve reduced Legendre’s theorem to the local-to-global principle for polynomials like $f_{a,b,c}$ we finish by applying some interesting number theoretic results to deduce the desired property. Namely, we will introduce an interesting object that will come later on in this course, and explain how its properties (that you’ll discuss) actually prove the local-to-global principle for the polynomials $f_{a,b,c}$ .

So, to this end, we take an ostensibly ninety-degree turn. Namely, for any field $K$ let us define a central simple algebra over $K$ to be a (possibly non-commutative) $K$-algebra $A$ with the property that $A\otimes_K \overline{K}$ is isomorphic to $\text{Mat}_n(\overline{K})$ as a $\overline{K}$-algebra for some $n$. Let us say that two central simple algebras $A$ and $B$ are equivalent (written $A\sim B$) if there exist integers $m$ and $n$ such that $\text{Mat}_n(A)\cong\text{Mat}_m(B)$ as $K$-algebras. Finally, let the Brauer group of $K$, denoted $\text{Br}(K)$, be the group of central simple algebras over $K$ up to equivalence, with group operation being tensor product. The identity of $\text{Br}(K)$ is the equivalence class of $K$. We say that a central simple algebra is split if it’s equivalent to $K$.

Remark: If you know what this means, $\text{Br}(K)=H^2(G_K,\overline{K}^\times)$ .

Note that if one has a map of fields $L\to K$ one obtains a map of Brauer groups $\text{Br}(L)\to\text{Br}(K)$ given by $A\mapsto A_K:=A\otimes_L K$. Indeed, it’s fairly easy to see that $A_K$ is a central simple $K$-algebra since

$A_K\otimes_K \overline{K}=(A\otimes_L \overline{L})\otimes_{\overline{L}} \overline{K}=\text{Mat}_n(\overline{L})\otimes_{\overline{L}} \overline{K}=\text{Mat}_n(\overline{K})$

and one can check that this map is, in fact, well-defined (i.e. if $A\sim B$ then $A_K\sim B_K$ ).

Our goal in this section is to explain how to use the Brauer groups, and their properties for local/global fields, to prove the local-to-global principle for the $f_{a,b,c}$ ‘s. This is, in some sense, a gross over-complication of the matter. But, our reasons for doing this are three-fold:

1. The whole goal of this post was to explain how ‘serious’ modern number theory that will show up in this course can be used to answer concrete questions, and the below is a perfect example of this.
2. The proof of the below (while dramatic overkill) gives a much more conceptual understanding of why the local-to-global principle holds for the $f_{a,b,c}$. There is a way to prove the local-to-global principle for the $f_{a,b,c}$’s using ‘elementary techniques’ (see here for instance), but they don’t really explain how the result fits into the larger context of abelian reciprocity laws/class field theory. Specifically, they don’t really explain in any essentially complete way why these Diophantine equations (and not many others) satisfy the local-to-global principle. The stated result below should be thought of as the ‘true’ local-to-global statement, and the local-to-global property for our $f_{a,b,c}$’s comes from a coincidental (or deep?) connection to central simple algebras.
3. The below method actually is (essentially) equivalent to a local-to-global result for Diophantine equations of a much more general form. Namely, let $\{f_1,\ldots,f_m\}$ be homogeneous polynomials in the same number of variables, and consider the Diophantine equation $P:=P_{f_1}\cap\cdots\cap P_{f_m}$, that is, the simultaneous Diophantine equations $f_1=\cdots=f_m=0$ (for people that know what this means one should take $P=\text{Proj}(\mathbb{Q}[T_1,\ldots,T_n]/(f_1,\ldots,f_m))$). Let us call this set of Diophantine equations Brauer-Severi if there is a (complex analytic) isomorphism $P(\mathbb{C})\to \mathbb{P}^n(\mathbb{C})$ (where $\mathbb{P}^n(\mathbb{C})=(\mathbb{C}^{n+1}-\{0\})/\mathbb{C}^\times$) which restricts to an isomorphism $P(\overline{\mathbb{Q}})\to\mathbb{P}^n(\overline{\mathbb{Q}})$ (for those that know what this means, we just mean that $P_{\overline{\mathbb{Q}}}\cong \mathbb{P}^n_{\overline{\mathbb{Q}}}$). Then, the below results can be (mostly) summarized as the claim that Brauer-Severi Diophantine equations satisfy the local-to-global principle. This is the correct light in which to view the result for the $f_{a,b,c}$’s and really explains why their geometry is pivotal.

Before we go any further it’s worth discussing an explicit example of such objects. Namely, the Hamiltonian Quaternions $\mathbb{H}$ are an example of a central simple algebra over $\mathbb{R}$ . Indeed, one can check that $\mathbb{H}\otimes_\mathbb{R}\mathbb{C}\cong\text{Mat}_2(\mathbb{C})$ . In fact, one can show that, up to equivalence, the only central simple algebras over $\mathbb{R}$ are $\mathbb{R}$ and $\mathbb{H}$ so that $\text{Br}(\mathbb{R})\cong\mathbb{Z}/2\mathbb{Z}$ .

So, why does this have anything to do with local-to-global principle for the $f_{a,b,c}$ ‘s? To begin, let us note that if we’re willing to take $a,b\in\mathbb{Q}$ we can basically only consider the polynomial equations $f_{\alpha,\beta}:=\alpha x^2+\beta y^2+z^2$ . Note then that $P_{a,b,c}(\mathbb{Q})=P_{f_{\alpha,\beta}}(\mathbb{Q})$ with $\displaystyle \alpha=\frac{a}{c}$ and $\displaystyle \beta=\frac{b}{c}$ . So, it suffices to explain how the polynomials $f_{\alpha,\beta}$ relate to the Brauer group. So, to this end, let us define another concrete example of a central simple algebra over $\mathbb{Q}$ which, in a clear way, is a variation on a theme of the Hamiltonian Quaternions. Namely, for $\alpha,\beta\in\mathbb{Q}^\times$ let $Q(\alpha,\beta)$ be the following central simple algebra over $\mathbb{Q}$ :

$Q(\alpha,\beta)=\mathbb{Q}\oplus\mathbb{Q}i\oplus\mathbb{Q}j\oplus\mathbb{Q}k$

with $i^2=\alpha$ , $j^2=\beta$ , $ij=k$ , and $ij=-ji$ .

The theorem that then relates everything is the following:

Theorem: Let $\alpha,\beta\in\mathbb{Q}^\times$ . Then, for any field $K/\mathbb{Q}$ the central simple algebra $Q(\alpha,\beta)_K$ is split if and only if $P_{f_{\alpha,\beta}}(K)\ne\varnothing$ .

Proof(sketch): We first claim that $Q(\alpha,\beta)_K$ is non-split if and only if it’s a division algebra. This follows at once from the well-known Artin-Wedderburn theorem. Indeed, since $Q(\alpha,\beta)_K$ is simple with center $K$ we know that, as a $K$-algebra, it’s isomorphic to $\text{Mat}_n(D)$ where $D/K$ is a central division algebra. But, since $Q(\alpha,\beta)_K$ has dimension $4$ over $K$, dimension considerations show that either $n=1$ and $Q(\alpha,\beta)_K=D$, or $n=2$ and $D=K$.

So, whether or not $Q(\alpha,\beta)_K$ is split is equivalent to knowing whether it’s a division algebra. But, note that there is a norm map $N:Q(\alpha,\beta)_K\to K$ given by

$N(x+yi+zj+wk)=x^2-\alpha y^2-\beta z^2+\alpha\beta w^2$

One can check that $N(q)=q\overline{q}$ where $\overline{q}=x-yi-zj-wk$ if $q=x+yi+zj+wk$. Thus, $N$ is multiplicative, and it’s pretty easy to check that $q\in Q(\alpha,\beta)_K^\times$ if and only if $N(q)\ne 0$.

Thus, the splitness of $Q(\alpha,\beta)_K$ is equivalent to the existence of a non-zero non-unit of $Q(\alpha,\beta)_K$ which is, by the previous paragraph, equivalent to the existence of $x,y,z,w\in K$ (not all zero) such that $x^2-\alpha y^2-\beta z^2+\alpha\beta w^2=0$. This looks moderately close to the existence of a point in $P_{f_{\alpha,\beta}}(K)$ and, as it turns out (with a bit of algebra grease), the two are equivalent. You can look in the wonderful text Central Simple Algebras and Galois Cohomology (chapter 1) by Gille and Szamuely for the details. $\blacksquare$
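For readers who like to compute, the claims about the norm can be checked numerically. The following is a minimal sketch, assuming elements of $Q(\alpha,\beta)$ are stored as coefficient tuples $(x,y,z,w)$ for $x+yi+zj+wk$. The function names are my own, and the multiplication table is derived from $i^2=\alpha$, $j^2=\beta$, $ij=k=-ji$.

```python
def qmul(q, r, alpha, beta):
    """Multiply q and r in Q(alpha, beta), using i^2 = alpha, j^2 = beta, ij = k = -ji."""
    x1, y1, z1, w1 = q
    x2, y2, z2, w2 = r
    return (
        x1 * x2 + alpha * y1 * y2 + beta * z1 * z2 - alpha * beta * w1 * w2,
        x1 * y2 + y1 * x2 - beta * z1 * w2 + beta * w1 * z2,
        x1 * z2 + z1 * x2 + alpha * y1 * w2 - alpha * w1 * y2,
        x1 * w2 + w1 * x2 + y1 * z2 - z1 * y2,
    )

def conj(q):
    """The conjugate x - yi - zj - wk."""
    x, y, z, w = q
    return (x, -y, -z, -w)

def norm(q, alpha, beta):
    """N(q) = x^2 - alpha*y^2 - beta*z^2 + alpha*beta*w^2."""
    x, y, z, w = q
    return x * x - alpha * y * y - beta * z * z + alpha * beta * w * w

alpha, beta = 2, -3           # arbitrary sample parameters
q, r = (1, 2, 3, 4), (5, -6, 7, 8)

# q * conj(q) is the scalar N(q), and N is multiplicative.
print(qmul(q, conj(q), alpha, beta), norm(q, alpha, beta))
print(norm(qmul(q, r, alpha, beta), alpha, beta) == norm(q, alpha, beta) * norm(r, alpha, beta))

# When alpha is a square (here alpha = 1) the element 1 + i has norm 0
# and is a zero divisor, so the algebra is not a division algebra.
print(qmul((1, 1, 0, 0), (1, -1, 0, 0), 1, beta))
```

With $\alpha=\beta=-1$ the same code multiplies ordinary Hamiltonian quaternions, which is a convenient way to double-check the table.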

Remark: The more conceptual reason for the above can be explained as follows. By a previous remark one can identify $\text{Br}(K)$ with $H^2(G_K,\overline{K}^\times)$ . The short exact sequence of $G_K$ -groups

$1\to\overline{K}^\times\to \text{GL}_n(\overline{K})\to \text{PGL}_n(\overline{K})\to 1$

gives, by the connecting homomorphism and Hilbert Theorem 90, an injection of the form $H^1(G_K,\text{PGL}_n(\overline{K}))\to H^2(G_K,\overline{K}^\times)$. But, note that $\text{PGL}_n(\overline{K})=\text{Aut}(\mathbb{P}^{n-1}_{\overline{K}})$ and thus, by the theory of twists, $H^1(G_K,\text{PGL}_n(\overline{K}))$ classifies varieties over $K$ which become isomorphic to $\mathbb{P}^{n-1}_{\overline{K}}$ over $\overline{K}$: the Brauer-Severi varieties.

In particular, such a variety $V$ gives a class $[V]\in H^1(G_K,\text{PGL}_n(\overline{K}))$ and thus, by the above, a class $[V]\in\text{Br}(K)$. Moreover, one can show that $V\cong \mathbb{P}^{n-1}_K$ (i.e. isomorphic over $K$ not $\overline{K}$) if and only if $V(K)\ne\varnothing$. Thus, we see that $V(K)\ne\varnothing$ if and only if $[V]\in\text{Br}(K)$ is trivial.

One can check, as was already done in the section labeled ‘The geometric part’, that $P_{a,b,c}$ becomes isomorphic to $\mathbb{P}^1_{\overline{\mathbb{Q}}}$ over $\overline{\mathbb{Q}}$ and thus, putting this all together, we see that to every $P_{a,b,c}$ and any field $K/\mathbb{Q}$ one can associate some class $[P_{a,b,c}]$ in $H^1(G_K,\text{PGL}_2(\overline{K}))\subseteq\text{Br}(K)$ such that $P_{a,b,c}(K)\ne\varnothing$ if and only if $[P_{a,b,c}]\in\text{Br}(K)$ is split (i.e. trivial). As one might guess, $[P_{a,b,c}]$ is the class of the central simple algebra $Q(\alpha,\beta)_K$ with $\displaystyle \alpha=\frac{a}{c}$ and $\displaystyle \beta=\frac{b}{c}$.

So, from this we see that the local-to-global principle becomes equivalent to the claim that $Q(\alpha,\beta)$ is split if and only if $Q(\alpha,\beta)\otimes_\mathbb{Q}\mathbb{Q}_v$ is split for all valuations $v$ of $\mathbb{Q}$ . So, how does one go about doing this?

The key is the following result that will be proven in this class:

Theorem(Fundamental sequence): There are isomorphisms $\text{inv}_p:\text{Br}(\mathbb{Q}_p)\cong\mathbb{Q}/\mathbb{Z}$ for each prime $p$ and $\text{inv}_\infty:\text{Br}(\mathbb{R})\cong \frac{1}{2}\mathbb{Z}/\mathbb{Z}\subseteq\mathbb{Q}/\mathbb{Z}$ such that the following sequence is exact:

$\displaystyle 0\to\text{Br}(\mathbb{Q})\xrightarrow{L}\bigoplus_v \text{Br}(\mathbb{Q}_v)\xrightarrow{\text{inv}}\mathbb{Q}/\mathbb{Z}\to 0$

where $L(A)=(A\otimes_\mathbb{Q}\mathbb{Q}_v)_v$ and $\text{inv}((A_v)_v)$ is $\displaystyle \sum_v \text{inv}_v(A_v)$.

This is an incredibly deep theorem. It contains within it quite a bit of the wrecking ball that is Class Field Theory. As an example, the fact that $\text{inv}\circ L$ is trivial contains within it all the reciprocity laws (quadratic, cubic, Eisenstein) from elementary algebraic number theory. The proof comes from a detailed study of the cohomology of the ideles of finite extensions of $\mathbb{Q}$.

In particular, we see that this deep number theoretic result contains within it also the local-to-global principle for the polynomials $f_{a,b,c}$ . Indeed, taking $\displaystyle \alpha=\frac{a}{c}$ and $\displaystyle \beta=\frac{b}{c}$ we have already observed that the local-to-global principle for $f_{a,b,c}$ is equivalent to the fact that $Q(\alpha,\beta)$ is split if and only if $Q(\alpha,\beta)\otimes_\mathbb{Q}\mathbb{Q}_v$ is split for all $v$ . Using the terminology from the above theorem this is equivalent to the fact that $Q(\alpha,\beta)$ is split if and only if $L(Q(\alpha,\beta))$ is split. But, by the definition of the Brauer group this follows from the injectivity of $L$ .

In fact, we actually get something stronger from the above theorem. Namely, one can quite easily show that for each $\alpha,\beta$ the element $Q(\alpha,\beta)$ is $2$-torsion (prove this for yourself!). So, in particular, what we see is that the invariants of $L(Q(\alpha,\beta))$ form a tuple $(a_v)$ with each $a_v$ equal to $0$ (in which case it’s split at $v$) or $\frac{1}{2}$ (in which case it’s not). But, noting that $\text{inv}(L(Q(\alpha,\beta)))=0$ we actually conclude that the number of places $v$ with $a_v=\frac{1}{2}$ is even! In particular, if we know that $Q(\alpha,\beta)$ is split over $\mathbb{Q}_v$, in other words $P_{a,b,c}(\mathbb{Q}_v)\ne\varnothing$, for all but one $v$ then, in fact, $Q(\alpha,\beta)$ is split over every $\mathbb{Q}_v$ and thus, in particular, $P_{a,b,c}(\mathbb{Q})\ne\varnothing$. So, we have actually proven something stronger than Legendre’s theorem.
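The parity statement can be watched in action. The sketch below relies on two things that are not in the talk and should be read as assumptions: the classical fact that $Q(\alpha,\beta)\otimes_\mathbb{Q}\mathbb{Q}_v$ is split exactly when the Hilbert symbol $(\alpha,\beta)_v$ equals $1$, and the standard explicit formulas for that symbol (as found, for example, in Serre’s A Course in Arithmetic). All function names are illustrative, and only non-zero integer inputs are handled.

```python
def split_power(n, p):
    """Write n = p**k * u with p not dividing u, and return (k, u)."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k, n

def legendre(u, p):
    """Legendre symbol (u/p) for an odd prime p not dividing u."""
    return 1 if pow(u % p, (p - 1) // 2, p) == 1 else -1

def hilbert(a, b, v):
    """Hilbert symbol (a, b)_v for non-zero integers a, b.

    v is a prime number, or the string "inf" for the real place.
    """
    if v == "inf":
        return -1 if (a < 0 and b < 0) else 1
    k, u = split_power(a, v)
    l, w = split_power(b, v)
    if v == 2:
        eps = lambda t: ((t - 1) // 2) % 2        # (t - 1)/2 mod 2 for odd t
        omega = lambda t: ((t * t - 1) // 8) % 2  # (t^2 - 1)/8 mod 2 for odd t
        return (-1) ** ((eps(u) * eps(w) + k * omega(w) + l * omega(u)) % 2)
    sign = (-1) ** ((k * l * ((v - 1) // 2)) % 2)
    return sign * legendre(u, v) ** l * legendre(w, v) ** k

def prime_divisors(n):
    """Set of primes dividing the non-zero integer n (trial division)."""
    n, found, d = abs(n), set(), 2
    while d * d <= n:
        while n % d == 0:
            found.add(d)
            n //= d
        d += 1
    if n > 1:
        found.add(n)
    return found

def ramified_places(a, b):
    """Places v with (a, b)_v = -1, i.e. where Q(a, b) should fail to split."""
    places = ["inf"] + sorted({2} | prime_divisors(a * b))
    return [v for v in places if hilbert(a, b, v) == -1]

print(ramified_places(-1, -1))  # the rational Hamiltonian quaternions
print(ramified_places(-1, 3))
```

In every case the list has even length, which is the statement $\text{inv}\circ L=0$ restricted to these $2$-torsion classes. An empty list corresponds to $Q(\alpha,\beta)$ being split over $\mathbb{Q}$ itself.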

Conclusion

We see that Legendre’s theorem, which concerns perhaps the most basic of all non-trivial Diophantine equations, requires (in its ‘correct’ proof) a truly modern perspective: the combination of number theory and algebraic geometry. It requires arithmetic geometry. Quite amazing!

What comes next?

So, now that we’ve handled the Diophantine equations as in Legendre’s theorem, what type of equations come next? Using our previous discussion as a clue, we might try to look for polynomials $f$ such that $P_f(\mathbb{C})$ are the next most complicated $1$ -dimensional compact complex manifolds: compact Riemann surfaces of genus $1$ .

These are, essentially, the so-called elliptic curves over $\mathbb{Q}$. The study of their rational points, the solutions of the Diophantine equation, is a widely studied topic which comprises quite a sizable portion of modern research. For example, one can show that if $E:=P_f$ is such an elliptic curve then $E(\mathbb{Q})=E(\mathbb{Z})$ is a finitely generated abelian group. We can then write $E(\mathbb{Q})=\mathbb{Z}^r\oplus T$ where $T$ is a finite abelian group. Deep work of Mazur has identified the possible choices for $T$: $\mathbb{Z}/N\mathbb{Z}$ for $N=1,\ldots,10,12$ and $(\mathbb{Z}/2\mathbb{Z})\times(\mathbb{Z}/2N\mathbb{Z})$ for $N=1,2,3,4$. What the possibilities for $r$ can be is an incredibly intense area of modern research. It’s even hotly debated whether or not the possible $r$ are bounded. Conjecturally, half of the time $r=0$ and half of the time $r=1$ (probabilistically speaking), and recent work of Manjul Bhargava and his collaborators makes substantial progress in this direction, for example by showing that the average value of $r$ is bounded (this was one of the main reasons he received the Fields Medal). Moreover, there are deep conjectures (most notably the Birch and Swinnerton-Dyer conjecture) that relate this $r$ to important analytic objects (their $L$-functions) which, due to Wiles and others, are the same thing as analytic objects coming from harmonic analysis (automorphic $L$-functions).

The next step might be to study $f$ with $P_f(\mathbb{C})$ a curve of genus $g\geq 2$. The general theory of such $f$ is fairly sparse. But, thanks to Gerd Faltings we know an incredibly powerful qualitative result about the solutions to such Diophantine equations. Namely, Faltings shows that $P_f(\mathbb{Q})$ is always finite. An incredibly stunning result considering the incredible variety of examples it encompasses.

Remark: Not that most of this note has been rigorous, but it should be noted that the above paragraph was a discussion mostly in analogies. There are no $f$ with $P_f(\mathbb{C})$ having genus $2$ for example: genus $2$ curves are not hypersurfaces. One should, if you know what this means, replace $P_f$ by a smooth geometrically integral proper curve of genus $g$ in the above paragraph.

It should also be noted that we are largely, in the above, replacing the finite-type $\mathbb{Z}$ -scheme with its generic fiber.

After one is done with curves, the next step is to move onto Diophantine equations whose $\mathbb{C}$-points are higher-dimensional compact complex manifolds. Specifically, next up would be the study of proper surfaces (those Diophantine equations with associated complex points a $2$-dimensional compact complex manifold). These are broken down into cases, similar to the partition of curves into their genus classes, by their minimal models. Namely, there is a classification of minimal surfaces (over $\overline{\mathbb{Q}}$), and this allows one to study the rational points of surfaces over $\mathbb{Q}$ according to what their associated minimal surface over $\overline{\mathbb{Q}}$ is. See this nice reference for a leisurely discussion of the topic.

Moving on from this point things get even harder. Namely, for dimensions larger than $2$ there is no real geometric classification of the objects involved and so studying them by breaking them into similar classes seems impossible. Thus, there is currently no real general theory for Diophantine equations whose $\mathbb{C}$-points are compact complex manifolds of dimension greater than $2$, only theory for such Diophantine equations of certain special forms (e.g. abelian varieties).

Relation to the ‘modern perspective’

I can’t resist, as a final note, explaining how this perspective on Diophantine equations allows us to more directly unite Diophantine equations with the modern perspective that number theory is the study of the group $G_{\mathbb{Q}}$ and its representations.

Namely, we saw that in the case of univariate polynomials one could very easily associate a representation of $G_{\mathbb{Q}}$ that, in some sense, ‘tells all’. One of the reasons we then sought the refuge of the geometry underpinning ‘higher-dimensional Diophantine equations’ was that such a technique no longer proved to be possible. One may then wonder if one can actually exploit this geometry to remedy this situation, that is, whether we can use the geometry somehow to associate to Diophantine equations (of arbitrary dimensions) representations of $G_{\mathbb{Q}}$ which, like in the univariate case, give us insights into the solutions to the Diophantine equations.

So, the main impediment to extending what happened in the univariate case to higher dimensions was the lack of a natural finite-dimensional vector space attached to the Diophantine equation for $G_{\mathbb{Q}}$ to act on. Namely, we ostensibly made $\rho_f$ , for $f$ univariate, by creating a vector space in the dumbest possible way: using the tautological representation of $S_N$ . How might we analogize this to higher dimensions?

The key observation is that, in fact, one can think of the vector space $\mathbb{C}^N$ showing up in the representation $\rho_f$ as actually coming from the geometry of $X_f$ . Namely, we saw that $X_f(\overline{\mathbb{Q}})=X_f(\mathbb{C})$ is nothing more than a set of discrete points with a continuous action of $G_{\mathbb{Q}}$ . Moreover, we can see that the representation $\rho_f$ essentially came from how $G_{\mathbb{Q}}$ permuted the connected components of this discrete space $X_f(\mathbb{C})$ and, in particular, $\rho_f$ just acted on the free $\mathbb{C}$ vector space on the connected components. In general, given a complex manifold $M$ , there is a name for the free $\mathbb{C}$ vector spaces on the set of connected components of $M$ : the zeroth singular cohomology $H^0_\text{sing}(M,\mathbb{C})$ . Indeed, one can note that since $X_f(\mathbb{C})$ is discrete that $G_{\mathbb{Q}}$ acts continuously on $X_f(\mathbb{C})$ and thus, by the functoriality of singular cohomology, gives an induced linear action on $H^0_\text{sing}(X_f(\mathbb{C}),\mathbb{C})$ . One can check that, as you’d hope, the $G_{\mathbb{Q}}$ -representations $\rho_f$ and $H^0_{\text{sing}}(X_f(\mathbb{C}),\mathbb{C})$ are isomorphic.

This gives us a clear-cut way to try and attack the goal of associating to higher-dimensional Diophantine equations representations of $G_{\mathbb{Q}}$. For example, if $f$ is a homogeneous polynomial, then $P_f(\mathbb{C})$ will be (in good situations) a compact complex manifold and thus, in particular, a well-behaved topological space. This allows us to associate to $f$ various complex vector spaces: the singular cohomology groups $H^i_\text{sing}(P_f(\mathbb{C}),\mathbb{C})$ for $i\geqslant 0$. Of course, to make this totally complete we will still need to define an action of $G_{\mathbb{Q}}$ on this space. Therein lies the rub. Namely, we have two difficulties that make this a much harder problem than in the discrete case. The first is that it is no longer true that $P_f(\overline{\mathbb{Q}})=P_f(\mathbb{C})$ (take $f(T_1,T_2)=T_1+T_2$ for example). So, what we get is not an action of $G_{\mathbb{Q}}$ on $P_f(\mathbb{C})$ but an action of $\text{Aut}(\mathbb{C}/\mathbb{Q})$. Of course, this action factors through $G_{\mathbb{Q}}$ on the subset $P_f(\overline{\mathbb{Q}})\subseteq P_f(\mathbb{C})$.

The second, and more serious issue, is that we want to consider these cohomology groups of $P_f(\mathbb{C})$ where we give $P_f(\mathbb{C})$ the complex topology, NOT the discrete one. That said, $\text{Aut}(\mathbb{C}/\mathbb{Q})$ only acts continuously on $P_f(\mathbb{C})$ with the discrete topology, and acts wildly discontinuously on $P_f(\mathbb{C})$ with the complex topology. In particular, since cohomology is only functorial for continuous maps, we seem completely doomed in utilizing the geometry to copy what happened in the univariate case for higher-dimensional examples.

In comes Grothendieck. It was Grothendieck’s brilliant genius to realize that one can fix the above, by understanding that the singular cohomology of $P_f(\mathbb{C})$ can, in some sense, be obtained by algebraic methods and, in particular, in a way that allows one to actually have $G_{\mathbb{Q}}$ act on this singular cohomology. This is the modern masterpiece that is the theory of etale cohomology.

Let me briefly explain what this looks like. Grothendieck realized that to any scheme $X$ (thought of as a ‘geometric space associated to a set of equations’, think $X_f$ or $P_f$) one can associate certain $\overline{\mathbb{Q}_\ell}$ vector spaces, denoted $H^i(X,\overline{\mathbb{Q}_\ell})$. Moreover, what Grothendieck shows, and this is the pivotal part, is that these cohomology groups are functorial in the scheme. In particular, if $X/\mathbb{Q}$ is a scheme obtained from a Diophantine equation then one has an operation, known as base change, which gives a scheme $X_{\overline{\mathbb{Q}}}/\overline{\mathbb{Q}}$ (this is much like $X_f(\overline{\mathbb{Q}})$) and this has an action of $G_{\mathbb{Q}}$ (much in the same way that $X_f(\overline{\mathbb{Q}})$ had an action of $G_{\mathbb{Q}}$) and thus, by Grothendieck’s wonderful machine, we obtain an action of $G_\mathbb{Q}$ on $H^i(X_{\overline{\mathbb{Q}}},\overline{\mathbb{Q}_\ell})$ which is, in fact, continuous if one gives the cohomology group the natural topology of a $\overline{\mathbb{Q}_\ell}$ vector space.

What does this have to do with the geometry of $X(\mathbb{C})$? Well, Grothendieck and Artin showed that one has a natural isomorphism $H^i(X_{\overline{\mathbb{Q}}},\overline{\mathbb{Q}_\ell})\cong H^i_\text{sing}(X(\mathbb{C}),\overline{\mathbb{Q}_\ell})$ which, since there is an (abstract) isomorphism $i:\overline{\mathbb{Q}_\ell}\cong \mathbb{C}$, implies (at least after choosing such an isomorphism $i$) that $H^i(X_{\overline{\mathbb{Q}}},\overline{\mathbb{Q}_\ell})\cong H^i_\text{sing}(X(\mathbb{C}),\mathbb{C})$. Thus, by Grothendieck’s wonderful machinery, and the beautiful result of Artin, one can define an action of $G_{\mathbb{Q}}$ on $H^i_\text{sing}(X(\mathbb{C}),\mathbb{C})$, but even better, a continuous action of $G_{\mathbb{Q}}$ on $H^i_\text{sing}(X(\mathbb{C}),\overline{\mathbb{Q}_\ell})$.

Thus, utilizing the geometry of the Diophantine equation and its arithmetic properties (i.e. viewing it through the lens of arithmetic geometry) one can associate to it a (continuous) representation of $G_{\mathbb{Q}}$ and, as promised, this enhances the study of the Diophantine equation. To elaborate on this would take many, many hours but suffice it to say that the representation contains within it much of the information of the Diophantine equation (e.g. the number of solutions of the equation modulo $p$). As an example, the fact that $|P_{a,b,c}(\mathbb{F}_p)|=p+1$ is a direct consequence (with the general theory at hand) of the fact that $H^1_\text{sing}(P_{a,b,c}(\mathbb{C}),\mathbb{C})=0$.
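The point count is easy to confirm experimentally for small primes. Here is a minimal brute-force sketch, assuming $p$ is an odd prime not dividing $abc$ (so that the conic is smooth modulo $p$). The function name is made up for this illustration.

```python
def count_projective_points(a, b, c, p):
    """Number of points of a*x^2 + b*y^2 + c*z^2 = 0 in P^2(F_p).

    Counts the non-zero solutions in F_p^3 and divides by p - 1,
    since each projective point has exactly p - 1 representatives.
    """
    nonzero_solutions = sum(
        1
        for x in range(p)
        for y in range(p)
        for z in range(p)
        if (x, y, z) != (0, 0, 0)
        and (a * x * x + b * y * y + c * z * z) % p == 0
    )
    return nonzero_solutions // (p - 1)

for p in (3, 5, 7, 11):
    print(p, count_projective_points(1, 1, 1, p), p + 1)
```

The last two columns agree, matching the count of points on $\mathbb{P}^1(\mathbb{F}_p)$.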

Moral of the story: learn schemes

So, let’s summarize all of what happened above. If one wants to study a Diophantine equation, there are many methods of attack for specific types of equations, but no general approach. This is undesirable. In particular, one would like to develop a method to study such Diophantine equations in a systematic way that ties into the modern number theoretic goal of studying $G_{\mathbb{Q}}$ and its representations.

One achieves this systematic approach by trading an unstructured set of solutions (e.g. $X_f(\mathbb{Q})$ or $P_f(\mathbb{Q})$) for a $G_{\mathbb{Q}}$-set and a geometric object (e.g. $P_f(\overline{\mathbb{Q}})$ and $P_f(\mathbb{C})$) and studying the Diophantine equation through the study of these two separately (number theory and algebraic geometry) and their interaction (arithmetic geometry).

We saw a key example of this in trying to understand quadratic Diophantine equations. We were able to, in Legendre’s theorem, give a fairly satisfactory description of their structure. But, to do so, required that we view such Diophantine equations both as a geometric object (namely $\mathbb{CP}^1$ ) and an object of number theory (a class in the Brauer group $\text{Br}(\mathbb{Q})$ ). Only by combining these approaches were we truly able to give our desired description.

Moreover, in the end, we indicated how this general arithmetico-geometric approach to Diophantine equations ties into the other modern perspective on number theory (the study of continuous representations of $G_{\mathbb{Q}}$) by explaining how the geometry allows us, thanks to the great work of Grothendieck (and many others, including Artin), to associate to any Diophantine equation such a representation which, as hinted at, contains an immense amount of information about the Diophantine equation. Thus, the modern study of number theory (the study of Galois representations) does insert directly into the classical desire to study Diophantine equations.

Finally, it’s worth mentioning that all of the above is very neatly organized in modern algebraic geometry. Namely, it was the vision of Grothendieck and his collaborators to be able to neatly package the idea that from equations one can obtain a geometric and arithmetic object. Namely, they defined the notion of schemes which, in essence, captures both the arithmetic properties of the $G_{\mathbb{Q}}$ -set and the geometric properties of the topological (complex analytic) space associated to a Diophantine equation. So, as is usually a good way to end a talk, I implore you to go out and learn scheme theory. Not for its fanciness, but because of its natural necessity in the study of number theory and, in particular, in the study of Diophantine equations.

10 comments

Arun Debray says:

October 4, 2017 at 5:46 am

Regarding your remark “choose an isomorphism $\overline{\mathbb{Q}_\ell}\cong\mathbb{C}$:” I am out of my depth here, but I thought you had to complete $\overline{\mathbb{Q}_\ell}$ before obtaining a field isomorphic to $\mathbb{C}$. Am I misremembering facts about the p-adics, or was this implicit in the notation, or something like that?

1. alexyoucis says:
  
  October 4, 2017 at 6:19 am
  
  Hey Arun,
  
  It’s actually true that all uncountable algebraically closed fields of characteristic $0$ are (abstractly) isomorphic. So, $\mathbb{C}\cong\overline{\mathbb{Q}_\ell}\cong\mathbb{C}_\ell$ (this latter is the completion of $\overline{\mathbb{Q}_\ell}$ , as you indicated, that you need for a complete algebraically closed extension of $\mathbb{Q}_\ell$ ).
  
  Hope this helps!
  
  1. Arun Debray says:
    
    October 4, 2017 at 7:17 am
    
    Oh, interesting. That’s a neat fact. Thanks for the explanation!
  2. oregontrailmixtape says:
    
    November 6, 2017 at 10:16 am
    
    Hi Alex,
    
    I don’t mean to be a pesky pedant who annoyingly swoops in from the internet, but doesn’t one also require that the two algebraically closed fields of characteristic zero have the same cardinality? (This is certainly the case for $ \mathbb{C} $, $ \overline{\mathbb{Q}_\ell} $, and $ \mathbb{C}_\ell $, but it’s not true for, say, $ \overline{\mathbb{Q}(T_i)_{i\in\mathbb{R}}} $ and $ \overline{\mathbb{Q}(T_i)_{i\in 2^{\mathbb{R}}}} $).
    
    Best,
    
    oregontrailmixtape
  3. alexyoucis says:
    
    November 6, 2017 at 8:32 pm
    
    Of course–this was implied. I should have mentioned it of course! Thanks!
  4. oregontrailmixtape says:
    
    November 6, 2017 at 10:18 am
    
    (ah, and as usual, I utterly failed to use WordPress in LaTeX)
lush says:

March 4, 2018 at 3:03 am

Sorry, one non-article related question: Have you got an RSS feed for this blog?
I can’t find any but I just want to make sure as I’d really like to subscribe to that!

1. alexyoucis says:
  
  October 30, 2019 at 3:19 pm
  
  I am not sure to be honest. Sorry!
  
2. Arun Debray says:
  
  October 30, 2019 at 4:11 pm
  
  https://ayoucis.wordpress.com/feed/
  
Jopito says:

July 18, 2018 at 11:15 am

I think it’s Hellegouarch and not Hellegourach

Hard Arithmetic

A fun (enough) talk
