A fun (enough) talk

This is a rough transcription of a talk I gave to a class of algebraic number theory students at UC Berkeley with the goal of trying to understand how one might bring to bear modern techniques in number theory/geometry on some classical questions. I have essentially kept the format the same, while adding a bit of extra material (and adding in their responses to questions I asked).

A warning

This talk was difficult to write for multiple reasons. Most relevant to the reader, though, was the unknown, and likely highly variable, background of the audience members. For this reason I chose to keep things as simple as possible and consequently, in the process, sacrificed quite a bit of rigor. I think that the resulting mistakes are not so bad since it seems, to me, that any reader capable of identifying them is likely able to see how to fix them. But, again, there are some informalities/inaccuracies.

What is the point?

So, everyone learning algebraic number theory for the first time is, ostensibly, interested in studying, well, number theory. An innocent enough statement, but it raises a much more complicated question: what is number theory? Namely, if a well-educated student of analysis (or even an algebraic geometer!) asked a number theorist what their subject was all about, what should the response be?

This is a surprisingly non-obvious question. The answer largely lies in the way in which number theory distinguishes itself from many other subjects of mathematics. I think Hida has said it best (paraphrasing): number theory is a subject not determined by its methods, but by its desiderata, the problems it wants to solve. As an example, one can think of analysis as being the study of objects amenable to the method of limiting processes. Number theory has no such method from which its problems spring. Conversely, its methods (as wide and diverse as they are) are determined by the type of problems that it wants to solve.

So, let us rephrase the question: what things does number theory seek to solve/understand?

Remark: At this point I opened up this question to the audience—legitimately interested in what they might say. Unfortunately, I didn’t really get any real response.

The fact that none of you responded is, in my opinion, somewhat typical. Number theory is plagued by a serious discrepancy between its historical perspectives and its modern ones. For example, here are some ‘classical’ answers to the above question:

  • Number theory is the study of primes and their distribution.
  • Number theory is the study of cryptographical systems.
  • Number theory is the study of Diophantine equations.

All of these perspectives are emphasized, for example, in a first undergraduate course on number theory and are what the lay mathematical student (someone that is not interested particularly in number theory) might say.

That said, while these are the classical perspectives on the goals of the subject, things have changed. One more modern perspective on what the ‘goal’ of number theory might be is the following:

  • Number theory is the study of the absolute Galois group G_{\mathbb{Q}}=\text{Gal}(\overline{\mathbb{Q}}/\mathbb{Q}) as a (topological) group and the study of its representations.

For example, the all-consuming web of conjectures known as the Langlands program (what both your teacher, my adviser, and I study!) is concerned with relating the representations of G_{\mathbb{Q}} to algebraic geometry and (harmonic) analysis.

Moreover, the methods and techniques of the course you are currently taking are focused (implicitly) on this more modern perspective. Indeed, the usual big reveal of a first course in algebraic number theory is the collection of results gathered under the name Class Field Theory which, in all actuality, are just a summary of the character theory of G_{\mathbb{Q}} that (implicitly) relates such characters to objects of an algebro-geometric and (harmonic) analytic nature.

This all being said, it is somewhat perverse (as well as a bit jarring) to wholeheartedly embrace this modern perspective without understanding its connections to the more elementary, naive questions that gave birth to it. Less pompously, it’s a bit of a shock to see how much the apparent goals and methods of an undergraduate course in number theory differ from those of a graduate course (let alone the cutting-edge research in the field).

Thus, the goal of today is to try and give some indication as to how this modern perspective of algebraic geometry and the study of G_{\mathbb{Q}} and its actions (representations are just linearized actions!) actually aids the study of the most fundamental of the above ‘classical’ subjects: Diophantine equations.

Why Diophantine equations?

Before we undertake our journey in earnest, let us begin by explaining briefly why Diophantine equations are of interest to a modern number theorist.

We begin, as we should, by recalling the definition of a Diophantine equation. Namely, a Diophantine equation is an equation of the form f(x_1,\ldots,x_n)=0 with f(T_1,\ldots,T_n)\in\mathbb{Z}[T_1,\ldots,T_n] and where we require that (x_1,\ldots,x_n)\in\mathbb{Z}^n. In words, Diophantine equations are the study of the integer roots of integer polynomials. Of course, even though we’ve chosen to focus on a single equation, one should consider simultaneous integer solutions to a family of integral polynomials as Diophantine equations as well.

Let us now give some classic examples of Diophantine equations, in increasing level of difficulty, and roughly how these equations are tackled:

  • Pell’s equations: x^2-ny^2=1 where n\in\mathbb{N} is some fixed square-free integer greater than 1. Such equations are taken care of, quite neatly, by the study of continued fractions and the study of the units of the ring \mathbb{Z}[\sqrt{n}] (which is the ring of integers of \mathbb{Q}(\sqrt{n}) if n\equiv 2,3\pmod 4).
  • Catalan’s equation: x^a-y^b=1 where a,b>1 and x,y>0 are integers. This has the only solution (a,b,x,y)=(2,3,3,2), i.e. 3^2-2^3=1 (incredible!). This was solved by Mihăilescu in 2002 using an incredibly clever argument which does not use much more than the number theory learned in this course.
  • Fermat’s equation: x^n+y^n=z^n with n\in\mathbb{N} fixed. The solutions for n=1 and n=2 are explicitly parameterizable (see below!), and for n>2 there are no non-trivial solutions (i.e. solutions where none of x,y, or z is zero). This was finally proven in the mid ’90s thanks to a huge number of people, most notably Wiles, Ribet, Frey, Hellegouarch, and Mazur. The solution was a tour de force of modern arithmetic geometry, which relied most pivotally on proving a small case of the aforementioned Langlands program.
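To make the continued-fraction approach to Pell’s equation a little more tangible, here is a minimal Python sketch. It assumes n is a positive non-square integer, and the function name pell_fundamental is purely illustrative. It walks through the convergents h/k of the continued fraction of \sqrt{n} until one of them satisfies h^2-nk^2=1.

```python
from math import isqrt


def pell_fundamental(n):
    """Smallest positive (x, y) with x^2 - n*y^2 = 1, for non-square n > 0.

    Sketch only: uses the standard recurrence for the continued fraction
    of sqrt(n) and tests each convergent h/k in turn.
    """
    a0 = isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n must not be a perfect square")
    m, d, a = 0, 1, a0          # state of the continued-fraction expansion
    h_prev, h = 1, a0           # numerators of the last two convergents
    k_prev, k = 0, 1            # denominators of the last two convergents
    while h * h - n * k * k != 1:
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k


if __name__ == "__main__":
    for n in (2, 3, 7, 61):
        print(n, pell_fundamental(n))
```

The case n=61 is a nice reminder of why one wants a method rather than a brute-force search: the smallest solution already has ten digits.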

All of these are incredibly interesting (Catalan says that no two consecutive integers, save 8 and 9, are perfect powers, and Fermat’s equation says that, except for n=1 and n=2, the sum of two n^\text{th}-powers cannot be an n^\text{th}-power), but why are Diophantine equations interesting in general? Why are they worth trying to study systematically, and not focusing on particular equations of interest?

Well, to begin with let’s drop all pretense and give what might be the most obvious answer to ‘what is the point of number theory?’: it’s the study of the integers \mathbb{Z}. But, the study of \mathbb{Z} with what structure? Depending on what your goal is, the answer might be as an ordered ring but, for most purposes, the real goal is just to study \mathbb{Z} as a plain ring. How then do Diophantine equations help us in this goal? The (soft) answer lies in the classic lemma of Yoneda. Namely, recall that the Yoneda philosophy tells us that if we want to study \mathbb{Z} as a ring, we should study the sets \text{Hom}_{\mathsf{Ring}}(R,\mathbb{Z}) for all rings R. Well, any ring R looks like \mathbb{Z}[x_i]/(f_j) for some (possibly gigantic) set of variables x_i and equations f_j. The set \text{Hom}_{\mathsf{Ring}}(R,\mathbb{Z}) is then nothing more than the set of simultaneous integer solutions to the Diophantine equations f_j=0. Thus, the study of Diophantine equations, if you buy into the Yoneda philosophy, is equivalent to the study of \mathbb{Z} as a ring.

Remark: Of course, the above is patently imprecise. Namely, the key aspect of Yoneda’s lemma is that you don’t only know \text{Hom}_{\mathsf{Ring}}(R,\mathbb{Z}) as sets for all R but that you actually know the functor \text{Hom}_{\mathsf{Ring}}(-,\mathbb{Z}) or, in other words, how the solutions to these Diophantine equations all relate. Of course, this talk was not meant to be that precise in the first place.

What would a systematic study look like?

Now that we have (hopefully) convinced ourselves that Diophantine equations are worth our study, we need to decide how to systematically study Diophantine equations. Indeed, what the last part of the previous section told us is that to really understand \mathbb{Z} we can’t just study specific Diophantine equations (like the three listed above); we must study them ALL. But, as you’ll notice, all the specific Diophantine equations above had very specific means of attacking them. If we hope to say anything general we thus need to develop a systematic way of studying Diophantine equations. But, what would this look like?

Let us begin by introducing a little bit of notation. Namely, if f(T_1,\ldots,T_n)\in\mathbb{Z}[T_1,\ldots,T_n] and R is any ring, let us denote by X_f(R) the following set:

X_f(R):=\{(x_1,\ldots,x_n)\in R^n:f(x_1,\ldots,x_n)=0\}

This notation seems, at the start, to be nothing more than a convenient tool to discuss polynomial solutions. That said, just like the innocuousness of the Legendre symbol, this simplicity is deceptive, as the shift from thinking of the polynomial solutions over one ring (e.g. R=\mathbb{Z}) to thinking of them over all rings is an epoch-changing maneuver the surface of which we’ll only just scratch.
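To see the ‘all rings’ point of view in action, here is a small Python sketch that brute-forces X_f(R) for the finite rings R=\mathbb{Z}/m\mathbb{Z}. The function name points and the choice of the circle polynomial T_1^2+T_2^2-1 are illustrative assumptions, not anything canonical.

```python
from itertools import product


def points(f, num_vars, m):
    """Return X_f(Z/mZ): all tuples in (Z/mZ)^num_vars on which f vanishes mod m."""
    return [xs for xs in product(range(m), repeat=num_vars) if f(*xs) % m == 0]


def circle(x, y):
    # The polynomial T_1^2 + T_2^2 - 1, evaluated on integers.
    return x * x + y * y - 1


if __name__ == "__main__":
    for m in (3, 5, 7):
        print(m, len(points(circle, 2, m)))
```

The same polynomial gives a different finite set for each modulus, which is a first hint that f is better thought of as a rule R\mapsto X_f(R) than as a single set of solutions.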

But, let’s back off from that highfalutin nonsense for a second. Namely, we said that we are interested in studying Diophantine equations and thus, really, we’re interested in studying sets of the form X_f(\mathbb{Z}). The issue, of course, is that this is hard. The reason this is hard is that sets are so unstructured. A hallmark of mathematics is to exploit the extra structure of an object. Sets, unfortunately, do not have much structure. So, our first order of business will be to replace this highly unstructured set X_f(\mathbb{Z}) with an object for which we will have much more structure to twiddle around with.

To this end, let us begin by replacing X_f(\mathbb{Z}) with something slightly larger. Namely, we replace X_f(\mathbb{Z}) by X_f(\mathbb{Q}). We do this mostly for matters of simplification (related to the fact that \mathbb{Q} is a ‘simpler ring’ than \mathbb{Z}) but for the types of f we care about we’ll see that the containment X_f(\mathbb{Z})\subseteq X_f(\mathbb{Q}) is essentially an equality. That said, while X_f(\mathbb{Q}) might be ostensibly nicer, it’s still just a set, and therefore we still need to make a leap to give us something more amenable to study.

To this end, we replace X_f(\mathbb{Q}) by an even larger set: the set X_f(\overline{\mathbb{Q}}). Now, again, this seems like we’re in the same sort of unstructured territory we’re so desperately trying to escape but, in fact, we’re not. Indeed, the set X_f(\overline{\mathbb{Q}}) comes with something fairly sophisticated: a Galois action. Indeed, since f has rational (integral) coefficients if (x_1,\ldots,x_n)\in X_f(\overline{\mathbb{Q}}) then

\sigma\cdot(x_1,\ldots,x_n):=(\sigma(x_1),\ldots,\sigma(x_n))

is in X_f(\overline{\mathbb{Q}}) for any \sigma\in G_{\mathbb{Q}}. In this way we obtain a G_{\mathbb{Q}}-action on X_f(\overline{\mathbb{Q}}). Moreover, the topological structure of G_{\mathbb{Q}} is not ignored, in the sense that the action of G_{\mathbb{Q}} on X_f(\overline{\mathbb{Q}}) is continuous (when X_f(\overline{\mathbb{Q}}) is given the discrete topology).  And while the passage from X_f(\mathbb{Z}) to X_f(\mathbb{Q}) might, in general, be ‘lossy’ (one can’t necessarily recover the former from the latter) the passage from X_f(\mathbb{Q}) to the continuous G_{\mathbb{Q}}-set X_f(\overline{\mathbb{Q}}) is not: X_f(\mathbb{Q})=X_f(\overline{\mathbb{Q}})^{G_{\mathbb{Q}}} (where the superscript denotes fixed points).

Remark: In a way that one can make pretty precise, the above step is like studying a topological space X by studying the space \widetilde{X} (its universal cover) with its associated \pi_1(X) action. The claim about fixed points becomes the claim that a \pi_1(X)-equivariant map \widetilde{X}\to Y descends uniquely to a map X\to Y.

Thus, we see that we’ve already passed from something incredibly unstructured (the set X_f(\mathbb{Q})) to something with an immense amount of structure (the continuous G_{\mathbb{Q}}-set X_f(\overline{\mathbb{Q}})). But, before we continue, let’s pause to consider what is, perhaps, the simplest example.

Namely, let’s suppose that f(T)\in\mathbb{Z}[T], so that f is a univariate polynomial. What then does the G_{\mathbb{Q}}-set X_f(\overline{\mathbb{Q}}) look like? Well, it’s clear that X_f(\overline{\mathbb{Q}}) is a finite, discrete set with a G_{\mathbb{Q}}-action. Moreover, we can describe precisely the orbit structure of this action. Namely, if f factors over \mathbb{Q}[T] as f_1(T)^{e_1}\cdots f_m(T)^{e_m} with f_i distinct irreducibles of degree d_i, then \# X_f(\overline{\mathbb{Q}}) will be N:=d_1+\cdots+d_m and the orbits of the G_{\mathbb{Q}}-action on X_f(\overline{\mathbb{Q}}) will be precisely the sets X_{f_i}(\overline{\mathbb{Q}}) as i varies.

We can soup this picture up even more. Namely, the way that G_{\mathbb{Q}} acts on this finite set gives a continuous homomorphism

G_{\mathbb{Q}}\to \text{Sym}(X_f(\overline{\mathbb{Q}}))\cong S_N

which evidently factors through an embedded copy of S_{d_1}\times\cdots \times S_{d_m} corresponding to the orbit decomposition of the G_{\mathbb{Q}}-set described above. We can soup this up even further. Namely, we can compose with the standard/tautological permutation representation S_N\to\text{GL}_N(\mathbb{C}) to obtain a representation

\rho_f:G_{\mathbb{Q}}\to \text{GL}_N(\mathbb{C})

Less cryptically, letting \mathbb{C}^N have basis \{e_x\}_{x\in X_f(\overline{\mathbb{Q}})} we get the representation \rho_f by declaring that \rho_f(\sigma)(e_x)=e_{\sigma(x)}.
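Here is a toy Python sketch of this recipe for a single group element. It takes f(T)=(T^2+1)(T-1), whose roots are i, -i, 1, and uses complex conjugation as a stand-in for one element \sigma of G_{\mathbb{Q}}. The encoding of a+bi as the pair (a, b) and the helper name perm_matrix are assumptions made for illustration.

```python
def perm_matrix(elements, sigma):
    """Matrix of the permutation sigma in the basis {e_x}: column x has a 1 in row sigma(x)."""
    n = len(elements)
    index = {x: i for i, x in enumerate(elements)}
    mat = [[0] * n for _ in range(n)]
    for i, x in enumerate(elements):
        mat[index[sigma(x)]][i] = 1
    return mat


# Roots of (T^2 + 1)(T - 1), with a + b*i encoded as the integer pair (a, b).
roots = [(0, 1), (0, -1), (1, 0)]


def conjugation(z):
    # Complex conjugation: one particular element of the Galois group.
    return (z[0], -z[1])


M = perm_matrix(roots, conjugation)
trace = sum(M[i][i] for i in range(len(roots)))

if __name__ == "__main__":
    for row in M:
        print(row)
    print("trace =", trace)
```

The trace of a permutation matrix counts the fixed points of the permutation, so here it only sees the root 1. This is the mechanism behind reading arithmetic information off of traces of \rho_f.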

Remark: For those that know what this means, the above representation \rho_f might have a more familiar form. Namely, consider X_f as the scheme \text{Spec}(\mathbb{Q}[T]/(f(T))) and choose an isomorphism \overline{\mathbb{Q}_\ell}\cong\mathbb{C}. Then, under this isomorphism the above representation \rho_f is nothing more than H^0(X_f,\overline{\mathbb{Q}_\ell}) (the zeroth \ell-adic cohomology of X_f).

This representation contains all the essential information about f: its irreducible factors and so, in particular, its rational roots (what we’re really after!). But, moreover, the representation \rho_f provides an incredible tool to study the extension \mathbb{Q}_f:=\mathbb{Q}(X_f(\overline{\mathbb{Q}})). Indeed, it’s clear that \rho_f actually factors through \text{Gal}(\mathbb{Q}_f/\mathbb{Q}) and, in fact, gives a faithful (injective) representation \rho_f:\text{Gal}(\mathbb{Q}_f/\mathbb{Q})\to\text{GL}_N(\mathbb{C}). One can then try to completely understand properties of the extension \mathbb{Q}_f/\mathbb{Q} by studying this representation \rho_f. For example, we can entirely characterize the set \text{Spl}(\mathbb{Q}_f/\mathbb{Q}) of split primes (which completely characterizes \mathbb{Q}_f by Chebotarev density) as follows:

\text{Spl}(\mathbb{Q}_f/\mathbb{Q})=\{p:\rho_f(\text{Frob}_p)=I_N\}=\{p:\text{tr}(\rho_f(\text{Frob}_p))=N\}

where, of course, the above only really makes sense for unramified p.
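One can experiment with this characterization numerically. The sketch below leans on a standard fact not proven in this talk: for a monic square-free f and a prime p not dividing its discriminant, \text{tr}(\rho_f(\text{Frob}_p)) equals the number of roots of f in \mathbb{F}_p. So the split primes are exactly the unramified p at which f has N roots mod p. The function names, and the hand-supplied sets of ‘bad’ primes, are illustrative assumptions.

```python
def count_roots_mod_p(coeffs, p):
    """Number of roots in F_p of the polynomial with integer coeffs (highest degree first)."""
    def f(x):
        acc = 0
        for c in coeffs:          # Horner evaluation mod p
            acc = (acc * x + c) % p
        return acc
    return sum(1 for x in range(p) if f(x) == 0)


def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]


def split_primes(coeffs, bad_primes, bound):
    """Primes p <= bound, outside bad_primes, at which f has deg(f) roots mod p."""
    degree = len(coeffs) - 1
    return [p for p in primes_up_to(bound)
            if p not in bad_primes and count_roots_mod_p(coeffs, p) == degree]


if __name__ == "__main__":
    # f = T^2 + 1 (discriminant -4) and f = T^3 - 2 (discriminant -108).
    print(split_primes([1, 0, 1], {2}, 50))
    print(split_primes([1, 0, 0, -2], {2, 3}, 100))
```

For T^2+1 the output is the familiar list of primes congruent to 1 modulo 4, while for T^3-2 the split primes are much sparser and follow no congruence pattern, a first glimpse of the difference between abelian and non-abelian extensions.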

This is great. Namely, it allows us to study the G_{\mathbb{Q}}-set X_f(\overline{\mathbb{Q}}) and the extension \mathbb{Q}_f/\mathbb{Q} using the incredibly rich theory of representations of finite groups (namely representations of the group \text{Gal}(\mathbb{Q}_f/\mathbb{Q})). Thus, at least in this case, we’ve cast off the bonds of unstructured sets and donned the incredibly powerful garb of finite group representation theory: quite the upgrade!

Unfortunately, the above doesn’t quite work for general f (i.e. not univariate f). Namely, we can still pass from the set X_f(\mathbb{Q}) to the continuous G_{\mathbb{Q}}-set X_f(\overline{\mathbb{Q}}), but the next step is bound for failure in general. Namely, we built this representation \rho_f from the tautological representation of S_N\cong\text{Sym}(X_f(\overline{\mathbb{Q}})), and for more general f this set X_f(\overline{\mathbb{Q}}) is going to be (countably) infinite. And, as one knows, the study of infinite-dimensional representations isn’t very useful/tractable without more structure. We need to develop a more sophisticated technique if we hope to proceed further.

Remark: One of the students in the class asked “Why not try to study these infinite dimensional representations using techniques of functional analysis?” To this I responded as follows. This is not a terrible idea, but the naive approach is not going to work. Namely, if we want to study representations into things like Hilbert spaces, we generally need our group to be sufficiently ‘analytic’. So, for example, one can study such Hilbert space representations for things like Lie groups since, after all, both are objects of a complex analytic nature, and thus are likely to have something interesting to say to one another. Trying this for our group G_{\mathbb{Q}} is bound to fail in a literal sense since G_{\mathbb{Q}} is ‘anti-analytic’. That said, trying to study G_{\mathbb{Q}} by studying Hilbert space (with a small caveat) representations of a related group is the entire premise of the Langlands program!

So, if we’re not going to be able to use the naive definition of representations \rho_f for general f, then how shall we proceed? We begin by making an observation about one of the benefits that having very complicated X_f(\overline{\mathbb{Q}}) gives us. As an example, let’s consider the polynomial f(T_1,T_2)=T_1^2+T_2^2-1\in\mathbb{Z}[T_1,T_2]. Then, as said above, X_f(\overline{\mathbb{Q}}) is infinite. But something spectacular happens if we enlarge even further: if we replace X_f(\overline{\mathbb{Q}}) with X_f(\mathbb{C}), a clear-cut advantage over the univariate case appears. In the univariate case the \mathbb{C}-points are still just a discrete set of points. But, in the case of our current f, we get that X_f(\mathbb{C}) is \mathbb{CP}^1-\{p,q\}, the twice-punctured sphere: a gloriously rich topological object.

To this end, for any f(T_1,\ldots,T_n)\in\mathbb{Z}[T_1,\ldots,T_n], let us give X_f(\mathbb{C}) the structure of a topological space by considering it as a subspace of \mathbb{C}^n with the obvious embedding X_f(\mathbb{C})\subseteq\mathbb{C}^n. Then, we see that what non-univariate polynomials lack in the naive amenability to representation theory, they make up for in having rich, intrinsic geometric structure.

With all of this structure uncovered, we can say (very broadly) how the systematic study of Diophantine equations goes. Namely, given f we study X_f(\mathbb{Z}) through the continuous G_{\mathbb{Q}}-set X_f(\overline{\mathbb{Q}}), the geometry of X_f(\mathbb{C}), and their interaction. This then creates a subject where one studies Diophantine equations by a mix of number theory and algebraic geometry which, in modern parlance, would be called arithmetic geometry.

We will come back to how this all relates to studying G_{\mathbb{Q}} and its representations. But, before we do, we’d like to give a concrete example of using a mix of geometry and number theory to study a particular family of Diophantine equations.

An extended example

The example

So, as we said at the end of last section, we seek now to exploit our newfound methodology to study Diophantine equations to at least indicate how a natural class of such equations can be studied. What class of equations? Well, let us remark that in basic number theory the amount of honest Diophantine equations one solves is remarkably slim. To wit, this past summer I taught an elementary number theory course at UC Berkeley, and the only class of Diophantine equations I was able to solve was linear Diophantine equations in any number of variables. This is exceedingly simple, and essentially comes down to the Euclidean algorithm. The rest of the term was then, in fact, not studying Diophantine equations but ‘models’ for Diophantine equations: equations over finite fields.

Thus, any non-linear family of Diophantine equations is of current interest to us. So, with that being said, our goal is to use our broad methodology to prove the following:

Theorem (Legendre): Let a,b,c\in\mathbb{Z} be non-zero, square-free, and pairwise coprime. Let f_{a,b,c}(x,y,z)\in\mathbb{Z}[x,y,z] be defined by f_{a,b,c}(x,y,z)=ax^2+by^2+cz^2. Then, the Diophantine equation

f_{a,b,c}(x,y,z)=0

has a non-zero solution if and only if:

  1. Not all of a,b, and c are the same sign.
  2. The number -ab is a square modulo |c|, the number -ac is a square modulo |b|, and the number -bc is a square modulo |a|.

Moreover, given one non-zero solution there is a natural way to parameterize all the solutions.

Note that this theorem is actually incredibly useful, because the theory of Quadratic Reciprocity gives us an incredibly powerful, quick means of checking 2. above.
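As a sanity check on the statement, here is a Python sketch that tests conditions 1. and 2. by brute force (rather than by Quadratic Reciprocity) and compares them against a naive search for solutions in a small box. It assumes a,b,c are non-zero, square-free, and pairwise coprime and does not verify this; all function names are illustrative.

```python
from itertools import product


def is_square_mod(n, m):
    """True if n is congruent to a square modulo m (m >= 1), by brute force."""
    return any((t * t - n) % m == 0 for t in range(m))


def legendre_criterion(a, b, c):
    """Conditions 1. and 2. of the theorem.

    Assumes a, b, c are non-zero, square-free and pairwise coprime (not checked).
    """
    all_positive = a > 0 and b > 0 and c > 0
    all_negative = a < 0 and b < 0 and c < 0
    mixed_signs = not (all_positive or all_negative)
    return (mixed_signs
            and is_square_mod(-a * b, abs(c))
            and is_square_mod(-a * c, abs(b))
            and is_square_mod(-b * c, abs(a)))


def brute_force_solution(a, b, c, bound):
    """Search the box [-bound, bound]^3 for a non-zero solution of ax^2+by^2+cz^2=0."""
    for x, y, z in product(range(-bound, bound + 1), repeat=3):
        if (x, y, z) != (0, 0, 0) and a * x * x + b * y * y + c * z * z == 0:
            return (x, y, z)
    return None


if __name__ == "__main__":
    for coeffs in [(1, 1, -1), (1, 1, 1), (1, 1, -3), (2, 3, -5)]:
        print(coeffs, legendre_criterion(*coeffs), brute_force_solution(*coeffs, 10))
```

Of course a failed box search proves nothing on its own; the point of the theorem is that the finite check in legendre_criterion settles the question outright.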

So, before we dive into how geometry and number theory unite to prove Legendre’s Theorem, let us remark why this is, in some sense, the ‘next simplest example’ of equations after the univariate case. Of course, this may seem strange (how is this the next obvious case after the univariate case?) but, as we’ll see, our lens by which to view Diophantine equations makes this precise.

So, towards this end, let us note that the set X_{a,b,c}(\mathbb{Q}):=X_{f_{a,b,c}}(\mathbb{Q}) has a lot of redundancy. Namely, given any (x,y,z)\in X_{a,b,c}(\mathbb{Q}) and any \lambda\in\mathbb{Q}^\times one obtains another solution (\lambda x,\lambda y,\lambda z) just by virtue of the fact that f_{a,b,c} is homogeneous. Note that this ‘line’ of solutions really shouldn’t be thought of as different solutions and the more fundamental object should be something like X_{a,b,c}(\mathbb{Q})/\mathbb{Q}^\times. So, to this end, if f(T_1,\ldots,T_n) is a homogeneous polynomial then we define P_f(R):=(X_f(R)-\{0\})/R^\times. Note then that, as long as we represent elements of P_f(\mathbb{Z}) by primitive solutions (those whose coordinates share no common factor), one immediately has P_f(\mathbb{Z})=P_f(\mathbb{Q}): clearing denominators and common factors scales any non-zero rational solution to a primitive integer one, unique up to sign.

Remark: For those that know what this means, note that this is technically the wrong definition of P_f for general R. Namely, we’d like to imagine that P_f is just the projective scheme \text{Proj}(\mathbb{Z}[T_1,\ldots,T_n]/(f(T_1,\ldots,T_n))) but the R-points of this are not, in general, just (X_f(R)-\{0\})/R^\times. The issue is that this projective scheme is essentially the  moduli space for the quotient (X_f-\{0\})/\mathbf{G}_m and, of course, in general we don’t expect quotient sheaves to have R-points the quotient of the individual R-points. The issue, as one can quickly deduce, lies in H^1_{\text{fppf}}(\text{Spec}(R),\mathbf{G}_m)=\text{Pic}(R). But, this is not really going to be of interest to us since we’ll be dealing primarily with local rings R.

Note that, just as in the case of X_f one has that P_f(\overline{\mathbb{Q}}) has a continuous G_{\mathbb{Q}}-action and P_f(\mathbb{C}) is a topological space (namely one takes the quotient space (X_f(\mathbb{C})-\{0\})/\mathbb{C}^\times).

So, with this in mind, we can explain why the polynomials f_{a,b,c} are the next obvious choices after univariate polynomials. Namely, note that if f is univariate then X_f(\mathbb{C}) is a finite discrete topological space or, equivalently, a 0-dimensional compact complex manifold. There is an obvious parameter to tick up here: the dimension. So, the next obvious polynomials to look at are those such that X_f(\mathbb{C}) is a 1-dimensional compact complex manifold. But, even amongst 1-dimensional compact complex manifolds there is a simplest. Namely, by the classification of compact orientable surfaces (of which every 1-dimensional compact complex manifold is an example) there is a parameter given by the number of holes, the genus g. The simplest one is then the surface of genus 0: the Riemann sphere \mathbb{CP}^1.

Now, it seems clear that the next class of polynomials we should study are those such that X_f(\mathbb{C}) is the Riemann sphere. Unfortunately, no such f exist. That said, there are homogeneous polynomials f such that P_f(\mathbb{C}) is \mathbb{CP}^1. In fact, one can show that if P_f(\mathbb{C}) is \mathbb{CP}^1 then (up to isomorphism) our polynomial f is one of the f_{a,b,c} as in the preamble to Legendre’s theorem. Thus, Legendre’s theorem really is the next logical step in our study of Diophantine equations.

Remark: Two remarks are in order for those that know what this means. First, the above statement really concerns not P_f as a finite-type \mathbb{Z}-scheme but its generic fiber over \mathbb{Q}. Moreover, to make the above claim correct, we can’t have that P_f(\mathbb{C}) is just topologically the Riemann sphere, we also need that it’s the Riemann sphere as a complex analytic space: that P_f^\text{an} is \mathbb{CP}^1. If we assume that P_f is smooth then homeomorphism is enough, but the cuspidal cubic is not one of the f_{a,b,c} and has analytification homeomorphic to \mathbb{CP}^1.

The proof of the above claim is pretty simple. Namely, any such f must have that P_f (defined as its associated projective scheme over \mathbb{Q}) is a smooth geometrically integral genus 0 curve. One can then show that for any such curve X one obtains an embedding X\hookrightarrow\mathbb{P}^2_\mathbb{Q} by considering \omega_X^{-1} (the inverse of the canonical bundle) which follows easily from Riemann-Roch. One then realizes that X is a hypersurface in \mathbb{P}^2_{\mathbb{Q}} which, by genus considerations, must be of degree 2. The explicit form then follows from the elementary diagonalization of quadratic forms.

The fact that X_f(\mathbb{C}) can never be \mathbb{CP}^1 follows from the standard fact about analytifications: that the analytification is compact if and only if the scheme is proper. By design X_f is affine, so if it is also proper, then it’s finite. But, then X_f(\mathbb{C}) is a finite set of points.

The geometric part

So, now that we see why the Diophantine equations addressed by Legendre’s theorem are the next logical step after univariate polynomials, we can begin to try to understand how one might prove the theorem itself. Let us first address the geometric side of the picture: the claim that a single non-trivial solution allows one to parameterize all the solutions of the Diophantine equation. Indeed, to show this we will exploit the geometry of P_f(\mathbb{C}), at least morally (we’ll be exploiting the geometry of the scheme which is, essentially, the same thing).

So, to make this claim precise, let us assume f is one of the equations from Legendre’s theorem and that x_0\in P_f(\mathbb{Q}) is given. We then define a bijection

P_f(\mathbb{Q})\to \mathbb{P}^1(\mathbb{Q})=\mathbb{Q}\cup\{\infty\}

as follows. We can imagine P_f(\mathbb{Q}) as being a subset of (\mathbb{Q}^3-\{0\})/\mathbb{Q}^\times=\mathbb{P}^2(\mathbb{Q}). In particular, it looks like a subset cut out by a degree 2 equation. For any point x\in P_f(\mathbb{Q}) other than x_0 there is a unique line L_x in \mathbb{P}^2(\mathbb{Q}) passing through x and x_0. Map x to the point in \mathbb{P}^1(\mathbb{Q}) which is the slope of the line L_x. Of course, we take L_{x_0} to be the tangent line at x_0, which we think of as having slope \infty, and so x_0 maps to \infty. One can easily check that this map is surjective, and a basic version of Bézout’s Theorem (which can be proven topologically/geometrically) says that if x\ne y then L_x\ne L_y. Indeed, if L_x=L_y then L_x would intersect P_f(\mathbb{Q}) in at least three places (namely x,y,x_0) and this is impossible since the total number of intersection points of a subset cut out by a degree 2 equation and a line is at most 2. This gives the desired bijection.

Remark: This is much nicer if one can draw a picture illustrating this point.

For those that know what this means, this is just the isomorphism P_f\to\mathbb{P}^1_{\mathbb{Q}} given by the line bundle \mathcal{O}(x_0).

As an example of this, one can check that for f_{1,1,-1} and the point x_0=(1,0,1) the bijection from above shows that the rational points of P_f (besides x_0) are essentially of the form \displaystyle \left(\frac{p^2-q^2}{p^2+q^2},\frac{-2pq}{p^2+q^2},1\right) where \displaystyle \frac{p}{q} denotes the slope in reduced form. The integer solution in X_{1,1,-1}(\mathbb{Z}) in the class of this point is (p^2-q^2,-2pq,p^2+q^2). This should look familiar and, indeed, X_{1,1,-1}(\mathbb{Z}) is just the set of Pythagorean triples, and this geometric procedure produces the usual parameterization of such triples.
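The slope construction is easy to play with in exact arithmetic. The following Python sketch takes the base point (1,0,1) on x^2+y^2=z^2, which is an assumption of the sketch chosen because that point lies on the curve and its tangent line is vertical. It intersects the line of slope t through the base point with the circle; the function names are illustrative.

```python
from fractions import Fraction


def point_from_slope(t):
    """Second intersection of the circle x^2 + y^2 = 1 with the line y = t(x - 1)."""
    t = Fraction(t)
    x = (t * t - 1) / (t * t + 1)
    y = -2 * t / (t * t + 1)
    return x, y


def triple_from_slope(p, q):
    """Integer solution of x^2 + y^2 = z^2 attached to the slope p/q."""
    return (p * p - q * q, -2 * p * q, p * p + q * q)


if __name__ == "__main__":
    for p, q in [(2, 1), (3, 2), (4, 1)]:
        x, y = point_from_slope(Fraction(p, q))
        print((p, q), (x, y), triple_from_slope(p, q))
```

Every rational slope produces a rational point, and conversely every rational point other than the base point determines a rational slope, which is the bijection described above in miniature.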

The arithmetic part

Now that we have taken care of the geometric part of Legendre’s theorem, we need to explain the arithmetic part: the condition under which f_{a,b,c} has a non-zero solution or, equivalently, under which P_{a,b,c}(\mathbb{Q}):=P_{f_{a,b,c}}(\mathbb{Q})\ne\varnothing.

The motivation

The beginning is, in fact, also geometrically minded. For the next two minutes I will assume that people in the audience have basic familiarity with algebraic geometry. For those that do not, don’t fret, the conclusion will not have a lick of algebraic geometry in it; we’re just talking motivation.

So, it’s clear that from P_{a,b,c} (thought about as the projective scheme \text{Proj}(\mathbb{Z}[x,y,z]/(f_{a,b,c}))) we get a map f:P_{a,b,c}\to\text{Spec}(\mathbb{Z}). We can then think of P_{a,b,c}(\mathbb{Q})=P_{a,b,c}(\mathbb{Z}) as sections s of this map f. Now, if one is trying to build a section of a map g:X\to Y of topological spaces, a common technique might be to try and build, for each y\in Y, a section s_y of g^{-1}(U_y)\to U_y where U_y is a ‘very small’ neighborhood around y. One then tries to glue these sections s_y together to get a global section s. So, if we try to analogize this procedure for P_{a,b,c}\to\text{Spec}(\mathbb{Z}) we want to, for each prime p, build a section from a ‘very small’ neighborhood of p in \text{Spec}(\mathbb{Z}).

The question then becomes what a ‘very small’ neighborhood of p in \text{Spec}(\mathbb{Z}) might look like. The Zariski neighborhoods are too coarse, and so one needs to opt for something else. Namely, we think of a sufficiently small neighborhood of a point y as being a neighborhood so small that properties of something at y should extend to properties of that entire neighborhood. In other words, we want a neighborhood of p such that the only obstruction to something in that neighborhood is an obstruction at p. Since we’re thinking about polynomial equations, we think of an obstruction as being the lack of a solution to a polynomial equation P(T_1,\ldots,T_n)=0 with P\in\mathbb{Z}[T_1,\ldots,T_n]. So, what does it mean for an equation to be unobstructed ‘at p’?

Well, the naive guess is that it means the polynomial P has a solution modulo p. But, this isn’t good enough. Namely, the prime p gives more chance for obstruction than just modulo p. What about modulo p^2? What about modulo p^3? It then seems reasonable to guess that unobstructed ‘at p’ means that the polynomial has a solution in \mathbb{Z}/p^n\mathbb{Z} for all n.

So, this ‘very small’ neighborhood should have the property that being unobstructed at p is the same thing as being unobstructed in this neighborhood. Since we’re dealing with algebraic geometry, the neighborhood should be the spectrum of some ring R, and reinterpreting this last sentence means that a polynomial equation P should have a solution in R if and only if it has a solution in \mathbb{Z}/p^n\mathbb{Z} for all n. This is exactly describing the ring \mathbb{Z}_p of p-adic integers.
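Two tiny experiments illustrate the difference between ‘solvable modulo p’ and ‘solvable modulo p^n for all n’. The Python sketch below shows an equation (x^2=7) that is solvable modulo 7 but obstructed modulo 49, and an equation (x^2=2) whose solution modulo 7 lifts to every power of 7 by a Newton/Hensel step. The function names are illustrative, the lifting routine assumes p is odd and does not divide a, and the modular inverse via pow requires Python 3.8 or later.

```python
def has_solution_mod(f, m):
    """True if the one-variable integer polynomial function f has a root in Z/mZ."""
    return any(f(x) % m == 0 for x in range(m))


def hensel_lift_sqrt(a, p, k):
    """Return r with r^2 = a mod p^k.

    Sketch assuming p is an odd prime, p does not divide a, and a is a
    square mod p (otherwise the initial search raises StopIteration).
    """
    r = next(x for x in range(p) if (x * x - a) % p == 0)
    modulus = p
    for _ in range(k - 1):
        modulus *= p
        # Newton step r <- r - (r^2 - a) / (2r), computed in Z/modulus.
        inverse = pow(2 * r, -1, modulus)
        r = (r - (r * r - a) * inverse) % modulus
    return r


if __name__ == "__main__":
    seven = lambda x: x * x - 7
    print(has_solution_mod(seven, 7), has_solution_mod(seven, 49))
    r = hensel_lift_sqrt(2, 7, 5)
    print(r, (r * r - 2) % 7 ** 5)
```

The compatible sequence of roots of x^2-2 modulo 7, 7^2, 7^3, and so on is precisely what an element of \mathbb{Z}_7 squaring to 2 looks like.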

So, a ‘very small’ neighborhood of p in \text{Spec}(\mathbb{Z}) is \text{Spec}(\mathbb{Z}_p). Thus, a section of P_{a,b,c}\to\text{Spec}(\mathbb{Z}) ‘very close’ to p should mean a section of (P_{a,b,c})_{\text{Spec}(\mathbb{Z}_p)}\to\text{Spec}(\mathbb{Z}_p) or, in other words, an element of P_{a,b,c}(\mathbb{Z}_p)=P_{a,b,c}(\mathbb{Q}_p).

Thus, the rephrasing of the geometric question of whether we can glue local sections to global sections is whether or not we can glue the elements of P(\mathbb{Q}_p) together for all p to obtain a point of P(\mathbb{Q}). In fact, one should also include P(\mathbb{R}) in this setup. Indeed, given the function field K(X) of a curve X, the points of X correspond to the valuations on K(X) and the valuations of \mathbb{Q} are v_p, which corresponds to \mathbb{Q}_p, for all p and v_\infty which corresponds to \mathbb{R}.

So, concretely, the question becomes whether or not P_{a,b,c}(\mathbb{Q}_v)\ne\varnothing for all valuations v (define \mathbb{Q}_\infty=\mathbb{R}) implies that P_{a,b,c}(\mathbb{Q}) is non-empty. If P_f satisfies this for a homogeneous f we say that f satisfies the local-to-global principle (or Hasse principle). Of course, there is no reason, a priori, to even expect this. Indeed, even for the sections of a map, one at least expects to impose some compatibility conditions on the overlaps of these local sections, and the local-to-global principle doesn’t require the analogue of such compatibilities (what would that even mean?).

That said, let us see that the claim that the polynomials f_{a,b,c} satisfy the local-to-global principle completely solves the first part (the criterion for non-zero solutions) of Legendre’s theorem. Indeed, we claim that the conditions stated are precisely the same as requiring P_{a,b,c}(\mathbb{Q}_v)\ne\varnothing for all v. For example, 1. is easily seen to be equivalent to the claim that P_{a,b,c}(\mathbb{R})\ne\varnothing.

Suppose now that p is an odd prime that does not divide any of a,b or c (the prime 2 needs separate care, since P_{a,b,c} is not smooth modulo 2). We then claim that in this case P_{a,b,c}(\mathbb{Q}_p)\ne\varnothing without any effort whatsoever. Indeed, note that P_{a,b,c}(\mathbb{Q}_p)=P_{a,b,c}(\mathbb{Z}_p) and the multi-variable version of Hensel’s lemma (that does exist!) tells us that the reduction map P_{a,b,c}(\mathbb{Z}_p)\to P_{a,b,c}(\mathbb{F}_p) is a surjection. Thus, it suffices to show that P_{a,b,c}(\mathbb{F}_p)\ne\varnothing. There are several ways to do this, but one very clear one is the following. Since a,b,c are non-zero modulo p we may assume (by dividing through) that c=1. We are then trying to show that ax^2+by^2+z^2=0 has a non-zero solution in \mathbb{F}_p. For this, it suffices to show that there’s a solution to ax^2+by^2+1=0. But, this is equivalent to showing that the set of elements of \mathbb{F}_p of the form ax^2 and the set of elements of the form -1-by^2 have an element in common. But, note that since a,b are non-zero each set has precisely \displaystyle \frac{p-1}{2}+1 elements, and so if they had none in common, there’d be \displaystyle 2\left(\frac{p-1}{2}+1\right)=p+1 elements of \mathbb{F}_p, which is ridiculous. Thus, ax^2+by^2+cz^2=0 has a non-zero solution in \mathbb{F}_p and thus P_{a,b,c}(\mathbb{Q}_p)\ne\varnothing as desired.
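
The counting argument above is easy to turn into a search. Here is a hedged sketch, assuming p is an odd prime not dividing ab and that we have already normalized c=1. The function name conic_point_mod_p is hypothetical.

```python
def conic_point_mod_p(a, b, p):
    """Find (x, y) with a*x^2 + b*y^2 + 1 == 0 (mod p).

    Assumes p is an odd prime that divides neither a nor b.
    """
    # Map each value of a*x^2 (resp. -1 - b*y^2) to some x (resp. y) producing it.
    left = {(a * x * x) % p: x for x in range(p)}
    right = {(-1 - b * y * y) % p: y for y in range(p)}
    # Each set of values has (p - 1)/2 + 1 elements, as in the text.
    assert len(left) == len(right) == (p - 1) // 2 + 1
    # Two such subsets of F_p must intersect, since together they have p + 1 elements.
    common = next(value for value in left if value in right)
    return left[common], right[common]


x, y = conic_point_mod_p(3, 5, 11)
print(x, y, (3 * x * x + 5 * y * y + 1) % 11)
```

The returned pair (x, y) gives the point (x : y : 1) of the conic over \mathbb{F}_p, which Hensel’s lemma then lifts to \mathbb{Z}_p.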

Finally, let p be an odd prime that divides one of a,b,c. Note that by our assumption on the greatest common divisors of the a,b,c (which is really no condition, by dividing out any common divisors: if two of the coefficients are divisible by p, so must be the third) we know that p will divide precisely one of the a,b,c. Let’s assume, without loss of generality, that p\mid c. Now, note that we want to begin the argument as in the previous case by using Hensel’s lemma. But, unfortunately, P_{a,b,c} is not smooth over \mathbb{Z}_p: modulo p the equation becomes ax^2+by^2=0, which is singular at the point (0:0:1). So, to make things work, we instead work with U\subseteq P_{a,b,c} defined as those tuples in P_{a,b,c} with non-vanishing y-coordinate (or, in algebraic geometry language, D_+(y)\cap P_{a,b,c} where the intersection is happening in \mathbb{P}^2_{\mathbb{Z}_p}). Note then that U is, in fact, smooth over \mathbb{Z}_p (over \mathbb{F}_p the non-smooth point of P_{a,b,c} lies in the complement of D_+(y)!). Thus, by Hensel’s lemma it suffices to show that U(\mathbb{F}_p) is non-empty. To see this we need to show that there is a solution to ax^2+by^2=0 over \mathbb{F}_p with y\ne 0. But, note that this is possible since -ab is a square modulo p and thus, by the multiplicativity of the Legendre symbol, so is \displaystyle \frac{-b}{a}=\frac{-ab}{a^2}; that is, \displaystyle t^2=\frac{-b}{a} is solvable in \mathbb{F}_p. So, if t_0 is such a solution then evidently x=t_0, y=1, and z=1 work. Thus, U(\mathbb{Z}_p)\subseteq P_{a,b,c}(\mathbb{Z}_p)=P_{a,b,c}(\mathbb{Q}_p) is non-empty as desired.
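
Here is a rough sketch of this case in code, assuming p is an odd prime with p\mid c, p\nmid ab, and -ab a square modulo p. It starts from the point x=t_0, y=1, z=1 modulo p and lifts only the x-coordinate, which is a one-variable shortcut for the multi-variable Hensel’s lemma used in the text. The function name is invented for illustration.

```python
def local_point_when_p_divides_c(a, b, c, p, n):
    """Return x with a*x^2 + b + c == 0 (mod p**n), i.e. the point (x : 1 : 1).

    Assumes p is an odd prime, p | c, p does not divide a*b,
    and -a*b is a square modulo p.
    """
    # Step 1: solve a*t^2 + b == 0 (mod p). Since p | c this is a point modulo p.
    x = next(t for t in range(1, p) if (a * t * t + b) % p == 0)
    # Step 2: Hensel-lift in the x variable. The derivative 2*a*x is a unit
    # modulo p because p is odd and divides neither a nor x.
    modulus = p
    for _ in range(n - 1):
        modulus *= p
        value = a * x * x + b + c
        x = (x - value * pow(2 * a * x, -1, modulus)) % modulus
    return x


x = local_point_when_p_divides_c(2, 3, 7, 7, 5)
print(x, (2 * x * x + 3 + 7) % 7 ** 5)
```

If the hypothesis that -ab is a square modulo p fails, the search in Step 1 finds nothing and the function raises StopIteration, which matches the fact that this condition really is needed.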

The local-to-global principle

So, now that we’ve reduced Legendre’s theorem to the local-to-global principle for polynomials like f_{a,b,c} we finish by applying some interesting number theoretic results to deduce the desired property. Namely, we will introduce an interesting object that will come later on in this course, and explain how its properties (that you’ll discuss) actually prove the local-to-global principle for the polynomials f_{a,b,c}.

So, to this end, we take an ostensibly ninety-degree turn. Namely, for any field K let us define a central simple algebra over K to be a (possibly non-commutative) K-algebra A with the property that A\otimes_K \overline{K} is isomorphic to \text{Mat}_n(\overline{K}) as a \overline{K}-algebra for some n. Let us say that two central simple algebras A and B are equivalent (written A\sim B) if there exist integers m and n such that \text{Mat}_n(A)\cong\text{Mat}_m(B) as K-algebras. Next, let the Brauer group of K, denoted \text{Br}(K), be the group of central simple algebras over K up to equivalence, with group operation being tensor product. The identity of \text{Br}(K) is the equivalence class of K. Finally, we say that a central simple algebra is split if it’s equivalent to K.

Remark: If you know what this means, \text{Br}(K)=H^2(G_K,\overline{K}^\times).

Note that if one has a map of fields L\to K one obtains a map of Brauer groups \text{Br}(L)\to\text{Br}(K) given by A\mapsto A_K:=A\otimes_L K. Indeed, it’s fairly easy to see that A_K is a central simple K-algebra since

A_K\otimes_K \overline{K}=A\otimes_L \overline{K}=(A\otimes_L \overline{L})\otimes_{\overline{L}} \overline{K}=\text{Mat}_n(\overline{L})\otimes_{\overline{L}} \overline{K}=\text{Mat}_n(\overline{K})

and one can check that this map is, in fact, well-defined (i.e. if A\sim B then A_K\sim B_K).

Our goal in this section is to explain how to use the Brauer groups, and their properties for local/global fields, to prove the local-to-global principle for the f_{a,b,c}’s. This is, in some sense, a gross over-complication of the matter. But, our reasons for doing this are three-fold:

  1. The whole goal of this post was to explain how ‘serious’ modern number theory that will show up in this course can be used to answer concrete questions, and the below is a perfect example of this.
  2. The proof of the below (while dramatic overkill) gives a much more conceptual understanding of why the local-to-global principle holds for the f_{a,b,c}. There is a way to prove the local-to-global principle for the f_{a,b,c}’s using ‘elementary techniques’ (see here for instance), but they don’t really explain how the result fits into the larger context of abelian reciprocity laws/class field theory. Specifically, they don’t really explain in any essentially complete way why these Diophantine equations (and not many others) satisfy the local-to-global principle. The stated result below should be thought of as the ‘true’ local-to-global statement, and the local-to-global property for our f_{a,b,c}’s comes from a coincidental (or deep?) connection to central simple algebras.
  3. The below method actually is (essentially) equivalent to a local-to-global result for Diophantine equations of a much more general form. Namely, let \{f_1,\ldots,f_m\} be homogeneous polynomials in the same variables T_1,\ldots,T_n, and consider the Diophantine equation P:=P_{f_1}\cap\cdots\cap P_{f_m}, that is, the simultaneous Diophantine equations f_1=\cdots=f_m=0 (for people that know what this means one should take P=\text{Proj}(\mathbb{Q}[T_1,\ldots,T_n]/(f_1,\ldots,f_m))). Let us call this set of Diophantine equations Brauer-Severi if, for some d, there is a (complex analytic) isomorphism P(\mathbb{C})\to \mathbb{P}^d(\mathbb{C}) (where \mathbb{P}^d(\mathbb{C})=(\mathbb{C}^{d+1}-\{0\})/\mathbb{C}^\times) which restricts to an isomorphism P(\overline{\mathbb{Q}})\to\mathbb{P}^d(\overline{\mathbb{Q}}) (for those that know what this means, we just mean that P_{\overline{\mathbb{Q}}}\cong \mathbb{P}^d_{\overline{\mathbb{Q}}}). Then, the below results can be (mostly) summarized as the claim that Brauer-Severi Diophantine equations satisfy the local-to-global principle. This is the correct light in which to view the result for the f_{a,b,c}’s and really explains why their geometry is pivotal.

Before we go any further it’s worth discussing an explicit example of such objects. Namely, the Hamiltonian Quaternions \mathbb{H} are an example of a central simple algebra over \mathbb{R}. Indeed, one can check that \mathbb{H}\otimes_\mathbb{R}\mathbb{C}\cong\text{Mat}_2(\mathbb{C}). In fact, one can show that, up to equivalence, the only central simple algebras over \mathbb{R} are \mathbb{R} and \mathbb{H} so that \text{Br}(\mathbb{R})\cong\mathbb{Z}/2\mathbb{Z}.
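
The isomorphism \mathbb{H}\otimes_\mathbb{R}\mathbb{C}\cong\text{Mat}_2(\mathbb{C}) can be made completely explicit. The sketch below uses one common choice of 2\times 2 complex matrices for i, j, k (this particular choice is an assumption for illustration, not something fixed by the talk), checks the quaternion relations, and then writes the four elementary matrices as complex combinations of 1, i, j, k.

```python
def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


ONE = [[1, 0], [0, 1]]
MINUS_ONE = [[-1, 0], [0, -1]]
I = [[1j, 0], [0, -1j]]   # image of the quaternion i
J = [[0, 1], [-1, 0]]     # image of the quaternion j
K = matmul(I, J)          # image of k = ij


def combo(c1, ci, cj, ck):
    """Return c1*1 + ci*I + cj*J + ck*K with complex coefficients."""
    return [[c1 * ONE[r][c] + ci * I[r][c] + cj * J[r][c] + ck * K[r][c]
             for c in range(2)] for r in range(2)]


# Quaternion relations: i^2 = j^2 = k^2 = -1 and ij = -ji.
print(matmul(I, I) == MINUS_ONE, matmul(J, J) == MINUS_ONE, matmul(K, K) == MINUS_ONE)
print(matmul(J, I) == [[-e for e in row] for row in K])

# The elementary matrices lie in the complex span of 1, I, J, K,
# so that span is all of Mat_2(C).
E11 = combo(0.5, -0.5j, 0, 0)
E12 = combo(0, 0, 0.5, -0.5j)
E21 = combo(0, 0, -0.5, -0.5j)
E22 = combo(0.5, 0.5j, 0, 0)
print(E11, E12, E21, E22)
```

Since both sides are 4-dimensional over \mathbb{C}, hitting every elementary matrix shows that the induced map is an isomorphism.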

So, why does this have anything to do with the local-to-global principle for the f_{a,b,c}’s? To begin, let us note that if we’re willing to take a,b\in\mathbb{Q} we can basically only consider the polynomial equations f_{\alpha,\beta}:=\alpha x^2+\beta y^2+z^2. Note then that P_{a,b,c}(\mathbb{Q})=P_{f_{\alpha,\beta}}(\mathbb{Q}) with \displaystyle \alpha=\frac{a}{c} and \displaystyle \beta=\frac{b}{c}. So, it suffices to explain how the polynomials f_{\alpha,\beta} relate to the Brauer group. So, to this end, let us define another concrete example of a central simple algebra over \mathbb{Q} which, in a clear way, is a variation on a theme of the Hamiltonian Quaternions. Namely, for \alpha,\beta\in\mathbb{Q}^\times let Q(\alpha,\beta) be the following central simple algebra over \mathbb{Q}:

Q(\alpha,\beta)=\mathbb{Q}\oplus\mathbb{Q}i\oplus\mathbb{Q}j\oplus\mathbb{Q}k

with i^2=-\alpha, j^2=-\beta, ij=k, and ij=-ji (so that Q(1,1) is the rational analogue of the Hamiltonian Quaternions).

The theorem that then relates everything is the following:

Theorem: Let \alpha,\beta\in\mathbb{Q}^\times. Then, for any field K/\mathbb{Q} the central simple algebra Q(\alpha,\beta)_K is split if and only if P_{f_{\alpha,\beta}}(K)\ne\varnothing.

Proof(sketch): We first claim that Q(\alpha,\beta)_K is non-split if and only if it’s a division algebra. This follows at once from the well-known Artin-Wedderburn theorem. Indeed, since Q(\alpha,\beta)_K is simple with center K we know that, as a K-algebra, it’s isomorphic to \text{Mat}_n(D) where D/K is a central division algebra. But, by dimension considerations (n^2\dim_K D=4) we see that either n=1 and Q(\alpha,\beta)_K=D, or n=2 and D=K.

So, whether or not Q(\alpha,\beta)_K is split is equivalent to knowing whether it’s a division algebra. But, note that there is a norm map N:Q(\alpha,\beta)_K\to K given by

N(x+yi+zj+wk)=x^2+\alpha y^2+\beta z^2+\alpha\beta w^2

One can check that N(q)=q\overline{q} where \overline{q}=x-yi-zj-wk if q=x+yi+zj+wk. Thus, N is multiplicative, and it’s pretty easy to check that q\in Q(\alpha,\beta)_K^\times if and only if N(q)\ne 0.

Thus, the splitness of Q(\alpha,\beta)_K is equivalent to the existence of a non-zero non-unit of Q(\alpha,\beta)_K which is, by the previous paragraph, equivalent to the existence of x,y,z,w\in K (not all zero) such that x^2+\alpha y^2+\beta z^2+\alpha\beta w^2=0. This looks moderately close to the existence of a point in P_{f_{\alpha,\beta}}(K) (take w=0) and, as it turns out (with a bit of algebra grease), the two are equivalent. You can look in the wonderful text Central Simple Algebras and Galois Cohomology (chapter 1) by Gille and Szamuely for the details. \blacksquare
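
As a sanity check on the norm computation in the proof sketch, the following Python sketch implements multiplication in a quaternion algebra with i^2=A, j^2=B and ij=k=-ji. Here A and B are placeholder names for whatever non-zero values the definition above assigns to i^2 and j^2, and in these terms the norm form reads x^2-Ay^2-Bz^2+ABw^2. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def quat_mul(q, r, A, B):
    """Multiply q = x + y*i + z*j + w*k by r, where i^2 = A, j^2 = B, k = ij = -ji."""
    x1, y1, z1, w1 = q
    x2, y2, z2, w2 = r
    return (
        x1 * x2 + A * y1 * y2 + B * z1 * z2 - A * B * w1 * w2,  # scalar part
        x1 * y2 + y1 * x2 - B * z1 * w2 + B * w1 * z2,          # i part
        x1 * z2 + z1 * x2 + A * y1 * w2 - A * w1 * y2,          # j part
        x1 * w2 + w1 * x2 + y1 * z2 - z1 * y2,                  # k part
    )


def norm(q, A, B):
    """Norm N(q) = q * conjugate(q), as a scalar."""
    x, y, z, w = q
    return x * x - A * y * y - B * z * z + A * B * w * w


A, B = Fraction(2), Fraction(-3, 5)
elements = list(product(range(-1, 2), repeat=4))

# N is multiplicative on a small grid of test elements.
print(all(norm(quat_mul(q, r, A, B), A, B) == norm(q, A, B) * norm(r, A, B)
          for q in elements for r in elements))

# With A = 1 the element 1 + i has norm 0 and is a zero divisor: (1 + i)(1 - i) = 0.
print(norm((1, 1, 0, 0), 1, B), quat_mul((1, 1, 0, 0), (1, -1, 0, 0), 1, B))
```

The second check is the mechanism of the proof in miniature: a non-zero element of norm zero cannot be a unit, so the algebra containing it is not a division algebra.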

Remark: The more conceptual reason for the above can be explained as follows. By a previous remark one can identify \text{Br}(K) with H^2(G_K,\overline{K}^\times). The short exact sequence of G_K-groups

1\to\overline{K}^\times\to \text{GL}_n(\overline{K})\to \text{PGL}_n(\overline{K})\to 1

gives, by the connecting homomorphism and Hilbert Theorem 90, an injection of the form H^1(G_K,\text{PGL}_n(\overline{K}))\to H^2(G_K,\overline{K}^\times). But, note that \text{PGL}_n(\overline{K})=\text{Aut}(\mathbb{P}^{n-1}_{\overline{K}}) and thus, by the theory of twists, H^1(G_K,\text{PGL}_n(\overline{K})) classifies varieties over K which become isomorphic to \mathbb{P}^{n-1}_{\overline{K}} over \overline{K}: the Brauer-Severi varieties.

In particular, such a variety V gives a class [V]\in H^1(G_K,\text{PGL}_n(\overline{K})) and thus, by the above, a class [V]\in\text{Br}(K). Moreover, one can show that V\cong \mathbb{P}^{n-1}_K (i.e. isomorphic over K not \overline{K}) if and only if V(K)\ne\varnothing. Thus, we see that V(K)\ne\varnothing if and only if [V]\in\text{Br}(K) is trivial.

One can check, as was already done in the section labeled ‘The geometric part’, that P_{a,b,c} becomes isomorphic to \mathbb{P}^1_{\overline{\mathbb{Q}}} over \overline{\mathbb{Q}} and thus, putting this all together, we see that to every P_{a,b,c} and any field K/\mathbb{Q} one can associate some class [P_{a,b,c}] in H^1(G_K,\text{PGL}_2(\overline{K}))\subseteq\text{Br}(K) such that P_{a,b,c}(K)\ne\varnothing if and only if [P_{a,b,c}]\in\text{Br}(K) is split (i.e. trivial). As one might guess, [P_{a,b,c}] is the class of the central simple algebra Q(\alpha,\beta)_K with \displaystyle \alpha=\frac{a}{c} and \displaystyle \beta=\frac{b}{c}.

So, from this we see that the local-to-global principle becomes equivalent to the claim that Q(\alpha,\beta) is split if and only if Q(\alpha,\beta)\otimes_\mathbb{Q}\mathbb{Q}_v is split for all valuations v of \mathbb{Q}. So, how does one go about doing this?

The key is the following result that will be proven in this class:

Theorem(Fundamental sequence): There are isomorphisms \text{inv}_p:\text{Br}(\mathbb{Q}_p)\cong\mathbb{Q}/\mathbb{Z} for each prime p and \text{inv}_\infty:\text{Br}(\mathbb{R})\cong \frac{1}{2}\mathbb{Z}/\mathbb{Z}\subseteq\mathbb{Q}/\mathbb{Z} such that the following sequence is exact:

\displaystyle 0\to\text{Br}(\mathbb{Q})\xrightarrow{L}\bigoplus_v \text{Br}(\mathbb{Q}_v)\xrightarrow{\text{inv}}\mathbb{Q}/\mathbb{Z}\to 0

where L(A)=(A\otimes_\mathbb{Q}\mathbb{Q}_v) and \text{inv}(A_v) is \displaystyle \sum \text{inv}_v(A_v).

This is an incredibly deep theorem. It contains within it quite a bit of the wrecking ball that is Class Field Theory. As an example, the fact that \text{inv}\circ L is trivial contains within it all the reciprocity laws (quadratic, cubic, Eisenstein) from elementary algebraic number theory. The proof comes from a detailed study of the cohomology of the ideles of finite extensions of \mathbb{Q}.

In particular, we see that this deep number theoretic result contains within it also the local-to-global principle for the polynomials f_{a,b,c}. Indeed, taking \displaystyle \alpha=\frac{a}{c} and \displaystyle \beta=\frac{b}{c} we have already observed that the local-to-global principle for f_{a,b,c} is equivalent to the fact that Q(\alpha,\beta) is split if and only if Q(\alpha,\beta)\otimes_\mathbb{Q}\mathbb{Q}_v is split for all v. Using the terminology from the above theorem this is equivalent to the fact that Q(\alpha,\beta) is split if and only if L(Q(\alpha,\beta)) is split. But, by the definition of the Brauer group this follows from the injectivity of L.

In fact, we actually get something stronger from the above theorem. Namely, one can quite easily show that for each \alpha,\beta the element Q(\alpha,\beta) is 2-torsion (prove this for yourself!). So, in particular, what we see is that L(Q(\alpha,\beta)) must be a tuple of the form (a_v) with each a_v equal to 0 (in which case it’s split at v) or \frac{1}{2} (in which case it’s not). But, noting that \text{inv}(L(Q(\alpha,\beta)))=0 we actually conclude that the number of places v with a_v=\frac{1}{2} is actually even! In particular, if we know that Q(\alpha,\beta) is split over \mathbb{Q}_v, in other words P_{a,b,c}(\mathbb{Q}_v)\ne\varnothing, for all but one v then, in fact, Q(\alpha,\beta) is split over every \mathbb{Q}_v and thus, in particular, P_{a,b,c}(\mathbb{Q})\ne\varnothing. So, we have actually proven something stronger than Legendre’s theorem.
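
The parity statement can be tested numerically. The sketch below is not from the talk: it uses the standard closed formulas for the Hilbert symbol (a,b)_v (as in Serre’s A Course in Arithmetic), together with the standard fact that the conic ax^2+by^2+cz^2=0 has a point over \mathbb{Q}_v exactly when (-ac,-bc)_v=1. A place where the symbol is -1 plays the role of a place with a_v=\frac{1}{2}. All function names are hypothetical.

```python
def legendre(u, p):
    """Legendre symbol (u/p) for an odd prime p and u prime to p."""
    return 1 if pow(u % p, (p - 1) // 2, p) == 1 else -1


def split_power(n, p):
    """Write the non-zero integer n as p^e * u with u prime to p; return (e, u)."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e, n


def hilbert(a, b, v):
    """Hilbert symbol (a, b)_v for non-zero integers a, b; v is a prime or 'inf'."""
    if v == "inf":
        return -1 if (a < 0 and b < 0) else 1
    alpha, u = split_power(a, v)
    beta, w = split_power(b, v)
    if v == 2:
        eps = lambda t: ((t - 1) // 2) % 2
        omega = lambda t: ((t * t - 1) // 8) % 2
        exponent = eps(u) * eps(w) + alpha * omega(w) + beta * omega(u)
        return -1 if exponent % 2 else 1
    sign = -1 if (alpha * beta * ((v - 1) // 2)) % 2 else 1
    return sign * legendre(u, v) ** beta * legendre(w, v) ** alpha


def local_obstructions(a, b, c):
    """Places v (among inf, 2 and primes dividing abc) where the conic has no Q_v-point."""
    n = abs(2 * a * b * c)
    primes = [p for p in range(2, n + 1)
              if n % p == 0 and all(p % q for q in range(2, int(p ** 0.5) + 1))]
    return [v for v in ["inf"] + primes if hilbert(-a * c, -b * c, v) == -1]


print(local_obstructions(1, 1, 1))    # x^2 + y^2 + z^2 = 0
print(local_obstructions(1, 1, -3))   # x^2 + y^2 = 3 z^2
print(local_obstructions(1, 1, -2))   # x^2 + y^2 = 2 z^2, which has the point (1, 1, 1)
```

In every case the list of obstructed places has even length, so a single bad place is impossible, which is the ‘all but one v’ strengthening in concrete form.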

Conclusion

We see that Legendre’s theorem, perhaps the most basic of all non-trivial Diophantine equations, requires (in its ‘correct’ proof) a true modern perspective: the combination of number theory and algebraic geometry. It requires arithmetic geometry. Quite amazing!

What comes next?

So, now that we’ve handled the Diophantine equations as in Legendre’s theorem, what type of equations come next? Using our previous discussion as a clue, we might try to look for polynomials f such that P_f(\mathbb{C}) are the next most complicated 1-dimensional compact complex manifolds: compact Riemann surfaces of genus 1.

These are, essentially, the so-called elliptic curves over \mathbb{Q}. The study of their rational points, the solutions of the Diophantine equation, is a widely studied topic which comprises quite a sizable portion of modern research. For example, one can show that if E:=P_f is such an elliptic curve then E(\mathbb{Q})=E(\mathbb{Z}) is a finitely generated abelian group. We can then write E(\mathbb{Q})=\mathbb{Z}^r\oplus T where T is a finite abelian group. Deep work of Mazur has identified the possible choices for T: \mathbb{Z}/N\mathbb{Z} for N=1,\ldots,10,12 and (\mathbb{Z}/2\mathbb{Z})\times(\mathbb{Z}/2N\mathbb{Z}) for N=1,2,3,4. What the possibilities for r can be is an incredibly intense area of modern research. It’s even hotly debated whether or not the possible r are bounded. It is conjectured that, probabilistically, half of the time r=0 and half of the time r=1, and recent work of Manjul Bhargava (with Arul Shankar) makes major progress towards this by showing that the average value of r is bounded, indeed less than 1 (this was one of the main reasons he received the Fields Medal). Moreover, there are deep conjectures (most notably the Birch and Swinnerton-Dyer conjecture) that relate this r to important analytic objects (their L-functions) which, due to Wiles and others, are the same thing as analytic objects coming from harmonic analysis (automorphic L-functions).

The next step might be to study f with P_f(\mathbb{C}) a curve of genus g\geqslant 2. The general theory of such f is fairly sparse. But, thanks to Gerd Faltings we know an incredibly powerful qualitative result about the solutions to such Diophantine equations. Namely, Faltings shows that P_f(\mathbb{Q}) is always finite. An incredibly stunning result considering the incredible variety of examples it encompasses.

Remark: Not that most of this note has been rigorous, but it should be noted that the above paragraph was a discussion mostly in analogies. There are no f with P_f(\mathbb{C}) having genus 2, for example, since genus 2 curves are not hypersurfaces. One should, if you know what this means, replace P_f by a smooth geometrically integral proper curve of genus g in the above paragraph.

It should also be noted that we are largely, in the above, replacing the finite-type \mathbb{Z}-scheme with its generic fiber.

After one is done with curves, the next step is to move onto Diophantine equations whose \mathbb{C}-points are higher-dimensional compact complex manifolds. Specifically, next up would be the study of proper surfaces (those Diophantine equations with associated complex points a 2-dimensional compact complex manifold). These are broken down into cases, similar to the partition of curves into their genus classes, by their minimal models. Namely, there is a classification of minimal surfaces (over \overline{\mathbb{Q}}), and this allows one to study the rational points of surfaces over \mathbb{Q} by what their associated minimal surface over \overline{\mathbb{Q}} is. See this nice reference for a leisurely discussion of the topic.

Moving on from this point things get even harder. Namely, for dimensions larger than 2 there is no real geometric classification of the objects involved and so study by breaking them into similar classes seems impossible. Thus, there is currently no real general theory for Diophantine equations whose \mathbb{C}-points are compact complex manifolds of dimension greater than 2, only theory for such Diophantine equations of certain special forms (e.g. abelian varieties).

Relation to the ‘modern perspective’

I can’t resist, as a final note, explaining how this perspective on Diophantine equations allows us to more directly unite Diophantine equations with the modern perspective that number theory is the study of the group G_{\mathbb{Q}} and its representations.

Namely, we saw that in the case of univariate polynomials one could very easily associate a representation of G_{\mathbb{Q}} that, in some sense, ‘tells all’. One of the reasons we then sought the refuge of the geometry underpinning ‘higher-dimensional Diophantine equations’ was that such a technique no longer proved to be possible. One may then wonder if one can actually exploit this geometry to remedy this situation: whether we can use the geometry somehow to associate to Diophantine equations (of arbitrary dimensions) representations of G_{\mathbb{Q}} which, like in the univariate case, give us insights into the solutions to the Diophantine equations.

So, the main impediment to extending what happened in the univariate case to higher dimensions was the lack of a natural finite-dimensional vector space attached to the Diophantine equation for G_{\mathbb{Q}} to act on. Namely, we ostensibly made \rho_f, for f univariate, by creating a vector space in the dumbest possible way: using the tautological representation of S_N. How might we analogize this to higher dimensions?

The key observation is that, in fact, one can think of the vector space \mathbb{C}^N showing up in the representation \rho_f as actually coming from the geometry of X_f. Namely, we saw that X_f(\overline{\mathbb{Q}})=X_f(\mathbb{C}) is nothing more than a set of discrete points with a continuous action of G_{\mathbb{Q}}. Moreover, we can see that the representation \rho_f essentially came from how G_{\mathbb{Q}} permuted the connected components of this discrete space X_f(\mathbb{C}) and, in particular, \rho_f just acted on the free \mathbb{C} vector space on the connected components. In general, given a complex manifold M, there is a name for the free \mathbb{C} vector space on the set of connected components of M: the zeroth singular cohomology H^0_\text{sing}(M,\mathbb{C}). Indeed, one can note that since X_f(\mathbb{C}) is discrete, G_{\mathbb{Q}} acts continuously on X_f(\mathbb{C}) and thus, by the functoriality of singular cohomology, gives an induced linear action on H^0_\text{sing}(X_f(\mathbb{C}),\mathbb{C}). One can check that, as you’d hope, the G_{\mathbb{Q}}-representations \rho_f and H^0_{\text{sing}}(X_f(\mathbb{C}),\mathbb{C}) are isomorphic.

This gives us a clear-cut way to try and attack the goal of associating to higher-dimensional Diophantine equations representations of G_{\mathbb{Q}}. For example, if f is a homogeneous polynomial, then P_f(\mathbb{C}) will be (in good situations) a compact complex manifold and thus, in particular, a well-behaved topological space. This allows us to associate to f various complex vector spaces: the singular cohomology groups H^i_\text{sing}(P_f(\mathbb{C}),\mathbb{C}) for i\geqslant 0. Of course, to make this totally complete we will still need to define an action of G_{\mathbb{Q}} on this space. Therein lies the rub. Namely, we have two difficulties that make this a much harder problem than in the discrete case. The first is that it is no longer true that P_f(\overline{\mathbb{Q}})=P_f(\mathbb{C}) (take f(T_1,T_2)=T_1+T_2 for example). So, what we get is not an action of G_{\mathbb{Q}} on P_f(\mathbb{C}) but an action of \text{Aut}(\mathbb{C}/\mathbb{Q}). Of course, this action factors through G_{\mathbb{Q}} on the subset P_f(\overline{\mathbb{Q}})\subseteq P_f(\mathbb{C}).

The second, and more serious issue, is that we want to consider these cohomology groups of P_f(\mathbb{C}) where we give P_f(\mathbb{C}) the complex topology, NOT the discrete one. That said, \text{Aut}(\mathbb{C}/\mathbb{Q}) only acts continuously on P_f(\mathbb{C}) with the discrete topology, and acts wildly discontinuously on P_f(\mathbb{C}) with the complex topology. In particular, since cohomology is only functorial for continuous maps, we seem completely doomed in utilizing the geometry to copy what happened in the univariate case for higher-dimensional examples.

In comes Grothendieck. It was Grothendieck’s brilliant genius to realize that one can fix the above, by understanding that the singular cohomology of P_f(\mathbb{C}) can, in some sense, be obtained by algebraic methods and, in particular, in a way that allows one to actually have G_{\mathbb{Q}} act on this singular cohomology. This is the modern masterpiece that is the theory of etale cohomology.

Let me briefly explain what this looks like. Grothendieck realized that to any scheme X (thought of as a ‘geometric space associated to a set of equations’; think X_f or P_f) one can associate certain \overline{\mathbb{Q}_\ell} vector spaces, denoted H^i(X,\overline{\mathbb{Q}_\ell}). Moreover, what Grothendieck shows, and this is the pivotal part, is that these cohomology groups are functorial in the scheme. In particular, if X/\mathbb{Q} is a scheme obtained from a Diophantine equation then one has an operation, known as base change, which gives a scheme X_{\overline{\mathbb{Q}}}/\overline{\mathbb{Q}} (this is much like X_f(\overline{\mathbb{Q}})), and this has an action of G_{\mathbb{Q}} (much in the same way that X_f(\overline{\mathbb{Q}}) had an action of G_{\mathbb{Q}}). Thus, by Grothendieck’s wonderful machine, we obtain an action of G_\mathbb{Q} on H^i(X_{\overline{\mathbb{Q}}},\overline{\mathbb{Q}_\ell}) which is, in fact, continuous if one gives the cohomology group the natural topology of a \overline{\mathbb{Q}_\ell} vector space.

What does this have to do with the geometry of X(\mathbb{C})? Well, Grothendieck and Artin showed that one has a natural isomorphism H^i(X_{\overline{\mathbb{Q}}},\overline{\mathbb{Q}_\ell})\cong H^i_\text{sing}(X(\mathbb{C}),\overline{\mathbb{Q}_\ell}) which, since there are (abstract) field isomorphisms i:\overline{\mathbb{Q}_\ell}\cong \mathbb{C}, implies (at least after choosing such an isomorphism i) that H^i(X_{\overline{\mathbb{Q}}},\overline{\mathbb{Q}_\ell})\cong H^i_\text{sing}(X(\mathbb{C}),\mathbb{C}). Thus, by Grothendieck’s wonderful machinery, and the beautiful result of Artin, one can define an action of G_{\mathbb{Q}} on H^i_\text{sing}(X(\mathbb{C}),\mathbb{C}), but even better, a continuous action of G_{\mathbb{Q}} on H^i_\text{sing}(X(\mathbb{C}),\overline{\mathbb{Q}_\ell}).

Thus, utilizing the geometry of the Diophantine equation and its arithmetic properties (i.e. viewing it through the lens of arithmetic geometry) one can associate to it a (continuous) representation of G_{\mathbb{Q}} and, as promised, this enhances the study of the Diophantine equation. To elaborate on this would take many, many hours but suffice it to say that the representation contains within it much of the information of the Diophantine equation (e.g. the number of solutions of the equation modulo p). As an example, the fact that \#P_{a,b,c}(\mathbb{F}_p)=p+1 (for odd primes p not dividing abc) is a direct consequence (with the general theory at hand) of the fact that H^1_\text{sing}(P_{a,b,c}(\mathbb{C}),\mathbb{C})=0.
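
The point count p+1 is easy to confirm by brute force for small primes. The following is just such a sketch, assuming p is an odd prime not dividing abc. The function name count_projective_points is made up for illustration.

```python
def count_projective_points(a, b, c, p):
    """Count the points of a*x^2 + b*y^2 + c*z^2 = 0 in the projective plane over F_p."""
    nonzero_solutions = sum(
        1
        for x in range(p)
        for y in range(p)
        for z in range(p)
        if (x, y, z) != (0, 0, 0) and (a * x * x + b * y * y + c * z * z) % p == 0
    )
    # Each projective point has exactly p - 1 non-zero representatives.
    return nonzero_solutions // (p - 1)


for p in [5, 7, 11, 13]:
    print(p, count_projective_points(1, 2, 3, p))
```

Note that the count does not depend on the coefficients at all, only on p, which is what one expects from a curve that looks like \mathbb{P}^1 geometrically.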

Moral of the story: learn schemes

So, let’s summarize all of what happened above. If one wants to study a Diophantine equation, there are many methods of attack for specific types of equations, but no general approach. This is undesirable. In particular, one would like to develop a method to study such Diophantine equations in a systematic way that ties into the modern number theoretic goal of studying G_{\mathbb{Q}} and its representations.

One achieves this systematic approach by trading an unstructured set of solutions (e.g. X_f(\mathbb{Q}) or P_f(\mathbb{Q})) for a G_{\mathbb{Q}}-set and a geometric object (e.g. P_f(\overline{\mathbb{Q}}) and P_f(\mathbb{C})) and studying the Diophantine equation through the study of these two separately (number theory and algebraic geometry) and through their interaction (arithmetic geometry).

We saw a key example of this in trying to understand quadratic Diophantine equations. We were able to, in Legendre’s theorem, give a fairly satisfactory description of their structure. But, to do so, required that we view such Diophantine equations both as a geometric object (namely \mathbb{CP}^1) and an object of number theory (a class in the Brauer group \text{Br}(\mathbb{Q})). Only by combining these approaches were we truly able to give our desired description.

Moreover, in the end, we indicated how this general arithmetico-geometric approach to Diophantine equations ties into the other modern perspective on number theory (the study of continuous representations of G_{\mathbb{Q}}) by explaining how the geometry allows us, thanks to the great work of Grothendieck (and many others, including Artin), to associate to any Diophantine equation such a representation which, as hinted at, contains an immense amount of information about the Diophantine equation. Thus, the modern study of number theory (the study of Galois representations) does insert directly into the classical desire to study Diophantine equations.

Finally, it’s worth mentioning that all of the above is very neatly organized in modern algebraic geometry. Namely, it was the vision of Grothendieck and his collaborators to be able to neatly package the idea that from equations one can obtain a geometric and arithmetic object. Namely, they defined the notion of schemes which, in essence, captures both the arithmetic properties of the G_{\mathbb{Q}}-set and the geometric properties of the topological (complex analytic) space associated to a Diophantine equation. So, as is usually a good way to end a talk, I implore you to go out and learn scheme theory. Not for its fanciness, but because of its natural necessity in the study of number theory and, in particular, in the study of Diophantine equations.

10 comments

  1. Regarding your remark “choose an isomorphism $\overline{\mathbb{Q}_\ell}\cong\mathbb{C}$:” I am out of my depth here, but I thought you had to complete $\overline{\mathbb{Q}_\ell}$ before obtaining a field isomorphic to $\mathbb{C}$. Am I misremembering facts about the p-adics, or was this implicit in the notation, or something like that?

    1. Hey Arun,

      It’s actually true that all uncountable algebraically closed fields of characteristic 0 are (abstractly) isomorphic. So, \mathbb{C}\cong\overline{\mathbb{Q}_\ell}\cong\mathbb{C}_\ell (this latter is the completion of \overline{\mathbb{Q}_\ell}, as you indicated, that you need for a complete algebraically closed extension of \mathbb{Q}_\ell).

      Hope this helps!

      1. Hi Alex,

        I don’t mean to be a pesky pedant who annoyingly swoops in from the internet, but doesn’t one also require that the two algebraically closed fields of characteristic zero have the same cardinality? (This is certainly the case for $ \mathbb{C} $, $ \overline{\mathbb{Q}_\ell} $, and $ \mathbb{C}_\ell $, but it’s not true for, say, $ \overline{\mathbb{Q}(T_i)_{i\in\mathbb{R}}} $ and $ \overline{\mathbb{Q}(T_i)_{i\in 2^{\mathbb{R}}}} $).

        Best,

        oregontrailmixtape

  2. Sorry, one non-article related question: Have you got an RSS feed for this blog?
    I can’t find any but I just want to make sure as I’d really like to subscribe to that!
