# Proofs of Fermat's theorem on sums of two squares

Fermat's theorem on sums of two squares asserts that an odd prime number p can be expressed as

${\displaystyle p=x^{2}+y^{2}}$

with integer x and y if and only if p is congruent to 1 (mod 4). The statement was announced by Girard in 1625, and again by Fermat in 1640, but neither supplied a proof.

The "only if" clause is easy: a perfect square is congruent to 0 or 1 modulo 4, hence a sum of two squares is congruent to 0, 1, or 2. An odd prime number is congruent to either 1 or 3 modulo 4, and the second possibility has just been ruled out. The first proof that such a representation exists was given by Leonhard Euler in 1747 and was complicated. Since then, many different proofs have been found, among them a proof using Minkowski's theorem about convex sets[1] and Don Zagier's short proof based on involutions.

## Euler's proof by infinite descent

Euler succeeded in proving Fermat's theorem on sums of two squares in 1749, when he was forty-two years old. He communicated this in a letter to Goldbach dated 12 April 1749.[2] The proof relies on infinite descent, and is only briefly sketched in the letter. The full proof consists of five steps and was published in two papers. The first four steps are Propositions 1 to 4 of the first paper[3] and do not correspond exactly to the four steps below. The fifth step below is from the second paper.[4] [5]

1. The product of two numbers, each of which is a sum of two squares, is itself a sum of two squares.

This is a well known property, based on the identity
${\displaystyle (a^{2}+b^{2})(p^{2}+q^{2})=(ap+bq)^{2}+(aq-bp)^{2}}$
due to Diophantus.
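As a quick numerical check of this identity, the following Python sketch composes two representations into one. The function name `compose` is illustrative and does not come from the source.

```python
def compose(a, b, p, q):
    """Return (u, v) with u^2 + v^2 == (a^2 + b^2) * (p^2 + q^2)."""
    return a * p + b * q, a * q - b * p


# Example: 5 = 1^2 + 2^2 and 13 = 2^2 + 3^2, so 65 is again a sum of two squares.
u, v = compose(1, 2, 2, 3)
assert u * u + v * v == 5 * 13
```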

2. If a number which is a sum of two squares is divisible by a prime which is a sum of two squares, then the quotient is a sum of two squares. (This is Euler's first Proposition).

Indeed, suppose for example that ${\displaystyle a^{2}+b^{2}}$ is divisible by ${\displaystyle p^{2}+q^{2}}$ and that this latter is a prime. Then ${\displaystyle p^{2}+q^{2}}$ divides
${\displaystyle (pb-aq)(pb+aq)=p^{2}b^{2}-a^{2}q^{2}=p^{2}(a^{2}+b^{2})-a^{2}(p^{2}+q^{2}).}$
Since ${\displaystyle p^{2}+q^{2}}$ is a prime, it divides one of the two factors. Suppose that it divides ${\displaystyle pb-aq}$. Since
${\displaystyle (a^{2}+b^{2})(p^{2}+q^{2})=(ap+bq)^{2}+(aq-bp)^{2}}$
(Diophantus's identity) it follows that ${\displaystyle p^{2}+q^{2}}$ must divide ${\displaystyle (ap+bq)^{2}}$. So the equation can be divided by the square of ${\displaystyle p^{2}+q^{2}}$. Dividing the expression by ${\displaystyle (p^{2}+q^{2})^{2}}$ yields:
${\displaystyle {\frac {a^{2}+b^{2}}{p^{2}+q^{2}}}=\left({\frac {ap+bq}{p^{2}+q^{2}}}\right)^{2}+\left({\frac {aq-bp}{p^{2}+q^{2}}}\right)^{2}}$
and thus expresses the quotient as a sum of two squares, as claimed.
If ${\displaystyle p^{2}+q^{2}}$ divides ${\displaystyle pb+aq}$, a similar argument holds by using
${\displaystyle (a^{2}+b^{2})(q^{2}+p^{2})=(aq+bp)^{2}+(ap-bq)^{2}}$
(Diophantus's identity).
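The two cases of this step can be traced in a short Python sketch. The helper name `divide_by_prime_sum` is hypothetical, and the code assumes (without checking) that ${\displaystyle p^{2}+q^{2}}$ is prime; it only verifies the divisibility hypothesis.

```python
def divide_by_prime_sum(a, b, p, q):
    """Given a prime N = p^2 + q^2 dividing a^2 + b^2, return (c, d)
    with c^2 + d^2 == (a^2 + b^2) // N, following the two cases of the proof.
    Primality of N is assumed, not checked."""
    n = p * p + q * q
    if (a * a + b * b) % n != 0:
        raise ValueError("p^2 + q^2 must divide a^2 + b^2")
    if (p * b - a * q) % n == 0:
        # N divides pb - aq, hence also ap + bq
        return (a * p + b * q) // n, (a * q - b * p) // n
    # otherwise N divides pb + aq, hence also ap - bq
    return (a * q + b * p) // n, (a * p - b * q) // n


# 65 = 8^2 + 1^2 is divisible by the prime 5 = 2^2 + 1^2; the quotient is 13.
c, d = divide_by_prime_sum(8, 1, 2, 1)
assert c * c + d * d == 13
```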

3. If a number which can be written as a sum of two squares is divisible by a number which is not a sum of two squares, then the quotient has a factor which is not a sum of two squares. (This is Euler's second Proposition).

Suppose ${\displaystyle x}$ divides ${\displaystyle a^{2}+b^{2}}$ and that the quotient, factored into its prime factors, is ${\displaystyle p_{1}p_{2}\cdots p_{n}}$. Then ${\displaystyle a^{2}+b^{2}=xp_{1}p_{2}\cdots p_{n}}$. If all factors ${\displaystyle p_{i}}$ can be written as sums of two squares, then we can divide ${\displaystyle a^{2}+b^{2}}$ successively by ${\displaystyle p_{1}}$, ${\displaystyle p_{2}}$, etc., and applying the previous step we deduce that each successive quotient is a sum of two squares. Continuing until only ${\displaystyle x}$ remains, we conclude that ${\displaystyle x}$ would have to be a sum of two squares. So, by contraposition, if ${\displaystyle x}$ is not the sum of two squares, then at least one of the primes ${\displaystyle p_{i}}$ is not the sum of two squares.

4. If ${\displaystyle a}$ and ${\displaystyle b}$ are relatively prime then every factor of ${\displaystyle a^{2}+b^{2}}$ is a sum of two squares. (This is Euler's Proposition 4. The proof sketched below includes the proof of his Proposition 3).

This is the step that uses infinite descent. Let ${\displaystyle x}$ be a factor of ${\displaystyle a^{2}+b^{2}}$. We can write
${\displaystyle a=mx\pm c,\qquad b=nx\pm d}$
where ${\displaystyle c}$ and ${\displaystyle d}$ are at most half of ${\displaystyle x}$ in absolute value. This gives:
${\displaystyle a^{2}+b^{2}=m^{2}x^{2}\pm 2mxc+c^{2}+n^{2}x^{2}\pm 2nxd+d^{2}=Ax+(c^{2}+d^{2}).}$
Therefore, ${\displaystyle c^{2}+d^{2}}$ must be divisible by ${\displaystyle x}$, say ${\displaystyle c^{2}+d^{2}=yx}$. If ${\displaystyle c}$ and ${\displaystyle d}$ are not relatively prime, then their gcd must be relatively prime to ${\displaystyle x}$ (else the common factor of their gcd and ${\displaystyle x}$ would also be a common factor of ${\displaystyle a}$ and ${\displaystyle b}$ which we assume are relatively prime). Thus the square of the gcd divides ${\displaystyle y}$ (as it divides ${\displaystyle c^{2}+d^{2}}$), giving us an expression of the form ${\displaystyle e^{2}+f^{2}=zx}$ for relatively prime ${\displaystyle e}$ and ${\displaystyle f}$, and with ${\displaystyle z}$ no more than half of ${\displaystyle x}$, since
${\displaystyle zx=e^{2}+f^{2}\leq c^{2}+d^{2}\leq \left({\frac {x}{2}}\right)^{2}+\left({\frac {x}{2}}\right)^{2}={\frac {1}{2}}x^{2}.}$
If ${\displaystyle c}$ and ${\displaystyle d}$ are relatively prime, then we can use them directly instead of switching to ${\displaystyle e}$ and ${\displaystyle f}$.
If ${\displaystyle x}$ is not the sum of two squares, then by the third step there must be a factor of ${\displaystyle z}$ which is not the sum of two squares; call it ${\displaystyle w}$. This gives an infinite descent, going from ${\displaystyle x}$ to a smaller number ${\displaystyle w}$, neither of which is a sum of two squares although each divides a sum of two relatively prime squares. Since an infinite descent is impossible, we conclude that ${\displaystyle x}$ must be expressible as a sum of two squares, as claimed.
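A single reduction of this step can be sketched in Python as follows. The name `descent_step` is illustrative; the code assumes that ${\displaystyle x>1}$ divides ${\displaystyle a^{2}+b^{2}}$ and that ${\displaystyle a}$ and ${\displaystyle b}$ are relatively prime, and it does not check either hypothesis.

```python
from math import gcd


def descent_step(a, b, x):
    """One reduction of step 4.

    Assumes x > 1 divides a^2 + b^2 and gcd(a, b) == 1 (not checked).
    Returns (e, f, z) with e^2 + f^2 == z * x, gcd(e, f) == 1 and 2 * z <= x.
    """
    def centered(t):
        # residue of t modulo x with absolute value at most x / 2
        r = t % x
        return r - x if 2 * r > x else r

    c, d = centered(a), centered(b)
    g = gcd(c, d)  # nonzero, since x > 1 cannot divide both a and b
    e, f = c // g, d // g
    return e, f, (e * e + f * f) // x


# 13 divides 17^2 + 6^2 = 325; the residues (4, 6) share the factor 2.
e, f, z = descent_step(17, 6, 13)
assert e * e + f * f == z * 13 and 2 * z <= 13
```

Here the reduced pair already represents 13 itself, so the descent stops after one step.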

5. Every prime of the form ${\displaystyle 4n+1}$ is a sum of two squares. (This is the main result of Euler's second paper).

If ${\displaystyle p=4n+1}$, then by Fermat's Little Theorem each of the numbers ${\displaystyle 1,2^{4n},3^{4n},\dots ,(4n)^{4n}}$ is congruent to one modulo ${\displaystyle p}$. The differences ${\displaystyle 2^{4n}-1,3^{4n}-2^{4n},\dots ,(4n)^{4n}-(4n-1)^{4n}}$ are therefore all divisible by ${\displaystyle p}$. Each of these differences can be factored as
${\displaystyle a^{4n}-b^{4n}=\left(a^{2n}+b^{2n}\right)\left(a^{2n}-b^{2n}\right).}$
Since ${\displaystyle p}$ is prime, it must divide one of the two factors. If in any of the ${\displaystyle 4n-1}$ cases it divides the first factor, then by the previous step we conclude that ${\displaystyle p}$ is itself a sum of two squares (since ${\displaystyle a}$ and ${\displaystyle b}$ differ by ${\displaystyle 1}$, they are relatively prime). So it is enough to show that ${\displaystyle p}$ cannot always divide the second factor. If it divides all ${\displaystyle 4n-1}$ differences ${\displaystyle 2^{2n}-1,3^{2n}-2^{2n},\dots ,(4n)^{2n}-(4n-1)^{2n}}$, then it would divide all ${\displaystyle 4n-2}$ differences of successive terms, all ${\displaystyle 4n-3}$ differences of the differences, and so forth. Since the ${\displaystyle k}$th differences of the sequence ${\displaystyle 1^{k},2^{k},3^{k},\dots }$ are all equal to ${\displaystyle k!}$ (see finite difference), the ${\displaystyle 2n}$th differences would all be constant and equal to ${\displaystyle (2n)!}$, which is certainly not divisible by ${\displaystyle p}$. Therefore, ${\displaystyle p}$ cannot divide all the second factors, which proves that ${\displaystyle p}$ is indeed the sum of two squares.
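Both ingredients of this step can be checked numerically. The Python sketch below uses the illustrative names `iterated_differences` and `euler_witness` (neither is from the source): it confirms that the ${\displaystyle 2n}$th differences of the ${\displaystyle 2n}$th powers equal ${\displaystyle (2n)!}$, and it searches for a consecutive pair whose first factor is divisible by ${\displaystyle p}$.

```python
from math import factorial


def iterated_differences(seq, order):
    """Apply the forward-difference operator `order` times to a list."""
    for _ in range(order):
        seq = [t - s for s, t in zip(seq, seq[1:])]
    return seq


def euler_witness(p):
    """For a prime p = 4n + 1, return some a in 1..4n-1 such that
    p divides a^(2n) + (a+1)^(2n)."""
    n = (p - 1) // 4
    for a in range(1, 4 * n):
        if (pow(a, 2 * n, p) + pow(a + 1, 2 * n, p)) % p == 0:
            return a
    return None  # not reached for primes p = 4n + 1, by the argument above


n = 3  # corresponds to p = 13
powers = [a ** (2 * n) for a in range(1, 4 * n + 1)]
assert set(iterated_differences(powers, 2 * n)) == {factorial(2 * n)}
assert factorial(2 * n) % 13 != 0
```

For ${\displaystyle p=13}$ the search returns ${\displaystyle a=1}$, reflecting that 13 divides ${\displaystyle 1^{6}+2^{6}=65}$.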

## Lagrange's proof through quadratic forms

Lagrange completed a proof in 1775[6] based on his general theory of integral quadratic forms. The following presentation incorporates a slight simplification of his argument, due to Gauss, which appears in article 182 of the Disquisitiones Arithmeticae.

An (integral binary) quadratic form is an expression of the form ${\displaystyle ax^{2}+bxy+cy^{2}}$ with ${\displaystyle a,b,c}$ integers. A number ${\displaystyle n}$ is said to be represented by the form if there exist integers ${\displaystyle x,y}$ such that ${\displaystyle n=ax^{2}+bxy+cy^{2}}$. Fermat's theorem on sums of two squares is then equivalent to the statement that a prime ${\displaystyle p}$ is represented by the form ${\displaystyle x^{2}+y^{2}}$ (i.e., ${\displaystyle a=c=1}$, ${\displaystyle b=0}$) exactly when ${\displaystyle p}$ is congruent to ${\displaystyle 1}$ modulo ${\displaystyle 4}$.

The discriminant of the quadratic form is defined to be ${\displaystyle b^{2}-4ac}$. The discriminant of ${\displaystyle x^{2}+y^{2}}$ is then equal to ${\displaystyle -4}$.

Two forms ${\displaystyle ax^{2}+bxy+cy^{2}}$ and ${\displaystyle a'x'^{2}+b'x'y'+c'y'^{2}}$ are equivalent if and only if there exist substitutions with integer coefficients

${\displaystyle x=\alpha x'+\beta y'}$
${\displaystyle y=\gamma x'+\delta y'}$

with ${\displaystyle \alpha \delta -\beta \gamma =\pm 1}$ which, when substituted into the first form, yield the second. Equivalent forms are readily seen to have the same discriminant, and hence also the same parity for the middle coefficient ${\displaystyle b}$, which coincides with the parity of the discriminant. Moreover, it is clear that equivalent forms will represent exactly the same integers, because substitutions of this kind can be reversed by substitutions of the same kind.

Lagrange proved that all positive definite forms of discriminant −4 are equivalent. Thus, to prove Fermat's theorem it is enough to find any positive definite form of discriminant −4 that represents ${\displaystyle p}$. For example, one can use a form

${\displaystyle px^{2}+2mxy+\left({\frac {m^{2}+1}{p}}\right)y^{2},}$

where the first coefficient a = p was chosen so that the form represents p by setting x = 1, and y = 0, the coefficient b = 2m is an arbitrary even number (as it must be, to get an even discriminant), and finally ${\displaystyle c={\frac {m^{2}+1}{p}}}$ is chosen so that the discriminant ${\displaystyle b^{2}-4ac=4m^{2}-4pc}$ is equal to −4, which guarantees that the form is indeed equivalent to ${\displaystyle x^{2}+y^{2}}$. Of course, the coefficient ${\displaystyle c={\frac {m^{2}+1}{p}}}$ must be an integer, so the problem is reduced to finding some integer m such that p divides ${\displaystyle m^{2}+1}$. This is possible by Euler's criterion, but we reproduce the argument below to finish the proof.

As said, it suffices to find a root m of the polynomial ${\displaystyle P(x)=x^{2}+1}$ modulo p = 4n+1. What we do know, by Fermat's Little Theorem, is that each z not congruent to 0 modulo p is a root of the polynomial ${\displaystyle Q(z)=z^{p-1}-1=z^{4n}-1=(z^{2n}-1)(z^{2n}+1)}$. Then it must be a root of either ${\displaystyle z^{2n}-1}$ or ${\displaystyle z^{2n}+1}$, since the integers modulo p form a field. Moreover, by a theorem of Lagrange, the number of roots modulo p of a polynomial of degree d is at most d (this follows again since the integers modulo p form a field). So the 4n nonzero classes 1, 2, …, p − 1 must split into exactly 2n of them that are roots of ${\displaystyle z^{2n}-1}$, and the other 2n that are roots of ${\displaystyle z^{2n}+1}$. Choosing any z of the second kind and setting ${\displaystyle m=z^{n}}$ completes the proof.
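The argument can be made concrete with a Python sketch. It first finds ${\displaystyle m}$ as ${\displaystyle z^{n}}$, then reduces the form ${\displaystyle px^{2}+2mxy+cy^{2}}$ to ${\displaystyle x^{2}+y^{2}}$ while tracking the substitution. The reduction loop is the standard shift-and-swap procedure for positive definite forms, which the text above does not spell out, so treat it as an assumption of this sketch; all function names are illustrative.

```python
def sqrt_minus_one(p):
    """For a prime p = 4n + 1, return m with m^2 = -1 (mod p), as m = z^n."""
    n = (p - 1) // 4
    for z in range(2, p):
        if pow(z, 2 * n, p) == p - 1:  # z is a root of z^(2n) + 1
            return pow(z, n, p)
    raise ValueError("expected a prime congruent to 1 modulo 4")


def reduce_form(a, b, c):
    """Reduce a positive definite form a x^2 + b xy + c y^2.
    Returns the reduced coefficients and (al, be, ga, de), where
    x = al x' + be y', y = ga x' + de y' (determinant 1) yields the reduced form."""
    al, be, ga, de = 1, 0, 0, 1
    while True:
        if not (-a < b <= a):
            k = (a - b) // (2 * a)  # shift x -> x + k y moves b into (-a, a]
            b, c = b + 2 * a * k, a * k * k + b * k + c
            be, de = be + al * k, de + ga * k
        elif a > c:
            a, b, c = c, -b, a  # swap via x = -y', y = x'
            al, be, ga, de = be, -al, de, -ga
        else:
            return (a, b, c), (al, be, ga, de)


def two_squares_via_forms(p):
    m = sqrt_minus_one(p)
    reduced, (al, be, ga, de) = reduce_form(p, 2 * m, (m * m + 1) // p)
    assert reduced == (1, 0, 1)  # the only reduced form of discriminant -4
    # (x, y) = (1, 0) in the original form corresponds to (de, -ga) in x'^2 + y'^2
    return abs(de), abs(ga)


x, y = two_squares_via_forms(13)
assert x * x + y * y == 13
```

The pair is read off from the inverse substitution: since the original form takes the value ${\displaystyle p}$ at ${\displaystyle (1,0)}$, the reduced form takes the same value at the corresponding point.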

## Dedekind's two proofs using Gaussian integers

Richard Dedekind gave at least two proofs of Fermat's theorem on sums of two squares, both using the arithmetical properties of the Gaussian integers, which are numbers of the form a + bi, where a and b are integers, and i is the square root of −1. One appears in section 27 of his exposition of ideals published in 1877; the second appeared in Supplement XI to Peter Gustav Lejeune Dirichlet's Vorlesungen über Zahlentheorie, and was published in 1894.

1. First proof. If ${\displaystyle p}$ is an odd prime number, then we have ${\displaystyle i^{p-1}=(-1)^{\frac {p-1}{2}}}$ in the Gaussian integers. Consequently, writing a Gaussian integer ω = x + iy with x,y ∈ Z and applying the Frobenius automorphism in Z[i]/(p), one finds

${\displaystyle \omega ^{p}=(x+yi)^{p}\equiv x^{p}+y^{p}i^{p}\equiv x+(-1)^{\frac {p-1}{2}}yi{\pmod {p}},}$

since the automorphism fixes the elements of Z/(p). In the current case, ${\displaystyle p=4n+1}$ for some integer n, and so in the above expression for ${\displaystyle \omega ^{p}}$, the exponent (p-1)/2 of -1 is even. Hence the right hand side equals ω, so in this case the Frobenius automorphism of Z[i]/(p) is the identity. Kummer had already established that if f ∈ {1,2} is the order of the Frobenius automorphism of Z[i]/(p), then the ideal ${\displaystyle (p)}$ in Z[i] would be a product of 2/f distinct prime ideals. (In fact, Kummer had established a much more general result for any extension of Z obtained by adjoining a primitive m-th root of unity, where m was any positive integer; this is the case m = 4 of that result.) Therefore, the ideal (p) is the product of two different prime ideals in Z[i]. Since the Gaussian integers are a Euclidean domain for the norm function ${\displaystyle N(x+iy)=x^{2}+y^{2}}$, every ideal is principal and generated by a nonzero element of the ideal of minimal norm. Since the norm is multiplicative, the norm of a generator ${\displaystyle \alpha }$ of one of the ideal factors of (p) must be a proper divisor of ${\displaystyle N(p)=p^{2}}$ other than 1, so that we must have ${\displaystyle p=N(\alpha )=N(a+bi)=a^{2}+b^{2}}$, which gives Fermat's theorem.

2. Second proof. This proof builds on Lagrange's result that if ${\displaystyle p=4n+1}$ is a prime number, then there must be an integer m such that ${\displaystyle m^{2}+1}$ is divisible by p (we can also see this by Euler's criterion); it also uses the fact that the Gaussian integers are a unique factorization domain (because they are a Euclidean domain). Since p ∈ Z does not divide either of the Gaussian integers ${\displaystyle m+i}$ and ${\displaystyle m-i}$ (as it does not divide their imaginary parts), but it does divide their product ${\displaystyle m^{2}+1}$, it follows that ${\displaystyle p}$ cannot be a prime element in the Gaussian integers. We must therefore have a nontrivial factorization of p in the Gaussian integers, which in view of the norm can have only two factors (since the norm is multiplicative, and ${\displaystyle p^{2}=N(p)}$, there can only be up to two factors of p), so it must be of the form ${\displaystyle p=(x+yi)(x-yi)}$ for some integers ${\displaystyle x}$ and ${\displaystyle y}$. This immediately yields that ${\displaystyle p=x^{2}+y^{2}}$.
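The second proof suggests a direct computation: because the Gaussian integers are Euclidean, a greatest common divisor of ${\displaystyle p}$ and ${\displaystyle m+i}$ is a factor of ${\displaystyle p}$ of norm ${\displaystyle p}$. The Python sketch below represents a Gaussian integer as a pair `(re, im)`; the function names are illustrative, and ${\displaystyle m}$ is found by brute force rather than by the method of the text.

```python
def gauss_divmod(u, v):
    """Division with remainder in Z[i]; the remainder has norm at most N(v) / 2."""
    (a, b), (c, d) = u, v
    norm = c * c + d * d
    # u / v = u * conj(v) / N(v); round both coordinates to the nearest integer
    q_re = (2 * (a * c + b * d) + norm) // (2 * norm)
    q_im = (2 * (b * c - a * d) + norm) // (2 * norm)
    r = (a - (q_re * c - q_im * d), b - (q_re * d + q_im * c))
    return (q_re, q_im), r


def gauss_gcd(u, v):
    """Euclidean algorithm in Z[i]."""
    while v != (0, 0):
        _, r = gauss_divmod(u, v)
        u, v = v, r
    return u


def two_squares_via_gcd(p):
    """Assumes p is a prime congruent to 1 modulo 4 (not checked)."""
    m = next(t for t in range(1, p) if (t * t + 1) % p == 0)
    x, y = gauss_gcd((p, 0), (m, 1))
    return abs(x), abs(y)


x, y = two_squares_via_gcd(13)
assert x * x + y * y == 13
```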

## Proof by Minkowski's Theorem

For a prime ${\displaystyle p}$ congruent to ${\displaystyle 1}$ mod ${\displaystyle 4}$, ${\displaystyle -1}$ is a quadratic residue mod ${\displaystyle p}$ by Euler's criterion. Therefore, there exists an integer ${\displaystyle m}$ such that ${\displaystyle p}$ divides ${\displaystyle m^{2}+1}$. Let ${\displaystyle {\vec {u}}={\hat {i}}+m{\hat {j}}}$ and ${\displaystyle {\vec {v}}=0{\hat {i}}+p{\hat {j}}}$. Consider the lattice ${\displaystyle S=\{a{\vec {u}}+b{\vec {v}}\mid a,b\in \mathbb {Z} \}}$. If ${\displaystyle {\vec {w}}=a{\vec {u}}+b{\vec {v}}=a{\hat {i}}+(am+bp){\hat {j}}\in S}$ then ${\displaystyle \|{\vec {w}}\|^{2}=a^{2}+(am+bp)^{2}\equiv a^{2}(1+m^{2})\equiv 0{\pmod {p}}}$. Thus ${\displaystyle p}$ divides ${\displaystyle \|{\vec {w}}\|^{2}}$ for any ${\displaystyle {\vec {w}}\in S}$.

The area of the fundamental parallelogram of the lattice is ${\displaystyle p}$. The area of the open disk, ${\displaystyle D}$, of radius ${\displaystyle {\sqrt {2p}}}$ centered around the origin is ${\displaystyle 2\pi p>4p}$. Furthermore, ${\displaystyle D}$ is convex and symmetrical about the origin. Therefore, by Minkowski's theorem there exists a nonzero vector ${\displaystyle {\vec {w}}\in S}$ such that ${\displaystyle {\vec {w}}\in D}$. Both ${\displaystyle \|{\vec {w}}\|^{2}<2p}$ and ${\displaystyle p\mid \|{\vec {w}}\|^{2}}$ so ${\displaystyle p=\|{\vec {w}}\|^{2}}$. Hence ${\displaystyle p}$ is the sum of the squares of the components of ${\displaystyle {\vec {w}}}$.
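Minkowski's theorem only asserts that a short lattice vector exists, but for small primes it can be found by direct search. The Python sketch below (the name `short_lattice_vector` is illustrative) tries each first coordinate ${\displaystyle a}$ with ${\displaystyle a^{2}<2p}$ and picks ${\displaystyle b}$ so that ${\displaystyle am+bp}$ is as small as possible in absolute value. It assumes that ${\displaystyle p}$ divides ${\displaystyle m^{2}+1}$.

```python
from math import isqrt


def short_lattice_vector(p, m):
    """Search the lattice spanned by (1, m) and (0, p) for a nonzero vector
    of squared length below 2p. Assumes p divides m^2 + 1 (not checked)."""
    for a in range(1, isqrt(2 * p) + 1):
        r = (a * m) % p
        if 2 * r > p:
            r -= p  # choose b so that a*m + b*p has the least absolute value
        if a * a + r * r < 2 * p:
            return a, r
    return None


# For p = 13 one may take m = 5, since 5^2 + 1 = 26.
w = short_lattice_vector(13, 5)
assert w is not None and w[0] ** 2 + w[1] ** 2 == 13
```

Restricting to ${\displaystyle a\geq 1}$ loses nothing: a lattice vector with ${\displaystyle a=0}$ has the form ${\displaystyle (0,bp)}$ and is too long, and the disk is symmetric about the origin.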

## Zagier's "one-sentence proof"

If p = 4k + 1 is prime, then the set S = {(x, y, z) ∈ N³ : x² + 4yz = p} (here the set N of all natural numbers can be taken to include 0 or to exclude 0, and in both cases, x, y and z must be positive for any (x, y, z) ∈ S, as p is an odd prime) is finite and has two involutions: an obvious one (x, y, z) → (x, z, y), whose fixed points, (x, y, y), correspond to representations of p as a sum of two squares, and a more complicated one,

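${\displaystyle (x,y,z)\mapsto {\begin{cases}(x+2z,~z,~y-x-z),\quad {\textrm {if}}\,\,\,x<y-z\\(2y-x,~y,~x-y+z),\quad {\textrm {if}}\,\,\,y-z<x<2y\\(x-2y,~x-y+z,~y),\quad {\textrm {if}}\,\,\,x>2y\end{cases}}}$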

which has exactly one fixed point, (1, 1, k). The cardinality of S has the same parity as the number of fixed points of an involution on that set. Thus, from the second involution we know that the cardinality of S is odd and therefore the number of fixed points for the first involution cannot be zero, proving the existence of fixed points for the first involution and consequently that p is a sum of two squares.
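The parity argument is easy to test by brute force for small primes. The following Python sketch (function names are illustrative) enumerates S, applies the more complicated involution with its three cases ${\displaystyle x<y-z}$, ${\displaystyle y-z<x<2y}$ and ${\displaystyle x>2y}$, and reads a representation off a fixed point of the swap. It assumes that p is a prime of the form 4k + 1.

```python
from math import isqrt


def zagier_set(p):
    """All triples of positive integers (x, y, z) with x^2 + 4yz == p."""
    triples = []
    for x in range(1, isqrt(p) + 1):
        rest = p - x * x
        if rest % 4 == 0:
            q = rest // 4
            triples += [(x, y, q // y) for y in range(1, q + 1) if q % y == 0]
    return triples


def zagier_map(t):
    """The second involution; the boundary cases x == y - z and x == 2y
    do not occur when p is prime."""
    x, y, z = t
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)


def two_squares_via_involution(p):
    s = zagier_set(p)
    fixed = [t for t in s if zagier_map(t) == t]
    # the only fixed point is (1, 1, k), so the cardinality of S is odd
    assert fixed == [(1, 1, (p - 1) // 4)] and len(s) % 2 == 1
    x, y, _ = next(t for t in s if t[1] == t[2])  # fixed point of the swap
    return x, 2 * y


x, y = two_squares_via_involution(13)
assert x * x + y * y == 13
```

For p = 13 the set S has three elements, (1, 1, 3), (1, 3, 1) and (3, 1, 1); the last one is fixed by the swap and gives 13 = 3² + 2².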

This proof, due to Zagier, is a simplification of an earlier proof by Heath-Brown, which in turn was inspired by a proof of Liouville. The technique of the proof is a combinatorial analogue of the topological principle that the Euler characteristics of a topological space with an involution and of its fixed point set have the same parity and is reminiscent of the use of sign-reversing involutions in the proofs of combinatorial bijections.

## Proof with partition theory

In 2016, A. David Christopher gave a partition-theoretic proof by considering partitions of the odd prime ${\displaystyle n}$ having exactly two sizes ${\displaystyle a_{i}(i=1,2)}$, each occurring exactly ${\displaystyle a_{i}}$ times, and by showing that at least one such partition exists if ${\displaystyle n}$ is congruent to 1 modulo 4.[7]

## References

• Richard Dedekind, The theory of algebraic integers.
• Harold M. Edwards, Fermat's Last Theorem. A genetic introduction to algebraic number theory. Graduate Texts in Mathematics no. 50, Springer-Verlag, NY, 1977.
• C. F. Gauss, Disquisitiones Arithmeticae (English Edition). Transl. by Arthur A. Clarke. Springer-Verlag, 1986.
• Goldman, Jay R. (1998), The Queen of Mathematics: A historically motivated guide to Number Theory, A K Peters, ISBN 1-56881-006-7
• D. R. Heath-Brown, Fermat's two squares theorem. Invariant, 11 (1984) pp. 3–5.
• John Stillwell, Introduction to Theory of Algebraic Integers by Richard Dedekind. Cambridge Mathematical Library, Cambridge University Press, 1996.
• Don Zagier, A one-sentence proof that every prime p ≡ 1 mod 4 is a sum of two squares. Amer. Math. Monthly 97 (1990), no. 2, 144, doi:10.2307/2323918

## Notes

1. ^ See Goldman's book, §22.5
2. ^ Euler à Goldbach, lettre CXXV
3. ^ De numeris qui sunt aggregata duorum quadratorum. (Novi commentarii academiae scientiarum Petropolitanae 4 (1752/3), 1758, 3-40)
4. ^ Demonstratio theorematis FERMATIANI omnem numerum primum formae 4n+1 esse summam duorum quadratorum. (Novi commentarii academiae scientiarum Petropolitanae 5 (1754/5), 1760, 3-13)
5. ^ The summary is taken from Edwards' book, pages 45-48; italics in the original.
6. ^ Nouv. Mém. Acad. Berlin, année 1771, 125; ibid. année 1773, 275; ibid année 1775, 351.
7. ^ A. David Christopher, “A partition-theoretic proof of Fermat’s Two Squares Theorem”, Discrete Mathematics, 339 (2016) 1410–1411.