Pythagorean Triples

When you talk to the average human about mathematics with a very high probability you can expect a discussion about the Pythagorean theorem. If you’re unaware of this theorem not only am I very shocked but part of me wants to tell you’re in the wrong part of the Internet. Jokes aside this post isn’t about the theorem but is instead about looking for solutions of the Pythagorean equation that are integers. For completeness the so-called Pythagorean equation is X^2 + Y^2 = Z^2 .

We are interested in solutions of this equation in the integers and due to the symmetry of the equation it suffices to consider only non-negative integers. A solution (x,y,z) \in \mathbb{Z}^3 is called a Pythagorean triple and given such a triple, (ax,ay,az ) is also a Pythagorean triple for any a \in \mathbb{Z}.

As we want to understand the integer solutions of the Pythagorean equation X^2 + Y^2 = Z^2 it makes sense to consider triples (x,y,x) where the components are pair-wise coprime and non-negative. This follows from the above observations and we call such solutions primitive.

Pythagoras himself constructed an infinite number of Pythagorean triples by using the identity (2n+1)^2 + (2n^2+2n)^2 = (2n^2+2n+1)^2 . It is worth verifying this identity and finding a triple that isn’t in its form.

In fact this problem is very classical, some may say ancient, as in the third century B.C. Euclid solved the problem of finding all such integer solutions.

Theorem: let a,b \in \mathbb{Z}^+ such that \gcd(a,b) = 1, one is even and the other is odd and a > b, then the triple (x,y,z) \in \mathbb{Z}^3 given by x = a^2 - b^2, y = 2ab and z = a^2 + b^2 is a primitive solution of the Pythagorean equation. Moreover, each primitive solution is of this form.

Proof of Theorem: Let a,b \in \mathbb{Z}^+ be such as in our hypothesis along with the triple (x,y,z). It is easy to verify that this triple is indeed Pythagorean by simply computing x^2 +y^2 = (a^2 -b^2)^2 +(2ab)^2 = a^4 +2a^2b^2 +b^4 = (a^2 +b^2) = z^2 . Clearly x,y,z are positive integers so we need to show they are pairwise coprime to deduce the triple is primitive. If d = \gcd(x,y,z) then d divides both x + z = 2a^2 and z - x = 2b^2 by the linear property of division. As we assumed that a,b are coprime we must have either d = 1 or d = 2. In the second case, if d=2 , we have a contradiction due to say a being even and b being odd. Bringing this together we have shown (x,y,z) is a primitive Pythagorean triple.

To show each primitive solution is of the form in our hypothesis we let (x,y,z) be a primitive Pythagorean triple. As they are all pairwise coprime and satisfy x^2 + y^2 = z^2 we can conclude that y is even and x,z are odd. I’m leaving this for you to verify and as a hint you might want to look mod 8 .

Using the fact we have a pythagorean triple we deduce the identity (\frac{y}{2})^2 = \frac{1}{4}(x^2-z^2) = \frac{1}{2}(x-z)\frac{1}{2}(x+z). Due to the parity of x and z we observe \frac{1}{2}(z+x) and \frac{1}{2}(z-x) are both integers. We can in fact conclude they are coprime due the coprime conditions of x,y and z . Looking at our identity the factors on the right have no common divisors so they must both be squares. In particular we can find a,b \in \mathbb{Z}^+ such that a^2 = \frac{1}{2}(z+x) and b^2 = \frac{1}{2}(z-x). It follows that \gcd(a,b) =1 due to \gcd(\frac{1}{2}(z+x),\frac{1}{2}(z-x)) =1 .

Its not hard to notice that a^2 - b^2 = x, 2ab = y and a^2 + b^2 = z hence giving us our Pythagorean triple. The last task is to show that a,b are of different parity. We compute a +b \equiv a^2 +b^2 \equiv 1 ( mod 2 ) showing us a +b is odd and by the way parity changes under addition we know a and b have different parity. This means we’ve constructed the parametrization a,b for our triple (x,y,z) as described in the statement of the theorem.

One Topology to rule all normed spaces of a fixed dimension

A normed space is a vector space V over a field \mathbb{F} equipped with a so called ‘norm’ which is a function \lVert . \rVert : V \longrightarrow \mathbb{R} that satisfies a list of axioms. The intuition behind such a function is to give each elements in V a ‘size’. This is a strong piece of information and allows one to vision V as a metric space and hence induce a topology onto V. I am going to assume you’ve seen these concepts before and have knowledge of continuous functions in particular their topological definition and how they act on compact sets.

Looking at the sequence normed space \longrightarrow metric space \longrightarrow topology one can ask the very natural questions: what conditions on V and \lVert . \rVert give the same topology? What normed spaces don’t have the same topology? etc etc etc. The list goes on.

As the name of this post suggests we are going to show that when our underlying vector space V is of a fixed finite dimension (we are going to assume V is over \mathbb{R} or \mathbb{C} ) then there is only one topology that is induced. In some sense this means we know what notion of neighborhoods to expect if we’re in the finite world playing over the most ‘natural’ field. There is only one such notion.

In a more precise fashion what we are going to show is that any two norms on a finite dimensional space V over \mathbb{R} or \mathbb{C} determine the same topology.

For convenience we will let \mathbb{F} be our background field which will either be the real numbers or the complex numbers. Suppose V has dimension n with basis e_1,...,e_n. This means every element in V has a unique representation of the form \sum_{i=1}^{n}\lambda_i e_i where \lambda_i \in \mathbb{F} for all i . To get our result we shall define the ‘special’ Euclidean norm on V as \lVert x\rVert_{E} := (\sum_{i=1}^{n}|\lambda_i|^2)^{\frac{1}{2}}. One can check this is a norm. We also let \lVert.\rVert : V \longrightarrow \mathbb{R} be an arbitrary norm on V.

How do we show these norms induce the same topology? We find constants 0 \neq M, m \in \mathbb{R} such that m \lVert x \rVert_{E} \leq \lVert x \rVert \leq M \lVert x\rVert_{E} for all x \in V. The reason why this is sufficient is because in terms of the induced topologies an open ball in one topology will contain an open ball of the other and vice versa. If you’re not convinced take a moment to try to see why(It’s very instructive in getting familiar with the theory).

Let us start by determining M. Let x \in V by using the unique representation in terms of our basis and the axioms of a norm we can estimate the norm like so \lVert x \rVert = \lVert \sum_{i=1}^{n} \lambda_i e_i \rVert \leq \sum_{i=1}^{n} \lVert \lambda_i e_i \rVert = \sum_{i=1}^{n} |\lambda_i| \lVert x\rVert . Using Cauchy-Schwartz we can estimate even further and deduce \lVert x \rVert \leq (\sum_{i=1}^{n}|\lambda_i|^2)^\frac{1}{2} (\sum_{i=1}^{n}| \lVert e_i \rVert^2)^\frac{1}{2} hence we have \lVert x \rVert \leq M \lVert x \rVert_{E} where M = (\sum_{i=1}^{n} \lVert e_i \rVert^2)^\frac{1}{2} > 0 .

That was easy enough, right? If you’ve been doing mathematics long enough you notice the majority of mathematical proofs have an easy direction and a hard direction. That was the easy direction. Determining m is more difficult, it isn’t insanely hard, and relies on the Heine-Borel Theorem and some properties of continuous functions.

Let us first define a function f_{\lVert . \rVert} : \mathbb{F}^n \longrightarrow \mathbb{R} by f(\lambda_1,...,\lambda_n) := \lVert \sum_{i=1}^{n} \lambda_i e_i \rVert . I mentioned continuity and so it’s not surprising that f_{\lVert . \rVert}   is continuous with respect to the standard topology on \mathbb{F}^n.

To show f_{\lVert . \rVert} is continuous we exploit the triangle inequality of the norm \lVert . \rVert . Let \epsilon > 0 and pick \delta = \epsilon and (\alpha_1,...,\alpha_n), (\beta_1,...,\beta_n) \in \mathbb{F}^n be such that (\sum_{i =1}^n |\alpha_i - \beta_i|^2)^{1/2} < \delta and let H := \max_{1 \leq i \leq n} (\lVert e_i \rVert ) . We estimate | \; \lVert \sum_{i=1}^n \alpha_i e_i \rVert - \lVert \sum_{i=1}^n \beta_i e_i \rVert \; | \leq \lVert  \sum_{i=1}^n (\alpha_i - \beta_i) e_i \rVert \leq \sum_{i=1}^n |\alpha_i - \beta_i | \lVert e_i \rVert  \leq H \sum_{i=1}^n |\alpha_i - \beta_i | \leq n^{\frac{1}{2}} H(\sum_{i =1}^n |\alpha_i - \beta_i|^2)^{1/2} \leq  n^{\frac{1}{2}} H\epsilon with the second to last inequality coming from Cauchy-Schwartz. Hence f_{\lVert . \rVert } is continuous.

We also define the set U := \{\lambda_1,...,\lambda_n) \mathbb{F}^n : \sum_{i=1}^{n} |\lambda_i|^2 = 1 \} which is the unit-circle in \mathbb{F}^n . It follows that U is compact with respect to the standard norm on \mathbb{F}^n . To show this we use the the Heine-Borel Theorem which in its statement means it is necessary and sufficient to show that U is closed and bounded as we are working over \mathbb{F}^n .

The fact U is bounded follows straight from the definition so we only need to show that U is closed. This is the part where f_{\lVert .\rVert} comes in as one can check that U = f_{\lVert . \rVert_{E} }^{-1}(\{1\}) where f_{\lVert . \rVert_{E} }^{-1} is the pre-image of f_{\lVert . \rVert_{E} } . As \{1\} is closed in \mathbb{R} and f_{\lVert . \rVert_{E} } is continuous we have that U is closed and hence it is compact by the Heine-Borel Theorem.

We’ve done the set-up so now let us get onto proving our actual result. To do this we shall restrict our function f_{\lVert . \rVert } onto the set U and ask ourself what the value m := \inf_{x \in U} (f_{\lVert . \rVert }(x) ) is? As f_{\lVert .\rVert } is continuous and U is compact we can use a property of continuous functions to deduce that there exists an \alpha \in A such that m = f_{\lVert . \rVert }(\alpha) . Suppose \alpha = (\alpha_1,..,\alpha_n) and that m =0 then f(\alpha_1,...,\alpha_n) = \lVert \sum_{i=1}^{n} \alpha_i e_i \rVert = 0 and so \sum_{i=1}^{n} \alpha_i e_i = 0 by properties of the norm and by the linear independence of the basis we must have \alpha_i = 0 for all i \in \{1,...,n\}. This means \alpha = 0 which is a contradiction as 0 \notin U. Bringing this all together m > 0 and by the definition of m we have shown \lVert x \rVert \geq m > 0 when \lVert x \rVert_{E} = 1 .

All x \in V can be written in the form x = \lVert x \rVert_{E}y for some y \in V such that  \lVert y \rVert_{E} = 1 or equivalently where y = \sum_{i=1}^{n}\lambda_i e_i where (\lambda_1,...,\lambda_n) \in U we can say \lVert x \rVert = \lVert x \rVert_{E} \lVert y \rVert  > m\lVert x \rVert_{E} where m \neq 0.

What we’ve shown is that every possible norm on a vector space of a specified dimension over \mathbb{R} or \mathbb{C} induces the same topology,  in other words, as the title suggests there is one topology to rule all normed spaces of a fixed dimension… over \mathbb{R} or \mathbb{C} .

Simple Bounds of Sum Sets

This post is a bit different to what has currently appeared on our blog and fits in the realm of arithmetic combinatorics. We’ll be looking at finite subsets A,B \subseteq \mathbb{Z} of the integers and their designated sum set A + B := \{a + b : a\in A, b\in B\} . Within this set-up we are going investigate the relationship between the relative sizes of the three set A,B and A+B.

What can we say? Well, our first observation is that the size of a finite subset of \mathbb{Z} is invariant under translation which in symbols is represented by |A+\{a\}| = |A| for some a \in \mathbb{Z} . This is rather intuitive and follows from the properties of + . In the most general setting by making no assumptions on the structure of A,B \subseteq \mathbb{Z} one obtains the relationship shown in the theorem bellow.

Theorem: Let A,B \subseteq \mathbb{Z} be finite subsets then we have |A| + |B| - 1 \leq |A+B| \leq |A||B|

Proof of Theorem: Starting with the upper bound we define the function \varphi : A \times B \longrightarrow \mathbb{Z} by \varphi(a,b) = a + b which allows us to think of the sum set as the image of \varphi. In symbols this is A + B = Im\varphi. As A and B are both finite we have Im\varphi is largest when \varphi is injective which gives us the upper bound |A+B| \leq |A||B| .

To show the lower bound we exploit the ordering of the integers by translating both sets A and B. let M = \max(A) and m= \min(B) and depending on the signs of M, m we translate A by \pm M and B by \pm m with the outputs A^* and B^* such that \max(A^*) = 0 = \min(B^*) . By our observation before we have |A| = |A^*| and |B| = |B^*| and it’s important to note we also have |A+B| = |A^* + B^*| . By the design of our translations we have 0 \in A^* \cap B^* which implies A^*\cup B^* \subseteq A^* + B^* and also as A^* consist of non-positive integers and B^* consists of non-negative numbers we have |A^*\cup B^*| =|A^*| + |B^*| - 1 and hence |A| + |B| - 1 \leq |A+B| .

When given an inequality the first thing to ask is when do we get equality and I challenge you to find a pair of finite sets that obtain equality in our theorem! Try to find families of sets with some structure and not sets like A = \{4,6\} and B = \{98,34\} (for the upper bound).



Sums of powers in Finite Fields

I like finite fields. This isn’t hard to see as two of my posts (post 1 & post 2) have been about their structure but sticking with this ‘theme’ I’ve forced upon myself lets investigate them some more! Unlike previous posts I’m just going to jump straight into the maths.

Theorem: We let K =\mathbb{F}_{p^q} be a finite field with p^q elements where p is prime and q \in \mathbb{Z}_{>0}. If we let u \in \mathbb{Z} the sum S(x^u) := \sum_{x \in K}x^u is equal to -1 if u \geq 1 and is divsible by p^q-1 or 0 otherwise.

Proof of Theorem: If we have u = 0 then S(x^u) = |K| = p^q = 0 as we have Char K = p . Now we consider when u \geq 1 and when p^q -1 divides u. Trivially 0^u = 0 and if x \in K \backslash \{0\} we have x^u = x^{k(p^q-1)} = 1 for some k \in \mathbb{Z}. This follows directly from Lagrange’s theorem applied to the multiplicative group K^* of units and that Char K = p . This gives us S(x^u) = (p^q - 1) = -1 by again using the characteristic of K.

The last case, when u is not divisible by p^q -1 , is not as direct as the others (hence the new paragraph) and requires knowledge of the structure of K^* the units of K. In fact K^* \cong \mathbb{Z}_{p^q - 1} which if you’re not familiar with is shown in ‘post 2’. This means that there exists an element y \in K^* such that y^u \neq 1. We can simply take y to be the generator of K^*. As we are in a field we can say S(x^u) = \sum_{x \in K}x^u  = \sum_{x \in K}y^ux^u = y^u S(x^u) so we have (1- y^u) S(x^u) = 0  and due to y^u \neq 1 we must conclude that S(x^u) = 0. We are done!

What can this theorem tell us? A quick study of the result gives us that if u \in \mathbb{Z} is odd and the prime p \neq 2 we must get S(x^u)= 0 as p^q -1 is always even so it cannot divide u. This is quite remarkable as the result implies there are huge amounts of cancellation in the sum which a prior has no reason to occur.



The Units of finite fields

Given a field K a unit is an element with a multiplicative inverse and the collection of all of them has a group structure induced from the field and is called the multiplicative group of units denoted K^*. All elements in a field, apart from 0, have a multiplicative inverse so we can say K^* = K \backslash \{0\} . When given a field, one can ask what the structure of the group of units is and in this post we are going to look at this question for finite fields.

Theorem: The multiplicative group of units for a finite field is cyclic more precisely, given p prime and n \in \mathbb{N} we have \mathbb{F}_{p^n}^* \cong \mathbb{Z}_{p^n-1} .

Before proving this we need to introduce a nice function known as Euler’s \varphi -Function which is defined as \varphi : \mathbb{N} \longrightarrow \mathbb{N} such that \varphi(n) = |\{ a \in \mathbb{N} : a \leq n, gcd(a,n) = 1 \} | where gcd(a,n) is the greatest common divisor of a and n. One way of thinking about the \varphi -function is that it counts the number of generators in a cyclic group or equivalently \varphi(n) = |\{a \in \mathbb{Z}_n : \mathbb{Z}_n = \langle a \rangle \}| thanks to the fact all cyclic groups are ‘integers’. This new interpretation follows from that for a \in \mathbb{Z}_n we have |a| = \frac{n}{gcd(a,n)} which is left for you to check.

Let us use this new perspective of the \varphi-function to establish some of its properties. One such property is that for any n \in \mathbb{N} we have n = \sum_{d | n }\varphi(d) . To show this we look at the cyclic group \mathbb{Z}_n and notice for each d | n there is a unique x \in \mathbb{Z}_n such that |x| = n (where |.| is the order of an element). This is a property of cyclic groups and is left for you to prove… That is if you don’t believe me. Let C_d be the collection of generators in the unique subgroup, \langle x \rangle , with d elements. Since all the elements of \mathbb{Z}_n are contained in a C_d for some d we have n =  |\mathbb{Z}_n| =\sum_{d |n}|C_d| = \sum_{d | n}\varphi(d) .

Now we have all the tools to prove our theorem. Lets get to it.

Proof of Theorem: We are looking at the finite field K = \mathbb{F}_{p^n} with p^n elements and we first notice that for d \in \mathbb{N} such that d \;| \;(p^n - 1)  = |K^*| the set A_d := \{x \in K^* :  x^d = 1 \} is such that |A_d| \leq d. This is because the equation x^d = 1 in K has at most d solutions due to the polynomial  x^d -1 having degree d. We shall define the set \overline{A}_d := \{x \in K^* :  |x| = d \} and look at its cardinality. If \overline{A}_d \neq \emptyset then for x \in \overline{A}_d we have \langle x\rangle \cong \mathbb{Z}_d and so by Lagrange’s Theorem \langle x \rangle \subseteq A_d but with the fact |A_d| \leq d = |\langle x \rangle| we get A_d = \langle x \rangle hence we must get |\overline{A}_d | = \varphi(d) . To summarise either the cardinality of \overline{A}_d is 0 or \varphi(d). Suppose that \overline{A}_d= \emptyset for some d using the properties of the \varphi-function we have p^n - 1 = \sum_{f | (p^n - 1)}|\overline{A}_f | = \sum_{f \neq d | (p^n -1) }\varphi(f) < p^n -1 which is a contradiction so we must always be in the second case. Taking d = p^n - 1 = |K^*| we get \overline{A}_{|K^*|} \neq \emptyset so a generator of K^* must exist so it is cyclic with p^n -1 elements and so K^* \cong \mathbb{Z}_{p^n - 1} due to the classification of cyclic groups.

Abelian Simple Groups

Simple groups can be thought of as the atoms of group theory and this analogy has motivated people to formulate the periodic table of finite groups which is based of one of the biggest results in mathematics, the classification of finite simple groups. The first proof of this is over 10000 pages and spread amongst hundreds of journal articles and books. In this post we will be talking about abelian simple groups which can be classified in a lot fewer words.

First, what is a simple group? A groups G is simple if it’s non-trivial G \neq \{e\} and the only normal subgroups of G are \{e\} and G itself. This brings us to the analogy of atoms as the intuition is that G can not be ‘split’ into further smaller groups N and G/N  where N \unlhd G . This can be made precise with the terminology of group extensions which wont be discussed further.

The result heavily depends upon the classification of cyclic groups. so what’s the result?

Theorem: An abelian group G is simple if and only if G \cong \mathbb{Z}_p for a prime p

Proof of Theorem: If we assume G \cong \mathbb{Z}_p for a prime p and let N \unlhd G be normal by Lagrange’s theorem |N| divides p and so it is either 1 or p as p is prime which means N must be trivial or G . \mathbb{Z}_p is abelian so G must be as well.

Going the other way if we assume that G is abelian and simple, let e \neq x \in G so \{e\} \neq \langle x \rangle \unlhd G . \langle x \rangle is the subgroup generated by x and it’s normal in G as every subgroup in an abelian group is normal by definition. As G is simple we must have \langle x \rangle = G and so G is cyclic. If G is infinite then \{e\} \neq \langle x^2 \rangle \vartriangleleft G so \langle x^2 \rangle is a non-trival, proper normal subgroup of G which can exist as G is simple so G must be finite. Suppose |G| = mn where m,n \in \mathbb{N}\backslash \{1\} then \{e\} \neq \langle x^m \rangle \vartriangleleft G so \langle x^m \rangle is a non-trivial proper normal subgroups of G which is a contradiction against the assumption G is simple so |G| = p for some prime p . Bringing things together we have G is cyclic of order p so by the classification theorem of cyclic groups G \cong \mathbb{Z}_p

This is obviously a lot shorter than 10000 pages which emphasises just how ‘nice’ abelian groups really are.

Cyclic Groups are the Integers in disguise

When given a group G a natural questions to ask is what do the subgroups look like? This leads very quickly to the question of when is a subset S \subset G a subgroup and if it isnt what is the smallest subgroup H which contains S , S \subset H \leq G ? The answer to the first question comes in the form of the subgroup test and the next isn’t really answered but leads to the notion of the subgroup generated by S .

It’s not hard to show that taking intersections over a family of subgroups of G gives again a new subgroup of G . This motivates the definition of the subgroup generated by our subset S \subset G which is usually denoted by \langle S\rangle . Let X = \{H \in \mathscr{P}(G) : H \leq G \mbox{ and } S \subseteq H \} Then we define \langle S \rangle := \bigcap_{H \in X} H which makes sense as it is a subgroup by being an intersection of subgroups and it contains S . You can think of \langle S\rangle  as the smallest subgroup containing S . The set S is called the generating set and it’s elements are so called generators. By defining \langle S\rangle we have implicitly answered our second question.

If we go back to our initial question, what do the subgroups of G look like? We have an answer as we can simply look at the generated subgroup of subsets of G ! If we come across subset S \subset G with the property \langle S \rangle = G in some sense S ‘builds’ G . A special case of this is what we call a cyclic group, which is the simplest case when G is ‘built’ by one element. A group G is cyclic if there exists an element g \in G such that G = \langle \{g\}\rangle , g is called the generator of G (One would normally write \langle \{g\} \rangle = \langle g \rangle ).

If you’re already familiar with what a cyclic group is I can imagine that was a bit dry, I would apologies but I think it’s a nice story of how notions in mathematics are created and defined. Back to the flashy title. If we ask the question of what do cyclic groups look like it leads us to the theorem bellow.

Theorem: Let n \in \mathbb{Z}^+ . Every cyclic group of order n is isomorphic to \mathbb{Z}_n and every infinite cyclic group is isomorphic to \mathbb{Z} .

Proof of Theorem: Let G be our cyclic group with generator a \in G so we have G = \langle a \rangle = \{a^m \in G : m \in \mathbb{Z} \} . We let n = 0 if G is infinite and n = |G| if G is finite. Define the map \varphi : \mathbb{Z} \longrightarrow G by \varphi(m) = a^m for m \in \mathbb{Z} . If this map is an epimorphism by the homomorphism theorem we have G \cong \mathbb{Z}/ \{0\} \cong \mathbb{Z} if G is infinite and G \cong \mathbb{Z}/n\mathbb{Z} = \mathbb{Z}_n if G is finite. We have reduced the proof to just showing \varphi is an epimorphism.

To show \varphi is a homomorphism let n,m \in \mathbb{Z} . We calculate \varphi(m +n) = a^{m + n} = a^ma^n = \varphi(m)\varphi(n) . Indeed we have an homomorphism, to show it’s surjective we let g \in G so there exists an m \in \mathbb{Z} such that g = a^m as we have assumed G to be cyclic. Observing \varphi(m) = g we are done.