A theorem by Cauchy on ‘thinning’ sequences

Just to clarify all the numbers in this post are going to exist in \mathbb{R} . Given a sequence \{a_n\} where n \in \mathbb{N} we have a collection of partial sums \{s_n\} indexed by \mathbb{N} and defined by s_n = a_1 + a_2 + ... + a_n . If the sequence \{s_n\} converges to s (In the usual \epsilon \delta way) we say the series converges and write \sum_{n = 1}^{\infty} a_n =s. For completness if the sequence \{s_n\} diverges, the series is said to diverge.

If you know sequences really well, you know series really well as every theorem about sequences can be stated in terms of series (putting a_1 = s_1 and a_n = s_n - s_{n-1} for n > 1 ). In particular the monotone convergence theorem has an instantaneous counterpart for series.

Theorem: A series of non-negative real numbered terms converges if and only if the partial sums form a bounded sequence.

I’m going to omit the proof here but it is a quick application of the monotone convergence theorem to the partial sums. So why bring this up? Well if we impose that the terms in our series are decreasing monotically (which can appear in applications) we can apply the following theorem of Cauchy. What is interesting about this theorem is that a ‘thin’ subsequence of \{a_n\} determines the convergence or divergence of the series.

Theorem: Suppose a_1 \geq a_2 \geq ... \geq 0 are real numbers. Then the series \sum_{n=1}^{\infty} a_n converges if and only if the series \sum_{k=0}2^k a_{2^k} converges.

Proof: By the previous theorem it suffices to consider only the boundedness of the partial sums. Let us write s_n = a_1 + ... + a_n and t_k = 2a_2 + ... + 2^ka_{2^k} .  We will look at two cases, when n < 2^k and when n > 2^k .

For n < 2^k we have s_n \leq a_1 + (a_2 + a+3) + ... + (a_{2^k} + ... +a_{2^{k+1} - 1}) \leq a_1 + 2a_2 + ... + 2^ka_{2^k} = t_k where the first inequality followed from n < 2^k and the second inequality from the hypothesis.

When n > 2^k we have s_n \geq a_1 + a_2 + (a_3 + a_4) + ... +(a_{2^{k-1} +1} + ... a_{2^k}) \geq \frac{1}{2}a_1 + a_2 +2a_4 + ... + 2^{k-1}a_{2^k} = \frac{1}{2}t_k where the first inequality follows from n > 2^k and the second (you guessed it) follows from our hypothesis.

Bringing these together we conclude that the sequence \{s_n\} and \{t_k\} are either BOTH bounded or BOTH unbounded which completes the proof.

When I came across this I thought it was pretty astounding (hence why it has made it onto the blog) so lets see it it action. We will use it to deduce for p \in \mathbb{Z} that \sum_{n=2}^{\infty} \frac{1}{n(log \:n)^p} converges if p >1 and diverges if p \leq 1 .

The monotonicity of the logarithmic function implies \{\mbox{log } n \} increases which puts us in good position to apply our theorem. This leads us to the following which is enough as a proof.

\sum_{k=1}^{\infty}2^k\frac{1}{2^k(log \:2^k)^p} = \sum_{k=1}^{\infty} \frac{1}{k(log \:2)^p} =\frac{1}{(log \; 2)^p} \sum_{k=1}^{\infty} \frac{1}{k^p} .


p-Sylow subgroups and why they exist

Let G be a finite group and p a prime such that the order of G is |G| = p^nm where p and m are coprime. Cauchy’s Theorem for finite groups tells us that there exists x \in G whose order is p. We also know by Langrage’s Theorem for finite groups that given a subgroup H \leq G we have the order of H dividing the order of G.

Putting one and two together we can say that the ‘most simple’ subgroup of our finite group G is the subgroup \langle x \rangle or a cyclic group \mathbb{Z}_p. As our goal (sorry I didn’t tell you this earlier) is to look at the subgroups of G it seems clear that the next ‘most simple’ subgroups will have order some power of our prime, i.e p^k where 1 \leq k \leq n.

By ‘simple’ I mean in some sense we are looking for subgroups of G whose existence and size is only determined by the knowledge of knowing our prime in question p divides G.

Enough philosophy, let us define some notions. A p-group is a finite group whose order is a power of a prime p. H is a p-subgroup of a group G if it is a subgroup and a p-group (p necessarily has to divide the order of G). What is the largest p-subgroup of G? Well, Langrange gives us some constraints and motivates the definition of a p-Sylow subgroup. H is a p-Sylow subgroup if it is a subgroup of G of order p^n and p^n is the highest power of p dividing the order of G.

So I hope I’ve described some motives for looking at/for p-Sylow subgroups but can we even guarantee the existence of such a subgroup of our finite group G? Also, would it count as a surprise if I said the answer was yes?

To prove such a result we will use induction on the order of G. If the order of G is prime the result follows from Cauchy’s Theorem. Now suppose we can always find a p-Sylow subgroup in every finite group whose order is divisible by p and is strictly less than that of G.

Langrage tells us for a subgroup H \leq G we have |G| = |H|[G:H] so if H is proper and p doesnt divide [G:H] we can apply the inductive hypothesis to H and the p-Sylow subgroup we find for H will be one we are looking for G.

We rule this case out and suppose that for all proper subgroups H of G the index [G:H] is divisible by p . Let G act on G by conjugation, the action for this is g * h := g^{-1}hg where g, h \in G. Applying the orbit stabiliser theorem tells us for each orbit  G * x   we have |G * x| = [G:G_x] where x \in G and G_x is its stabiliser. Orbits partition G and unwareling the meaning of orbits in this action gives us |G| = |Z(G)| + \sum [G:G_x] where Z(G) is the abelian subgroup Z(G) = \{g \in G : hgh^{-1} =g \mbox{ for all } h \in G \} of G which is often called the center of G. I hope by now when reading these posts you have pen and paper as this is a moment when you should verify what I am claiming. By the case we ruled out we know that p divides |Z(G)| and applying Cauchy’s Theorem we can find an element x \in Z(G) of order p and as \langle x \rangle  \trianglelefteq Z(G) \trianglelefteq G we have \langle x \rangle \trianglelefteq G .

This gives us a quotient group G / \langle x \rangle and a quotient map \phi : G \longrightarrow G / \langle x \rangle and applying the inductive hypothesis to G / \langle x \rangle gives us a p-Sylow subgroup of G / \langle x \rangle whose order is p^{n-1} due to the equality |G| = |\langle x \rangle||G/ \langle x \rangle| = p|G/\langle x \rangle| from Lagrange. Looking at K = \phi^{-1}(K') as \langle x \rangle \subset K and \phi maps K  into K' we have K' \cong K/H and applying Lagrange (again) we obtain |K|= |H||K'| = p^n , as desired. We have found our p-Sylow subgroup K of G and this finishes the proof.

So we have existence but what more can we ask for and look for? How many such subgroups can we find? Also, if we have a standard p-subgroup is it always contained in a p -Sylow subgroup? How hard is it to find such subgroups?

If you don’t know the answer to these it is instructive to try to have a think about how you would go about answering these and try finding some p-Sylow subgroups for your favourite groups. As a hint conjugation is very important.

Unusual metrics on the rationals

Suppose one day you get bored of the triangle inequality. You have done so much real analysis that you cannot see anything new and interesting in |a-b| \le |a-z| + |z+b| anymore. But you want to keep doing analysis.

What can we do about this? One solution may be making this inequality more strict, and see where it takes us. For example, you may consider the following: |a-b| \le \max{\{|a-z|,|z-b|\}}. Note that this inequality does not hold under our usual absolute value, for example: 5 = |8 - 3| \not \le \max{ \{ |8-4|, |4-3| \} } = \max{ \{4,1\} } = 4. But does this tweaking of the triangle inequality add anything new? It may not look like it at first, but it is a dream come true:  a sequence is Cauchy if and only if distance of consecutive terms goes to zero! More mathematically: the sequence (a_n)_{n \in \mathbb{N}} is Cauchy \iff |a_n-a_{n+1}| \to 0 as n \to \infty

This is not true in the case of our usual absolute value. For example, the infamous sequence H_n = \sum_{k = 1}^{n} \frac{1}{k} is not Cauchy in the reals \mathbb{R}, for if it were Cauchy, it would converge since the reals numbers are complete. However, the difference between consecutive terms is |1/n| which clearly tends to zero.

Before we dive into the proof of the previous claim, we need to note something: we are talking about convergence in \mathbb{Q} however this would work in any metric space (X,d) where the triangle inequality satisfies d(a,b) \le  \max{\{d(a,z), d(z,b) \}} for any a,b,z \in X. This is because an absolute value on \mathbb{Q} induces a metric by setting d(a,b) = |a-b|.

Theorem: Let | \cdot | be an absolute value on \mathbb{Q} satisfying |a+b| \le \max{ \{ |a|,|b|\} }. Then any sequence (a_n)_{n \in \mathbb{N}} of rational numbers is Cauchy if and only if |a_n-a_{n+1}| \to 0 as n \to \infty.

Proof: ( \implies) Let \epsilon > 0. Since (a_n)_{n \in \mathbb{N}} is Cauchy there exists N \in \mathbb{N} such that for all n,m > N we have |a_n - a_m| < \epsilon. In particular, whenever m = n+1, we obtain  |a_n - a_{n+1}| < \epsilon, which means that |a_n - a_{n+1}| \to 0 as n \to \infty.

( \impliedby ) Let \epsilon > 0. There exists N \in \mathbb{N} such that |a_n - a_{n+1}| < \epsilon for any n > N.
Then, for all m \ge n > N, applying our strict triangle inequality:

\begin{aligned} |a_n - a_m| &= |a_n - a_{n+1} + a_{n+1} - \dots + a_{m-1} -a_{m}| \\  &\le \max{\{|a_n - a_{n+1}|, \dots,  |a_{m-1} -a_{m}| \} } < \epsilon \end{aligned} .

Therefore the sequence is Cauchy.   \square

This is all well and good, however, it makes us wonder whether there is any absolute value on \mathbb{Q} with this strict triangle inequality property. The answer is yes, in fact, there are infinitely many of them, but this is the topic for another post. Until then, to satisfy your curiosity I will write down one such absolute value and I will leave it up to you to check that it is indeed an absolute value:

|x|_3 = \frac{1}{3^{v(x)}}
where v(x) = n is the exponent of 3 whenever we write the rational number as x = 3^n \frac{p}{q}, where p,q are coprime to 3.

Can you suggest a different absolute value?

When is the zero polynomial the only zero function?

Like all mathematics, the first thing to do is set the landsape in which you wish to work in. Let K be a field and let K[x_1,...,x_n] be the commutative polynomial ring with n variables (if you are familiar with this ring I would skip the next paragraph)

What do the elements of  K[x_1,...,x_n] look like? Taking n =3 it is not hard to come up with elements, for example x_1x_3 +{x_2}^4 + {x_2}^3{x_3}^2 . One can quite easily start conjuring up polynomials for a given n but a more interesting approach to answer the question, can we find a ‘generating’ set for our polynomials? This brings one to the notion of a moninomial which is simply a product of the form x_1^{\alpha_1} ... x_n^{\alpha_n} . We can simplify the notation by starting with an n-tuple \alpha = (\alpha_1,..., \alpha_n) of non-negative integers and letting X^\alpha = x_1^{\alpha_1} ... x_n^{\alpha_n} . Clearly a poynomial f \in K[x_1,...,x_n] can be written as a finite linear combination of moninomials with the general form f = \sum_{\alpha \in A} C_\alpha X^\alpha where C_\alpha \in K and A \subset {\mathbb{N}_{0}}^n with A also being finite. Without knowing what a polynomial is aprior this is the method used to define what a polynomial is with n variables.

K[x_1,...,x_n] is a ring with addition and multiplicative defined how you would expect and as K is a field the ring is commutative. Given f \in K[x_1,...,x_n] we can think of it as an algebraic element of our ring or a function f : K^n \longrightarrow K by swapping x_i with some a_i \in K for all i . This gives us two notions of zero; the zero element of K[x_1,...,x_n] and a zero function from K^n \longrightarrow K .

To emphasise the difference let us take K = \mathbb{F}_2 , n =1 and look at the polynomials f = 0 and g = x(x-1) in K[x]. As algebraic elements clearly f is the zero element and g isn’t but as functions g(0) = 0 and g(1) = 0 so we can conclude both f and g are zero functions. To wrap up we have two distinct elements that are both zero functions K^n \longrightarrow K.

This invokes a big question and one in which we will answer: When is the zero polynomial the only zero function? (I didn’t lead you with false pretenses by the title). The answer depends on the cardinality of K .

If K is finite of size q by Lagrange’s Theorem and looking at the multiplicative group K\backslash \{0\} one can show x^q = x for all x \in K so the polynomial f = x^q- x is a zero function but is distinct from the zero element in K[x_1,...,x_n] . If n > 1 we can define f = x_1^q - x_1 . This tells us that when K is finite we can always come up with elements in K[x_1,...,x_n] that are non-zero and are a zero function K^n \longrightarrow K so there is no unique zero function.

What about the case when K is infinite? Well…

Theorem: Let K be an infinite field, let f \in K[x_1,...,x_n] . Then, f = 0 if and only if f : K^n \longrightarrow K is the zero function.

Proof of Theorem: Let us begin with the ‘easy’ direction, suppose f = 0 \in K[x_1,...,x_n] trivially we must have f(\alpha) = 0 for all \alpha \in K^n . The other direction requires us to show that if for some f \in K[x_1,...,x_n] such that f(\alpha) = 0 for all \alpha \in K^n then f is the zero element in K[x_1,...,x_n] .

We will use induction on the number of variabels n . When n = 1 a non-zero polynomial in K[x] of degree m has at most m roots. Suppose f \in K[x] such that f(\alpha) = 0 for all \alpha \in K and since K is infinite f must have infintely many roots and as the degree of f must be finite we must conclude that f = 0 .

Now assuming the inductive hypothesis for n - 1 let f \in K[x_1,...,x_n] be a polynomial such that f(\alpha) = 0 for all \alpha \in K^n . By collect the various powers of x_n we can write f in the form of f = \sum_{i}^{N} g_i(x_1,...,x_{n-1})x_n^i where N is the highest power of x_n that appears in f and g_i \in K[x_1,...,x_{n-1} ] . If we show that g_i is the zero polynomial in n - 1 variables this will force f = 0 in K[x_1,...,x_n] .

If we fix (\alpha_1,...,\alpha_{n-1}) \in K^{n-1} we get a polynomial f(\alpha_1,...,\alpha_{n-1}, x_n) \in K[x_n] and by our hypothesis this polynomial vanishes for all \alpha_n \in K . Using our above representation of f we see the coefficients of f(\alpha_1,...,\alpha_{n-1}, x_n) are g_i ( x_1,...,x_{n-1} ) and hence g_i(\alpha_1,...,\alpha_{n-1}) = 0 for all i . As (\alpha_1,...,\alpha_{n-1}) \in K^{n-1} is chosen arbitraty g_i is the zero function K^{n-1} \longrightarrow K and so by the inductive hypothesis g_i is the zero polynomial in K[x_1,..., x_n] .

Bringing in both of these results lets us conclude that if a polynomial zero function K^n \longrightarrow K is the zero element of K[x_1,...,x_n] then we must have that the field K is infinite. I find this quite a remarkable result in the fact that knowledge about a zero function gives you an indication on the cardinality of K .

The theorem we have just proved also lets us conclude that when K is infinite polynomials are unique functions or more precisely for f ,g \in K[x_1,...,x_n] f = g in K[x_1,...,x_n] if and only if f: K^n \longrightarrow K and g : K^n \longrightarrow K are the same functions. I leave the proof of this for the reader and as a hint consider the polynomial f - g.


An upper bound for roots of polynomials

A natural question to ask when given a polynomial is: how do the roots and the coefficients interplay with one another? In fact a huge portion of polynomial theory and solving algebraic equations is finding out the details of this relationship. The quadratic formula is a prime example – It gives a way of finding roots by only using the values of the coefficients. An explicit relationship, an algebraic formula, is only possible for polynomials of degree less than 5 which is a famous result pioneered by the work of Galois.

The interplay we are going to determine for polynomials of any degree is that you only need to look for roots in a neighborhood around 0 with the neighborhood’s size depending only on the size of the polynomial’s coefficients.

Let p \in \mathbb{R}[x] be a polynomial over \mathbb{R} of degree n with the representation p(x) = a_nx^n +... a_1x + a_0 where a_n \neq 0 and that |a_i| \leq A for all i = 1,..., a and some A \in \mathbb{R} . We will be considering our polynomial as a function p : \mathbb{C} \longrightarrow \mathbb{C} and be setting M \geq 0 and R = 1 + \frac{A + M}{|a_n|} where our choice of R will be more apparent later.

What we are going to show is that for x of a large enough magnitude our polynomial will have a magnitude as large as we like. We will also establish a simple bound on the magnitude of the roots of our polynomial p .

Suppose |x| \geq R , we have |a_n x^n| = \frac{A + M}{R- 1}|x|^n by the definition of our constants and estimating |a_{n-1}x^{n-1} + .... + a_0|  gives | a_{n-1}x^{n-1} + .... + a_0 | \leq A (|x^{n-1}| + .... + |x| + 1 ) = A\frac{|x|^n - 1}{|x| - 1 } < A \frac{|x|^n}{R-1} by using the sum of a geometric series and the triangle inequality.

Using the triangle inequality allows us to conclude |a_nx^n| - |a_{n-1}x^{n-1} + ... + a_0 | \leq |p(x)| which gives us by using the previous estimation |p(x)| \geq \frac{A+ M}{R-1} |x|^n - \frac{A}{R-1} |x|^n = \frac{M|x|^n}{R-1} \geq \frac{MR^n}{R-1} > M as both |x| \leq R and R^n > R > R - 1 as R > 1.

Bringing this together we have shown for all M \geq 0 if we look at |x| \geq R where R = 1 + \frac{A + M}{|a_n|} we have |p(x)| > M .

Looking at the contrapositive of this statement lets us conclude if |p(x)| \leq M for some x \in \mathbb{C} and M \geq 0 we must have |x| < R or equivalently x living in the ball B(0;R). If we take x to be a root of p then by setting M = 0 we have that |x| < R = 1 +\frac{A}{|a_n|} giving us a bound on the size of our root in terms of the polynomials coefficients. This means all roots of p will be found in the ball B(0; a + \frac{A}{|a_n|} ).


Existence of a real root for polynomials of odd degree

The fundamental theorem of algebra famously states that a polynomial of degree n with real coefficients, say p \in \mathbb{R}[x] , has n complex roots when counting multiplicities. This is equivalent to saying p(x) = \prod_{k=1}^{k=m} (x - \alpha_k)^{h_k} where h_1 + ... + h_m = n and \alpha_i \neq \alpha_j \in \mathbb{C} for all i,j = 1,...,m such that i \neq j . This is due to the ‘equivalence’ of roots and linear factors.

We will probably get round to proving this at some point so if you want to find out how don’t forget to follow us! Or just google how to prove the fundamental theorem of algebra – It is really up to you.

What is rather nice is that by throwing in the condition that n is odd we can guarantee the existence of a real root. The idea is nicely summed up in the Numberphile video:

Presumably if you watched the video you probably have a very good idea of how to prove the result but let us write it out in mathematics.

We have p \in \mathbb{R}[x] with degree n being odd and suppose p(x) = a_0 +a_1x + ... a_nx^n where a_i \in \mathbb{R} and a_n \neq 0 . Polynomials are continuous so we must have p being continuous. Looking at the behaviour of \frac{p(x)}{x^n} = \frac{a_o + ... + a_n x^n}{x^n} = \frac{a_n}{x^n} + ... + a_n as |x| \longrightarrow \infty we have \frac{p(x)}{x^n} \longrightarrow a_n . If we change x to -x , due to n being odd, x^n changes sign. The sign of \frac{p(x)}{x^n} for sufficiently large x always has the same sign as a_n so putting the two facts together means p(x) and p(-x) differ in sign – This is the ‘crossing the x-axis’. We know p is continuous so it is nice and in particular we can use the Intermediate Value Theorem. As 0 is in the image of p when we take |x| sufficiently large there must be an \alpha \in \mathbb{R} such that p(\alpha) = 0 by the intermediate value theorem.


Solving Linear Congruences

Imagine a scenario where you’re lost, lost in a big city full of number theorists, and you wish to ask someone for directions to get home. You approach someone and to your surprise they want you to work for the directions! In particular they want you to produce all the solutions of the linear congruence 4x \equiv 36 \mbox{ (mod } 6 ).

How do you go about getting the directions you so desperately need? Well, to solve a general linear congruence of the form ax \equiv b \mbox{ (mod } n ) it is important to remember the following theorem.

Theorem: ax \equiv b \mbox{ (mod } n ) has a solution iff \gcd(a,n) | b .

Let us write d = \gcd(a,n) and formulate an algorithm to solve our congruence ax \equiv b \mbox{ (mod } n ) – If you ever visit a city of number theorists I strongly recommend memorising this.

(1) – To even wish of having a solution we have to check that d | b .

(2) – Supposing d | b , use the Euclidean Algorithm to find s,t \in \mathbb{Z} such that sa + tn = d.

(3) – By using d | b there exists u \in \mathbb{Z} such that b = du so we have sau + tnu = du = b and rearranging gives sau - b = tnu so asu \equiv b \mbox{ (mod } n ) meaning x_0 = su is a solution to our congruence.

(4) – To get the other solutions we translate x_0 by \frac{nv}{d} where v \in \mathbb{Z} . This will exhaust all possible solutions mod n which are x_0, x_0 + \frac{n}{d},..., x_0 + \frac{(d-1)n}{d} .

You are now in a position to get home if you ever find yourself in such a scenario.