Unusual metrics on the rationals

Suppose one day you get bored of the triangle inequality. You have done so much real analysis that you cannot see anything new and interesting in |a-b| \le |a-z| + |z-b| anymore. But you want to keep doing analysis.

What can we do about this? One solution might be to make this inequality stricter and see where it takes us. For example, you may consider the following: |a-b| \le \max{\{|a-z|,|z-b|\}}. Note that this inequality does not hold for our usual absolute value, for example: 5 = |8 - 3| \not \le \max{ \{ |8-4|, |4-3| \} } = \max{ \{4,1\} } = 4. But does this tweaking of the triangle inequality add anything new? It may not look like it at first, but it is a dream come true: a sequence is Cauchy if and only if the distance between consecutive terms goes to zero! More mathematically: the sequence (a_n)_{n \in \mathbb{N}} is Cauchy \iff |a_n-a_{n+1}| \to 0 as n \to \infty.

This is not true in the case of our usual absolute value. For example, the infamous sequence H_n = \sum_{k = 1}^{n} \frac{1}{k} is not Cauchy in the reals \mathbb{R}: it diverges to infinity, whereas if it were Cauchy it would converge, since the real numbers are complete. However, the difference between consecutive terms is |H_{n+1} - H_n| = \frac{1}{n+1}, which clearly tends to zero.

Before we dive into the proof of the previous claim, we need to note something: we are talking about convergence in \mathbb{Q}, but this would work in any metric space (X,d) where the triangle inequality satisfies d(a,b) \le \max{\{d(a,z), d(z,b) \}} for any a,b,z \in X. This is relevant because an absolute value on \mathbb{Q} induces a metric by setting d(a,b) = |a-b|.

Theorem: Let | \cdot | be an absolute value on \mathbb{Q} satisfying |a+b| \le \max{ \{ |a|,|b|\} }. Then any sequence (a_n)_{n \in \mathbb{N}} of rational numbers is Cauchy if and only if |a_n-a_{n+1}| \to 0 as n \to \infty.

Proof: ( \implies) Let \epsilon > 0. Since (a_n)_{n \in \mathbb{N}} is Cauchy there exists N \in \mathbb{N} such that for all n,m > N we have |a_n - a_m| < \epsilon. In particular, whenever m = n+1, we obtain  |a_n - a_{n+1}| < \epsilon, which means that |a_n - a_{n+1}| \to 0 as n \to \infty.

( \impliedby ) Let \epsilon > 0. There exists N \in \mathbb{N} such that |a_n - a_{n+1}| < \epsilon for any n > N.
Then, for all m \ge n > N, applying our strict triangle inequality:

\begin{aligned} |a_n - a_m| &= |a_n - a_{n+1} + a_{n+1} - \dots + a_{m-1} -a_{m}| \\  &\le \max{\{|a_n - a_{n+1}|, \dots,  |a_{m-1} -a_{m}| \} } < \epsilon \end{aligned} .

Therefore the sequence is Cauchy.   \square

This is all well and good; however, it makes us wonder whether there is any absolute value on \mathbb{Q} with this strict triangle inequality property. The answer is yes – in fact, there are infinitely many of them, but that is the topic for another post. Until then, to satisfy your curiosity I will write down one such absolute value, and I will leave it up to you to check that it is indeed an absolute value:

|x|_3 = \frac{1}{3^{v(x)}} for x \neq 0, and |0|_3 = 0,
where v(x) = n is the exponent of 3 when we write the non-zero rational number as x = 3^n \frac{p}{q}, with p,q coprime to 3.
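If you would rather experiment before checking the axioms by hand, here is a minimal sketch in Python. The helper names v3 and abs3 are my own, and the random sampling is of course no substitute for the verification you are asked to supply:

```python
from fractions import Fraction
import itertools, random

def v3(x):
    """3-adic valuation: the exponent n when x = 3^n * p/q with p, q coprime to 3."""
    if x == 0:
        raise ValueError("v(0) is +infinity by convention")
    n, p, q = 0, abs(x.numerator), x.denominator
    while p % 3 == 0:
        p, n = p // 3, n + 1
    while q % 3 == 0:
        q, n = q // 3, n - 1
    return n

def abs3(x):
    """|x|_3 = 3^(-v(x)), with |0|_3 = 0."""
    return Fraction(0) if x == 0 else Fraction(3) ** (-v3(x))

# Check the strict triangle inequality |a + b|_3 <= max(|a|_3, |b|_3) on samples.
samples = [Fraction(random.randint(-90, 90), random.randint(1, 90)) for _ in range(100)]
assert all(abs3(a + b) <= max(abs3(a), abs3(b))
           for a, b in itertools.product(samples, repeat=2))

# The Cauchy criterion in action: |3^n|_3 = 3^(-n) -> 0, so (3^n) is Cauchy here.
print([abs3(Fraction(3) ** n) for n in range(5)])
```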

Can you suggest a different absolute value?

When is the zero polynomial the only zero function?

Like all mathematics, the first thing to do is set the landscape in which you wish to work. Let K be a field and let K[x_1,...,x_n] be the commutative polynomial ring in n variables (if you are familiar with this ring you may skip the next paragraph).

What do the elements of K[x_1,...,x_n] look like? Taking n = 3 it is not hard to come up with elements, for example x_1x_3 +{x_2}^4 + {x_2}^3{x_3}^2 . One can quite easily start conjuring up polynomials for a given n, but a more interesting approach to the question is: can we find a ‘generating’ set for our polynomials? This brings one to the notion of a monomial, which is simply a product of the form x_1^{\alpha_1} ... x_n^{\alpha_n} . We can simplify the notation by starting with an n-tuple \alpha = (\alpha_1,..., \alpha_n) of non-negative integers and letting X^\alpha = x_1^{\alpha_1} ... x_n^{\alpha_n} . Clearly a polynomial f \in K[x_1,...,x_n] can be written as a finite linear combination of monomials, with the general form f = \sum_{\alpha \in A} C_\alpha X^\alpha where C_\alpha \in K and A \subset {\mathbb{N}_{0}}^n is finite. Without knowing a priori what a polynomial is, this is the method used to define a polynomial in n variables.
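If it helps to see the definition concretely, here is one illustrative (and entirely non-canonical) way to encode "finite linear combination of monomials" in Python, with the exponent tuples \alpha as dictionary keys:

```python
# Encode f as a dict mapping exponent tuples alpha to coefficients C_alpha.
# Example: x1*x3 + x2^4 + x2^3*x3^2 in K[x1, x2, x3], taking K to be the rationals.
f = {
    (1, 0, 1): 1,   # x1 * x3
    (0, 4, 0): 1,   # x2^4
    (0, 3, 2): 1,   # x2^3 * x3^2
}

def monomial(point, alpha):
    """Evaluate X^alpha = x1^a1 * ... * xn^an at a point of K^n."""
    value = 1
    for x, e in zip(point, alpha):
        value *= x ** e
    return value

def evaluate(poly, point):
    """Evaluate the sum over alpha of C_alpha * X^alpha at a point of K^n."""
    return sum(c * monomial(point, alpha) for alpha, c in poly.items())

print(evaluate(f, (1, 2, 3)))   # 1*3 + 2^4 + 2^3 * 3^2 = 91
```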

K[x_1,...,x_n] is a ring with addition and multiplication defined how you would expect, and as K is a field the ring is commutative. Given f \in K[x_1,...,x_n] we can think of it as an algebraic element of our ring, or as a function f : K^n \longrightarrow K obtained by substituting some a_i \in K for x_i for all i . This gives us two notions of zero: the zero element of K[x_1,...,x_n] and a zero function from K^n \longrightarrow K .

To emphasise the difference let us take K = \mathbb{F}_2 , n = 1 and look at the polynomials f = 0 and g = x(x-1) in K[x]. As algebraic elements, clearly f is the zero element and g isn’t, but as functions g(0) = 0 and g(1) = 0 , so we can conclude both f and g are zero functions. To wrap up, we have two distinct elements that are both zero functions K^n \longrightarrow K.

This raises a big question, and one which we will answer: when is the zero polynomial the only zero function? (I didn’t lead you on under false pretenses with the title.) The answer depends on the cardinality of K .

If K is finite of size q , then by Lagrange’s Theorem applied to the multiplicative group K\backslash \{0\} one can show x^q = x for all x \in K , so the polynomial f = x^q- x is a zero function but is distinct from the zero element in K[x_1,...,x_n] . If n > 1 we can define f = x_1^q - x_1 . This tells us that when K is finite we can always come up with elements in K[x_1,...,x_n] that are non-zero and are a zero function K^n \longrightarrow K , so the zero polynomial is not the only zero function.
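A quick sanity check for a small example, taking q = 5 so that the integers mod 5 form the field \mathbb{F}_5:

```python
q = 5   # a prime, so the integers mod q form the field F_q
for x in range(q):
    assert (x ** q - x) % q == 0   # x^q - x vanishes at every point of F_q
print(f"x^{q} - x is a zero function on F_{q}, yet not the zero polynomial")
```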

What about the case when K is infinite? Well…

Theorem: Let K be an infinite field and let f \in K[x_1,...,x_n] . Then f = 0 if and only if f : K^n \longrightarrow K is the zero function.

Proof of Theorem: Let us begin with the ‘easy’ direction: suppose f = 0 \in K[x_1,...,x_n] ; trivially we must have f(\alpha) = 0 for all \alpha \in K^n . The other direction requires us to show that if f \in K[x_1,...,x_n] is such that f(\alpha) = 0 for all \alpha \in K^n , then f is the zero element in K[x_1,...,x_n] .

We will use induction on the number of variables n . When n = 1 , a non-zero polynomial in K[x] of degree m has at most m roots. Suppose f \in K[x] is such that f(\alpha) = 0 for all \alpha \in K . Since K is infinite, f has infinitely many roots, and as the degree of f must be finite we conclude that f = 0 .

Now, assuming the inductive hypothesis for n - 1 , let f \in K[x_1,...,x_n] be a polynomial such that f(\alpha) = 0 for all \alpha \in K^n . By collecting the various powers of x_n we can write f in the form f = \sum_{i=0}^{N} g_i(x_1,...,x_{n-1})x_n^i , where N is the highest power of x_n that appears in f and g_i \in K[x_1,...,x_{n-1} ] . If we show that each g_i is the zero polynomial in n - 1 variables, this will force f = 0 in K[x_1,...,x_n] .

If we fix (\alpha_1,...,\alpha_{n-1}) \in K^{n-1} we get a polynomial f(\alpha_1,...,\alpha_{n-1}, x_n) \in K[x_n] , and by our hypothesis this polynomial vanishes for all \alpha_n \in K . Using our above representation of f , the coefficients of f(\alpha_1,...,\alpha_{n-1}, x_n) are g_i ( \alpha_1,...,\alpha_{n-1} ) , and hence by the one-variable case g_i(\alpha_1,...,\alpha_{n-1}) = 0 for all i . As (\alpha_1,...,\alpha_{n-1}) \in K^{n-1} was chosen arbitrarily, g_i is the zero function K^{n-1} \longrightarrow K , and so by the inductive hypothesis g_i is the zero polynomial in K[x_1,..., x_{n-1}] . \square

Bringing both of these results together lets us conclude that the zero polynomial is the only zero function K^n \longrightarrow K precisely when the field K is infinite. I find this quite a remarkable result, in that knowledge about zero functions gives you an indication of the cardinality of K .

The theorem we have just proved also lets us conclude that when K is infinite, polynomials are determined by their values as functions; more precisely, for f ,g \in K[x_1,...,x_n] , f = g in K[x_1,...,x_n] if and only if f: K^n \longrightarrow K and g : K^n \longrightarrow K are the same function. I leave the proof of this to the reader; as a hint, consider the polynomial f - g .

 

An upper bound for roots of polynomials

A natural question to ask when given a polynomial is: how do the roots and the coefficients interplay with one another? In fact a huge portion of polynomial theory and solving algebraic equations is finding out the details of this relationship. The quadratic formula is a prime example: it gives a way of finding roots by only using the values of the coefficients. An explicit relationship, an algebraic formula, is only possible in general for polynomials of degree less than 5, a famous result stemming from the work of Abel and Galois.

The interplay we are going to determine, for polynomials of any degree, is that you only need to look for roots in a neighbourhood of 0, with the neighbourhood’s size depending only on the size of the polynomial’s coefficients.

Let p \in \mathbb{R}[x] be a polynomial over \mathbb{R} of degree n with the representation p(x) = a_nx^n + ... + a_1x + a_0 , where a_n \neq 0 and |a_i| \leq A for all i = 0,..., n-1 and some A > 0 . We will be considering our polynomial as a function p : \mathbb{C} \longrightarrow \mathbb{C} ; we fix M \geq 0 and set R = 1 + \frac{A + M}{|a_n|} , where our choice of R will become more apparent later.

What we are going to show is that for x of a large enough magnitude our polynomial will have a magnitude as large as we like. We will also establish a simple bound on the magnitude of the roots of our polynomial p .

Suppose |x| \geq R . By the definition of our constants we have |a_n x^n| = \frac{A + M}{R- 1}|x|^n , and estimating |a_{n-1}x^{n-1} + ... + a_0| using the triangle inequality and the sum of a geometric series gives | a_{n-1}x^{n-1} + ... + a_0 | \leq A (|x|^{n-1} + ... + |x| + 1 ) = A\frac{|x|^n - 1}{|x| - 1 } < A \frac{|x|^n}{R-1} , where the last step uses |x| - 1 \geq R - 1 .

Using the reverse triangle inequality allows us to conclude |a_nx^n| - |a_{n-1}x^{n-1} + ... + a_0 | \leq |p(x)| , which, combined with the previous (strict) estimate, gives |p(x)| > \frac{A+ M}{R-1} |x|^n - \frac{A}{R-1} |x|^n = \frac{M|x|^n}{R-1} \geq \frac{MR^n}{R-1} \geq M , as |x| \geq R and R^n \geq R > R - 1 since R > 1.

Bringing this together, we have shown that for all M \geq 0 , if |x| \geq R where R = 1 + \frac{A + M}{|a_n|} , then |p(x)| > M .

Looking at the contrapositive of this statement lets us conclude that if |p(x)| \leq M for some x \in \mathbb{C} and M \geq 0 , we must have |x| < R , or equivalently, x lives in the ball B(0;R). If we take x to be a root of p then by setting M = 0 we have |x| < R = 1 +\frac{A}{|a_n|} , giving us a bound on the size of our root in terms of the polynomial’s coefficients. This means all roots of p will be found in the ball B(0; 1 + \frac{A}{|a_n|} ).
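Here is a quick numerical sanity check of the bound, sketched with numpy; the example polynomial is an arbitrary choice of mine:

```python
import numpy as np

coeffs = [2.0, -3.0, 0.0, 7.0, -1.0]     # p(x) = 2x^4 - 3x^3 + 7x - 1
a_n = coeffs[0]                           # numpy orders from highest degree down
A = max(abs(c) for c in coeffs[1:])       # bound on the lower coefficients
R = 1 + A / abs(a_n)                      # the bound with M = 0

for root in np.roots(coeffs):
    print(f"|root| = {abs(root):.4f}   inside B(0; {R}): {abs(root) < R}")
```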

 

Existence of a real root for polynomials of odd degree

The fundamental theorem of algebra famously states that a polynomial of degree n with real coefficients, say p \in \mathbb{R}[x] , has n complex roots when counting multiplicities. This is equivalent to saying p(x) = a_n\prod_{k=1}^{m} (x - \alpha_k)^{h_k} where a_n is the leading coefficient, h_1 + ... + h_m = n and the \alpha_k \in \mathbb{C} are pairwise distinct. This is due to the ‘equivalence’ of roots and linear factors.

We will probably get round to proving this at some point so if you want to find out how don’t forget to follow us! Or just google how to prove the fundamental theorem of algebra – It is really up to you.

What is rather nice is that by throwing in the condition that n is odd we can guarantee the existence of a real root. The idea is nicely summed up in a Numberphile video.

If you watched the video you probably have a very good idea of how to prove the result, but let us write it out in mathematics.

We have p \in \mathbb{R}[x] of odd degree n , say p(x) = a_0 +a_1x + ... + a_nx^n where a_i \in \mathbb{R} and a_n \neq 0 . Polynomials are continuous, so p is continuous. Looking at the behaviour of \frac{p(x)}{x^n} = \frac{a_0 + ... + a_n x^n}{x^n} = \frac{a_0}{x^n} + ... + a_n as |x| \longrightarrow \infty , we have \frac{p(x)}{x^n} \longrightarrow a_n . In particular, for |x| sufficiently large, \frac{p(x)}{x^n} has the same sign as a_n . If we change x to -x then, due to n being odd, x^n changes sign, so putting the two facts together means p(x) and p(-x) differ in sign for |x| sufficiently large – this is the ‘crossing the x-axis’. Since p is continuous and takes both a positive and a negative value, the Intermediate Value Theorem gives an \alpha \in \mathbb{R} such that p(\alpha) = 0 .
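The argument is easy to animate in code: pick an odd-degree example, find the guaranteed sign change, and bisect. This is only an illustration of the IVT at work (with a polynomial of my choosing), not part of the proof:

```python
def p(x):
    return x**5 - 4*x**3 + x - 3    # odd degree, so a real root must exist

# For |x| large the sign of p(x) matches the sign of x^5.
lo, hi = -10.0, 10.0
assert p(lo) < 0 < p(hi)            # the sign change the IVT needs

for _ in range(60):                 # bisection: halve the interval each step
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if p(mid) < 0 else (lo, mid)

print("approximate real root:", round((lo + hi) / 2, 6))
```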

 

Solving Linear Congruences

Imagine a scenario where you’re lost, lost in a big city full of number theorists, and you wish to ask someone for directions to get home. You approach someone and to your surprise they want you to work for the directions! In particular they want you to produce all the solutions of the linear congruence 4x \equiv 36 \mbox{ (mod } 6 ).

How do you go about getting the directions you so desperately need? Well, to solve a general linear congruence of the form ax \equiv b \mbox{ (mod } n ) it is important to remember the following theorem.

Theorem: ax \equiv b \mbox{ (mod } n ) has a solution iff \gcd(a,n) | b .

Let us write d = \gcd(a,n) and formulate an algorithm to solve our congruence ax \equiv b \mbox{ (mod } n ) – if you ever visit a city of number theorists I strongly recommend memorising this. (A code sketch follows the steps below.)

(1) – To even hope of having a solution we have to check that d | b .

(2) – Supposing d | b , use the Euclidean Algorithm to find s,t \in \mathbb{Z} such that sa + tn = d.

(3) – By using d | b there exists u \in \mathbb{Z} such that b = du , so we have sau + tnu = du = b , and rearranging gives sau - b = -tnu , so asu \equiv b \mbox{ (mod } n ) , meaning x_0 = su is a solution to our congruence.

(4) – To get the other solutions we translate x_0 by \frac{nv}{d} where v \in \mathbb{Z} . This will exhaust all possible solutions mod n which are x_0, x_0 + \frac{n}{d},..., x_0 + \frac{(d-1)n}{d} .
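As promised, here is a short sketch of the algorithm in Python; the function names are my own:

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b (step (2))."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t

def solve_congruence(a, b, n):
    """All solutions mod n of a*x = b (mod n), or [] if none exist."""
    g, s, _ = extended_gcd(a, n)
    if b % g != 0:                      # step (1): no solution unless d | b
        return []
    u = b // g                          # step (3): x0 = s*u is one solution
    x0 = (s * u) % n
    return [(x0 + k * n // g) % n for k in range(g)]   # step (4)

print(solve_congruence(4, 36, 6))   # the congruence from the story
```

Running it on the congruence from the story prints [0, 3], so the directions home cost you x \equiv 0 or 3 \mbox{ (mod } 6 ).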

You are now in a position to get home if you ever find yourself in such a scenario.

Dirichlet’s approximation

When you are first introduced to the real numbers you are almost forcefully told that they can be approximated by rational numbers. The voice in your head goes something like ‘we can get as close as we like to an irrational by using rationals’. This is normally expressed in the statement \overline{\mathbb{Q}} = \mathbb{R} , which reads ‘ \mathbb{Q} is dense in \mathbb{R} ’, where ‘dense’ has a very precise mathematical meaning.

It’s quite easy to think we are done here, but what makes a good approximation? Dirichlet’s approximation is good not only in the sense that you can control how close you want your rational approximation to be; it also gives a splitting, in terms of behaviour, between rational and irrational numbers.

Theorem: For \alpha \in \mathbb{R} \backslash \mathbb{Q} we can find infinitely many \frac{p}{q} \in \mathbb{Q} with \gcd(p,q) = 1 such that | \alpha - \frac{p}{q} | < \frac{1}{q^2} . Moreover, if \alpha \in \mathbb{Q} , only finitely many such approximations \frac{p}{q} exist.

Proof of theorem: We are first going to introduce some notation. For x \in \mathbb{R} , the integral part of x and the fractional part of x are denoted and defined by [x] := \max \{z \in \mathbb{Z} : z \leq x \} and \{x\} := x - [x] respectively.

Let \alpha be irrational and Q be a positive integer. The numbers 0, \{\alpha\}, \{2 \alpha \},..., \{Q\alpha \} give us Q + 1 points in [0,1) distributed among the Q intervals [ \frac{j-1}{Q}, \frac{j}{Q} ) for j = 1,..., Q . Applying the pigeonhole principle, there has to be at least one interval containing at least two numbers \{k\alpha\} and \{l\alpha\} such that \{k\alpha\} \geq \{l\alpha\} , with 0 \leq k, l \leq Q and k \neq l .

Computing the difference of \{k\alpha\} and \{l\alpha\} gives us \{k\alpha\} - \{l \alpha\} = k\alpha - [k\alpha] - l\alpha + [l\alpha] = \{(k-l)\alpha\} + [(k-l)\alpha] + [l\alpha] - [k\alpha] , which lies in the interval [0, \frac{1}{Q} ) . This means [(k-l)\alpha] + [l\alpha] - [k\alpha] = 0 , being an integer strictly between -1 and \frac{1}{Q} . Setting q = k - l (which is non-zero but may be negative) and p = [q \alpha] gives the property \{q \alpha\} < \frac{1}{Q} , and so computing the difference between \alpha and \frac{p}{q} gives | \alpha - \frac{p}{q} | = \frac{|q\alpha - p |}{|q|} = \frac{\{q\alpha\}}{|q|} < \frac{1}{Q|q|} \leq \frac{1}{q^2} . The last inequality follows from |q| \leq Q ; dividing p and q by their greatest common divisor only strengthens the inequality.

To see the split in behaviour, first assume \alpha is irrational and that there are only finitely many approximations \frac{p_1}{q_1},..., \frac{p_n}{q_n} . Since \alpha is irrational, each |\alpha - \frac{p_j}{q_j} | is strictly positive, so we can find a positive integer Q such that for all j = 1,...,n we have |\alpha - \frac{p_j}{q_j} | > \frac{1}{Q} . But the construction above produces an approximation \frac{p}{q} with |\alpha - \frac{p}{q} | < \frac{1}{Q|q|} \leq \frac{1}{Q} , which contradicts the assumption that our list was complete.

In the second case, when \alpha is a rational number, say \alpha = \frac{a}{b} with a \in \mathbb{Z} and b\in \mathbb{N} , if \frac{a}{b} \neq \frac{p}{q} then by direct computation |\alpha - \frac{p}{q} | = \frac{|aq - bp|}{bq} \geq \frac{1}{bq} , so the approximation property forces q < b and we can only have finitely many such approximations.
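To see the theorem numerically, here is a small brute-force search for \alpha = \sqrt{2} (any irrational would do; floating point stands in for the real number, and rounding \alpha q picks the best numerator for each denominator):

```python
from math import gcd, sqrt

alpha = sqrt(2)
for q in range(1, 100):
    p = round(alpha * q)                 # best numerator for this denominator
    if gcd(p, q) == 1 and abs(alpha - p / q) < 1 / q**2:
        print(f"{p}/{q}: |alpha - p/q| = {abs(alpha - p/q):.2e} < 1/q^2 = {1/q**2:.2e}")
```

Among the hits you will recognise 3/2, 7/5, 17/12 and 99/70, the continued fraction convergents of \sqrt{2}; swap in a rational \alpha and watch the hits dry up once q reaches the denominator.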

Fermat’s Last Theorem (the divisible by 4 case)

Fermat’s Last Theorem is infamous: it was unsolved for 358 years, and when Andrew Wiles proved the result in 1994 the world received it so well that he was offered an opportunity to model for Gap jeans.

Although Fermat didn’t give his ‘proof’ of the general result, due to the width of the margin being too small, he did give a proof of the case n = 4 and hence of all the cases when n is divisible by 4. He did this by considering the more general Diophantine equation X^4 + Y^4 = Z^2 : since a solution of X^{4n} + Y^{4n} = Z^{4n} gives a solution of X^4 + Y^4 = Z^4 (take X^n, Y^n, Z^n ), which in turn gives a solution of X^4 + Y^4 = Z^2 (as Z^4 = (Z^2)^2 ), it is sufficient to show the last equation has no integer solutions. The symmetry of the equation also means it is sufficient to consider positive integer solutions.

Theorem: The equation X^4 + Y^4 = Z^2 has no positive integer solutions.

Proof of Theorem: We are going to give Fermat’s original proof, which invokes the method of infinite descent. The idea of this method, in the most imprecise and general way, is to assume the existence of a ‘minimal’ something and show there is a more ‘minimal’ something of the same type, hence giving a contradiction and letting us conclude that no something exists.

To begin, we assume that z is the least positive integer for which the equation X^4 + Y^4 = z^2 has a solution with positive integers x, y . Due to the minimality condition on z we can assume that \gcd(x,y) = 1 (a common prime factor p of x and y would give the smaller solution (x/p, y/p, z/p^2) ). This means at least one of x and y is odd, and we shall assume x is odd. One can check a^2 \equiv 0 (mod 4 ) or a^2 \equiv 1 (mod 4 ) for any integer a , and the same holds for fourth powers, these being squares of squares. Using this knowledge, z^2 = x^4 + y^4 \equiv 1 or 2 (mod 4 ), and as a square cannot be congruent to 2 mod 4 we must have z being odd. This implies that y is even.

Our solution x,y and z gives a primitive Pythagorean triple (x^2,y^2,z) , so using the parametrization in our previous post we can conclude: x^2 = a^2 - b^2, y^2 = 2ab and z = a^2 + b^2 , where a and b are coprime integers of opposite parity. If a is even and b is odd then x^2 \equiv a^2 - b^2 \equiv 0 - 1 = -1 ( mod 4 ), which is impossible. Thus, a is odd and b is even, with b = 2c for some integer c . Looking back we observe that (\frac{1}{2}y)^2 = ac , since y^2 = 2ab = 4ac , where \gcd(a,c) = 1 as a and b are coprime. This means that a and c are perfect squares, so we can find coprime positive integers u and v such that a = u^2 and c = v^2 . Due to a being odd we must have u being odd.

Using the fact x^2 + b^2 = a^2 we can conclude that (2v^2)^2 + x^2 = (u^2)^2 , where the integers 2v^2, x and u^2 are pairwise coprime. This means (2v^2,x,u^2) is a primitive Pythagorean triple, so using the parametrization of such triples we have two coprime positive integers A and B such that 2v^2 = 2AB and u^2 = A^2 + B^2 .

From 2v^2 = 2AB we have v^2 = AB and as \gcd(A,B) = 1 we must have some positive coprime integers X and Y where A = X^2 and B = Y^2 . Looking at the equation u^2 = A^2 + B^2 we must have u^2 = X^4 + Y^4.

Following the method of infinite descent we have constructed a new solution (X,Y,u) from our minimal solution (x,y,z) . However, u \leq u^2 = a \leq a^2 < a^2 + b^2 = z , so we have in fact constructed a ‘more’ minimal solution, which is impossible by our assumption on z . This means that our hypothesis that a minimal solution, or indeed any solution, exists is false, so we conclude X^4 + Y^4 = Z^2 has no positive integer solutions. \square
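A proof by infinite descent needs no computational backing, but it is still comforting to watch a brute-force search come up empty; this is, of course, evidence rather than proof, over a search range of my choosing:

```python
from math import isqrt

LIMIT = 200
solutions = [(x, y, isqrt(x**4 + y**4))
             for x in range(1, LIMIT)
             for y in range(x, LIMIT)              # by symmetry take x <= y
             if isqrt(x**4 + y**4) ** 2 == x**4 + y**4]
print(solutions or f"no positive solutions with x, y < {LIMIT}")
```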