Idempotent elements in Rings

For a commutative ring R, an idempotent is an element e \in R such that e^2 = e . The collection Re is an ideal of R and, moreover, is a ring in its own right with identity e (why?). A good example of an idempotent to have in mind is the element (1,0) \in R \times R', where R' is another commutative ring, and in some sense these are the only idempotents.

For an idempotent e \in R, the element e' = 1 - e \in R is also an idempotent and ee' = 0 . These should remind you of the elements e = (1,0), e' = (0,1) \in R\times R'. We call a pair \{e, e'\} with e, e' \in R, e' = 1 - e and ee' = 0 a pair of complementary idempotents.

Complementary idempotents give a formulation of an internal product decomposition. What do I mean by this? If \{e,e'\} is a pair of complementary idempotents in R, then defining R' = Re, R'' = Re' (remember these are rings in their own right), the map \phi \colon R \to R' \times R'' given by r \mapsto (re,re') is a ring isomorphism. This is an instructive check to make and I leave it to the reader.

What do idempotents of some rings look like? 1 is always idempotent, making \{1,0\} a pair of complementary idempotents. In any field, the only non-zero idempotent element is the multiplicative unit 1 .

What about the rings \mathbb{Z}/n\mathbb{Z} ? If n = p^k for some prime number p , suppose a + p^k\mathbb{Z} is an idempotent. Then p^k | a(1-a) , and p cannot divide both a and 1-a , for otherwise p | a + (1-a) = 1 , which is a contradiction; so p^k | a or p^k | 1-a . Hence the only idempotents of \mathbb{Z}/p^k\mathbb{Z} are 0 and 1, and the only pair of complementary idempotents is \{1,0\}.

We can now look at the general case. Let n = {p_1}^{r_1} \cdots {p_m}^{r_m} be n's prime factorisation; the Chinese Remainder Theorem tells us \mathbb{Z}/n\mathbb{Z} \cong \mathbb{Z}/{p_1}^{r_1}\mathbb{Z} \times \cdots \times \mathbb{Z}/{p_m}^{r_m}\mathbb{Z} , so there are 2^m idempotents and 2^{m-1} pairs of complementary idempotents.
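If you want to see this count concretely, here is a minimal Python sketch (the helper names are my own) that brute-forces the idempotents of \mathbb{Z}/n\mathbb{Z} and checks there are 2^m of them, where m is the number of distinct primes dividing n.

```python
def idempotents(n):
    """Return all e in Z/nZ with e^2 = e (mod n)."""
    return [e for e in range(n) if (e * e - e) % n == 0]

def num_distinct_prime_factors(n):
    """Count the distinct primes dividing n by trial division."""
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:
                n //= d
        d += 1
    return count + (1 if n > 1 else 0)

for n in [7, 8, 12, 30, 360]:
    e = idempotents(n)
    m = num_distinct_prime_factors(n)
    print(n, e, len(e) == 2 ** m)
# e.g. n = 12 gives [0, 1, 4, 9]: the complementary pairs are {1, 0} and {4, 9}.
```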

Lagrange Interpolation, a unique polynomial

Let z_1, \ldots, z_k be k distinct complex numbers and let w_1, \ldots, w_k be k complex numbers which need not be distinct. A natural question to ask is, can we find a polynomial P(z) such that P(z_m) = w_m for m = 1,\ldots, k ?

We can, and in fact we can say more: there is a unique polynomial of degree \leq k-1 satisfying the above! This polynomial is called the Lagrange interpolation polynomial and is constructed explicitly.

We start by setting A(z) = (z - z_1)\cdots(z-z_k) and let A_m(z) = \dfrac{A(z)}{z-z_m} .

What have we just constructed? A_m(z) is a polynomial of degree k-1 with A_m(z_m) \neq 0 and A_m(z_i) = 0 if i \neq m.

So \dfrac{A_m(z)}{A_m(z_m)} is another polynomial of degree k-1 which vanishes at z_j for j \neq m , and takes the value 1 at z_m. This polynomial picks out an individual point and ignores all the rest, which is precisely what we need.

Let us define P(z) = \sum_{m=1}^{k} w_m \dfrac{A_m(z)}{A_m(z_m)} which is a polynomial of degree \leq k-1 with the properties we want.

What about uniqueness? Suppose we have another such polynomial Q(z). The difference P(z) - Q(z) vanishes at k distinct points but has degree \leq k-1 , and a non-zero polynomial of degree \leq k-1 has at most k-1 roots, so the difference must be 0 and hence P(z) = Q(z).
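For the computationally inclined, here is a minimal Python sketch of the construction (function name my own): it builds each A_m(z)/A_m(z_m) on the fly and sums them with the weights w_m; it works just as well with complex inputs.

```python
def lagrange_interpolate(zs, ws):
    """Return the Lagrange interpolation polynomial as a callable.

    zs: distinct nodes z_1, ..., z_k; ws: target values w_1, ..., w_k.
    """
    def P(z):
        total = 0
        for m, (zm, wm) in enumerate(zip(zs, ws)):
            # A_m(z) / A_m(z_m): degree k-1, equals 1 at z_m and 0 at the other nodes
            term = wm
            for i, zi in enumerate(zs):
                if i != m:
                    term *= (z - zi) / (zm - zi)
            total += term
        return total
    return P

P = lagrange_interpolate([1, 2, 3], [2, 3, 5])
print([P(z) for z in (1, 2, 3)])   # [2.0, 3.0, 5.0]
print(P(4))                        # the unique degree <= 2 interpolant evaluated at 4
```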

 

Algebras and why modules are a thing

A k-algebra A, where k is a field, is a ring with the added structure of being a vector space over k (with respect to the same addition) where scalar multiplication behaves nicely with the ring multiplication: \mu(ab) = (\mu a)b = a(\mu b) for \mu \in k and a, b \in A. The archetypal example is the algebra of n \times n matrices with entries in k, denoted M_{n}(k). There is a notion of algebra morphism which is not too hard to figure out with the above or a quick google, so we have a notion of isomorphism.

Matrix algebras are very special: they have vectors on which they can act. For M_{n}(k) these objects are the tuples (x_1, \cdots, x_n) where x_i \in k. As a collection, they are denoted k^n. These form a vector space and as algebras M_{n}(k) \cong \text{End}(k^n) , where the algebra operation in M_n(k) is matrix multiplication and \text{End}(k^n) is the algebra of linear maps k^n \longrightarrow k^n with the algebra operation being composition.
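To make this correspondence concrete, here is a small self-contained Python sketch (my own helper names, plain lists rather than a matrix library) checking that composing the linear maps matches multiplying the matrices.

```python
def mat_vec(A, x):
    """Apply the matrix A (list of rows) to the tuple x."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

def mat_mul(A, B):
    """Multiply matrices A and B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [0, 1]]
B = [[3, 0], [1, 1]]
x = (4, 5)

# Acting by B then A is the same as acting by the product A*B:
print(mat_vec(A, mat_vec(B, x)) == mat_vec(mat_mul(A, B), x))  # True
```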

The definition of an A-module can be found elsewhere, but to motivate its origins: it is precisely the articulation of assigning to a general k-algebra A some 'vectors' in a vector space V on which A can act, just as matrices act on tuples. If you struggle to remember the definition of an A-module, just think like this and you should be able to reconstruct it.

Modules can even be defined for a ring R, relaxing the vector space structure to that of an abelian group. And so we can assign 'vectors' to rings by looking at R-modules. So we can ask the question: what are \mathbb{Z}-modules? These are in fact abelian groups (why?) and so the collection of possible 'vectors' for \mathbb{Z} can be thought of as just abelian groups.

I think that’s pretty nice.

It’s all local for topological vector spaces

For a vector space V and a topology \tau on the space, this pair is called a topological vector space when \{v\} is closed for all v \in V and both vector space operations, addition and scalar multiplication, are continuous.

Defining maps T_\alpha : V \longrightarrow V by T_\alpha(v) = \alpha + v where \alpha \in V, it's not hard to show that T_\alpha is in fact a homeomorphism. This gives us that the topology is completely determined locally, around some point, say 0 \in V, because E \subset V is open iff \alpha + E is open for all \alpha \in V.

I think this is pretty neat and possibly the archetypal situation where local data determines everything globally, illustrating a guiding philosophy in modern mathematics.

Cauchy’s inequality and the appeal to symmetry

Working over the reals \mathbb{R}, a young mathematician will learn the inequality (a_1b_1 + \cdots +  a_nb_n)^2 \leq (a_1^2 + \cdots + a_n^2)(b_1^2 + \cdots + b_n^2) which has its name associated to Cauchy and Schwarz and is called the Cauchy–Schwarz inequality.

After proving this result the question of when this inequality is an equality is brought up, and the young mathematician will learn this occurs when a_i = \lambda b_i for some \lambda \in \mathbb{R} and for all i \in \{1,\cdots,n \} . Here is a neat naive proof of this result following the mathematician's philosophy of 'look for symmetry'.

Let us look at the ‘error’ E_n = (a_1^2 + \cdots + a_n^2)(b_1^2 + \cdots + b_n^2) - (a_1b_1 + \cdots + a_nb_n)^2.

Expanding out gives us E_n = \sum_{i,j}a_i^2b_j^2 - \sum_{i,j}a_ib_ia_jb_j

Now \sum_{i,j}a_i^2b_j^2 = \frac{1}{2}\sum_{i,j}(a_i^2b_j^2 + a_j^2b_i^2). Yes, I did just write this, and no, I'm not insulting your intelligence. This is my appeal to symmetry.

We get E_n = \frac{1}{2}\sum_{i,j}(a_i^2b_j^2 + a_j^2b_i^2) - \sum_{i,j}a_ib_ia_jb_j = \frac{1}{2}\sum_{i,j}(a_i^2b_j^2 + a_j^2b_i^2 - 2a_ib_ia_jb_j )

And so E_n = \frac{1}{2}\sum_{i,j}(a_ib_j - a_jb_i)^2. I'll let you finish the rest off but we've done all the hard work here.
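If you want a quick sanity check of this identity, here is a small Python sketch (my own code) comparing E_n with \frac{1}{2}\sum_{i,j}(a_ib_j - a_jb_i)^2 for random real vectors.

```python
import random

def error(a, b):
    """E_n = (sum a_i^2)(sum b_i^2) - (sum a_i b_i)^2."""
    return sum(x*x for x in a) * sum(y*y for y in b) - sum(x*y for x, y in zip(a, b)) ** 2

def symmetric_form(a, b):
    """(1/2) * sum over i, j of (a_i b_j - a_j b_i)^2."""
    return 0.5 * sum((ai*bj - aj*bi) ** 2
                     for ai, bi in zip(a, b) for aj, bj in zip(a, b))

a = [random.uniform(-1, 1) for _ in range(5)]
b = [random.uniform(-1, 1) for _ in range(5)]
print(abs(error(a, b) - symmetric_form(a, b)) < 1e-12)  # True (up to rounding)
```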

A theorem by Cauchy on ‘thinning’ sequences

Just to clarify, all the numbers in this post are going to live in \mathbb{R} . Given a sequence \{a_n\} where n \in \mathbb{N} , we have a collection of partial sums \{s_n\} indexed by \mathbb{N} and defined by s_n = a_1 + a_2 + \cdots + a_n . If the sequence \{s_n\} converges to s (in the usual \epsilon–N way) we say the series converges and write \sum_{n = 1}^{\infty} a_n = s. For completeness, if the sequence \{s_n\} diverges, the series is said to diverge.

If you know sequences really well, you know series really well as every theorem about sequences can be stated in terms of series (putting a_1 = s_1 and a_n = s_n - s_{n-1} for n > 1 ). In particular the monotone convergence theorem has an instantaneous counterpart for series.

Theorem: A series of non-negative real terms converges if and only if the partial sums form a bounded sequence.

I'm going to omit the proof here but it is a quick application of the monotone convergence theorem to the partial sums. So why bring this up? Well, if we impose that the terms in our series are monotonically decreasing (which can appear in applications) we can apply the following theorem of Cauchy. What is interesting about this theorem is that a 'thin' subsequence of \{a_n\} determines the convergence or divergence of the series.

Theorem: Suppose a_1 \geq a_2 \geq \cdots \geq 0 are real numbers. Then the series \sum_{n=1}^{\infty} a_n converges if and only if the series \sum_{k=0}^{\infty}2^k a_{2^k} converges.

Proof: By the previous theorem it suffices to consider only the boundedness of the partial sums. Let us write s_n = a_1 + \cdots + a_n and t_k = a_1 + 2a_2 + \cdots + 2^ka_{2^k} .  We will look at two cases, when n < 2^k and when n > 2^k .

For n < 2^k we have s_n \leq a_1 + (a_2 + a_3) + \cdots + (a_{2^k} + \cdots +a_{2^{k+1} - 1}) \leq a_1 + 2a_2 + \cdots + 2^ka_{2^k} = t_k , where the first inequality followed from n < 2^k and the second inequality from the hypothesis.

When n > 2^k we have s_n \geq a_1 + a_2 + (a_3 + a_4) + \cdots +(a_{2^{k-1} +1} + \cdots + a_{2^k}) \geq \frac{1}{2}a_1 + a_2 +2a_4 + \cdots + 2^{k-1}a_{2^k} = \frac{1}{2}t_k , where the first inequality follows from n > 2^k and the second (you guessed it) follows from our hypothesis.

Bringing these together we conclude that the sequences \{s_n\} and \{t_k\} are either BOTH bounded or BOTH unbounded, which completes the proof.

When I came across this I thought it was pretty astounding (hence why it has made it onto the blog), so let's see it in action. We will use it to deduce for p \in \mathbb{Z} that \sum_{n=2}^{\infty} \frac{1}{n(\log n)^p} converges if p >1 and diverges if p \leq 1 .

The monotonicity of the logarithmic function implies that \{\log n \} increases, so the terms \frac{1}{n(\log n)^p} decrease (at least eventually), which puts us in a good position to apply our theorem. This leads us to the following, which is enough as a proof.

\sum_{k=1}^{\infty}2^k\frac{1}{2^k(\log 2^k)^p} = \sum_{k=1}^{\infty} \frac{1}{k^p(\log 2)^p} =\frac{1}{(\log 2)^p} \sum_{k=1}^{\infty} \frac{1}{k^p} .
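To watch the condensation in action numerically, here is a small Python sketch (my own code and variable names) comparing the partial sums s_N with the condensed sums t_K for a_n = \frac{1}{n(\log n)^p}: for p = 2 both columns level off, while for p = 1 both keep growing.

```python
import math

def a(n, p):
    """Terms a_n = 1 / (n * (log n)^p), defined for n >= 2."""
    return 1.0 / (n * math.log(n) ** p)

def partial_sum(N, p):
    """s_N = a_2 + ... + a_N."""
    return sum(a(n, p) for n in range(2, N + 1))

def condensed_sum(K, p):
    """t_K = sum over 1 <= k <= K of 2^k * a_{2^k}."""
    return sum(2 ** k * a(2 ** k, p) for k in range(1, K + 1))

for p in (2.0, 1.0):
    print(p,
          [round(partial_sum(2 ** K, p), 3) for K in (5, 10, 15)],
          [round(condensed_sum(K, p), 3) for K in (5, 10, 15)])
```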

p-Sylow subgroups and why they exist

Let G be a finite group and p a prime such that the order of G is |G| = p^nm where p and m are coprime. Cauchy's Theorem for finite groups tells us that there exists x \in G whose order is p. We also know by Lagrange's Theorem for finite groups that given a subgroup H \leq G , the order of H divides the order of G.

Putting these two together we can say that the 'most simple' subgroup of our finite group G is the subgroup \langle x \rangle , a copy of the cyclic group \mathbb{Z}_p. As our goal (sorry I didn't tell you this earlier) is to look at the subgroups of G, it seems clear that the next 'most simple' subgroups will have order some power of our prime, i.e. p^k where 1 \leq k \leq n.

By 'simple' I mean in some sense we are looking for subgroups of G whose existence and size are determined only by knowing that our prime in question p divides |G|.

Enough philosophy, let us define some notions. A p-group is a finite group whose order is a power of a prime p. H is a p-subgroup of a group G if it is a subgroup and a p-group (p necessarily has to divide the order of G). What is the largest p-subgroup of G? Well, Lagrange gives us some constraints and motivates the definition of a p-Sylow subgroup. H is a p-Sylow subgroup if it is a subgroup of G of order p^n , where p^n is the highest power of p dividing the order of G.

So I hope I’ve described some motives for looking at/for p-Sylow subgroups but can we even guarantee the existence of such a subgroup of our finite group G? Also, would it count as a surprise if I said the answer was yes?

To prove such a result we will use induction on the order of G. If the order of G is prime the result follows from Cauchy’s Theorem. Now suppose we can always find a p-Sylow subgroup in every finite group whose order is divisible by p and is strictly less than that of G.

Lagrange tells us for a subgroup H \leq G we have |G| = |H|[G:H] , so if H is proper and p doesn't divide [G:H] we can apply the inductive hypothesis to H, and the p-Sylow subgroup we find for H will also be a p-Sylow subgroup of G.

We rule this case out and suppose that for all proper subgroups H of G the index [G:H] is divisible by p . Let G act on G by conjugation, the action for this being g * h := g^{-1}hg where g, h \in G. Applying the orbit stabiliser theorem tells us for each orbit  G * x   we have |G * x| = [G:G_x] where x \in G and G_x is its stabiliser. Orbits partition G, and unravelling the meaning of orbits in this action gives us |G| = |Z(G)| + \sum [G:G_x] , where the sum runs over representatives of the orbits of size greater than one and Z(G) is the abelian subgroup Z(G) = \{g \in G : hgh^{-1} =g \mbox{ for all } h \in G \} of G, often called the centre of G. I hope by now when reading these posts you have pen and paper, as this is a moment when you should verify what I am claiming. By the case we ruled out, each term [G:G_x] in the sum is divisible by p, so p divides |Z(G)| , and applying Cauchy's Theorem we can find an element x \in Z(G) of order p. As \langle x \rangle \leq Z(G) , every element of \langle x \rangle commutes with all of G and so \langle x \rangle \trianglelefteq G .

This gives us a quotient group G / \langle x \rangle and a quotient map \phi : G \longrightarrow G / \langle x \rangle , and applying the inductive hypothesis to G / \langle x \rangle gives us a p-Sylow subgroup K' of G / \langle x \rangle whose order is p^{n-1} , due to the equality |G| = |\langle x \rangle||G/ \langle x \rangle| = p|G/\langle x \rangle| from Lagrange. Looking at K = \phi^{-1}(K') , as \langle x \rangle \subset K and \phi maps K onto K' we have K' \cong K/\langle x \rangle , and applying Lagrange (again) we obtain |K|= |\langle x \rangle||K'| = p \cdot p^{n-1} = p^n , as desired. We have found our p-Sylow subgroup K of G and this finishes the proof.
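Now that the proof is done, here is a small Python sketch (my own code) you can use to check the class equation |G| = |Z(G)| + \sum [G:G_x] used above on a concrete group, taking S_3 represented by permutation tuples and using the same conjugation action g * h = g^{-1}hg.

```python
from itertools import permutations

G = list(permutations(range(3)))            # S_3 as permutation tuples

def compose(p, q):
    """(p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def conjugate(g, h):
    """The action g * h = g^{-1} h g used above."""
    return compose(compose(inverse(g), h), g)

centre = [h for h in G if all(conjugate(g, h) == h for g in G)]

# Orbit (conjugacy class) sizes of representatives outside the centre
seen, nontrivial_orbits = set(centre), []
for h in G:
    if h not in seen:
        orbit = {conjugate(g, h) for g in G}
        nontrivial_orbits.append(len(orbit))
        seen |= orbit

print(len(G) == len(centre) + sum(nontrivial_orbits))  # True: 6 = 1 + (3 + 2)
```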

So we have existence, but what more can we ask for and look for? How many such subgroups can we find? Also, if we have an arbitrary p-subgroup, is it always contained in a p-Sylow subgroup? How hard is it to find such subgroups?

If you don’t know the answer to these it is instructive to try to have a think about how you would go about answering these and try finding some p-Sylow subgroups for your favourite groups. As a hint conjugation is very important.

Unusual metrics on the rationals

Suppose one day you get bored of the triangle inequality. You have done so much real analysis that you cannot see anything new and interesting in |a-b| \le |a-z| + |z-b| anymore. But you want to keep doing analysis.

What can we do about this? One solution may be to make this inequality more strict and see where it takes us. For example, you may consider the following: |a-b| \le \max{\{|a-z|,|z-b|\}}. Note that this inequality does not hold under our usual absolute value, for example: 5 = |8 - 3| \not \le \max{ \{ |8-4|, |4-3| \} } = \max{ \{4,1\} } = 4. But does this tweaking of the triangle inequality add anything new? It may not look like it at first, but it is a dream come true: a sequence is Cauchy if and only if the distance between consecutive terms goes to zero! More mathematically: the sequence (a_n)_{n \in \mathbb{N}} is Cauchy \iff |a_n-a_{n+1}| \to 0 as n \to \infty

This is not true in the case of our usual absolute value. For example, the infamous sequence H_n = \sum_{k = 1}^{n} \frac{1}{k} is not Cauchy in the reals \mathbb{R}, for if it were Cauchy, it would converge since the real numbers are complete. However, the difference between consecutive terms is \frac{1}{n+1}, which clearly tends to zero.
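A two-line numerical illustration of this (my own code): the partial sums H_n keep growing while the consecutive differences shrink.

```python
H = 0.0
for n in range(1, 100001):
    H += 1.0 / n                 # H_n = 1 + 1/2 + ... + 1/n
print(H)                         # ~12.09, still growing (roughly like log n)
print(1.0 / 100001)              # |H_{n+1} - H_n| for n = 100000: already tiny
```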

Before we dive into the proof of the previous claim, we need to note something: we are talking about convergence in \mathbb{Q}, however this would work in any metric space (X,d) where the metric satisfies d(a,b) \le  \max{\{d(a,z), d(z,b) \}} for any a,b,z \in X. This is because an absolute value on \mathbb{Q} induces a metric by setting d(a,b) = |a-b|.

Theorem: Let | \cdot | be an absolute value on \mathbb{Q} satisfying |a+b| \le \max{ \{ |a|,|b|\} }. Then any sequence (a_n)_{n \in \mathbb{N}} of rational numbers is Cauchy if and only if |a_n-a_{n+1}| \to 0 as n \to \infty.

Proof: ( \implies) Let \epsilon > 0. Since (a_n)_{n \in \mathbb{N}} is Cauchy there exists N \in \mathbb{N} such that for all n,m > N we have |a_n - a_m| < \epsilon. In particular, whenever m = n+1, we obtain  |a_n - a_{n+1}| < \epsilon, which means that |a_n - a_{n+1}| \to 0 as n \to \infty.

( \impliedby ) Let \epsilon > 0. There exists N \in \mathbb{N} such that |a_n - a_{n+1}| < \epsilon for any n > N.
Then, for all m \ge n > N, applying our strict triangle inequality:

\begin{aligned} |a_n - a_m| &= |a_n - a_{n+1} + a_{n+1} - \dots + a_{m-1} -a_{m}| \\  &\le \max{\{|a_n - a_{n+1}|, \dots,  |a_{m-1} -a_{m}| \} } < \epsilon \end{aligned} .

Therefore the sequence is Cauchy.   \square

This is all well and good, however, it makes us wonder whether there is any absolute value on \mathbb{Q} with this strict triangle inequality property. The answer is yes, in fact, there are infinitely many of them, but this is the topic for another post. Until then, to satisfy your curiosity I will write down one such absolute value and I will leave it up to you to check that it is indeed an absolute value:

|x|_3 = \frac{1}{3^{v(x)}}
where v(x) = n is the exponent of 3 whenever we write the non-zero rational number as x = 3^n \frac{p}{q}, where p,q are integers coprime to 3 (with the convention that |0|_3 = 0).
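Here is a small Python sketch of this absolute value (my own code, using the standard fractions module and the convention |0|_3 = 0), together with a random spot-check of the strong triangle inequality.

```python
from fractions import Fraction
import random

def v3(x):
    """Exponent of 3 in the non-zero rational x = 3^n * p/q with p, q coprime to 3."""
    n, num, den = 0, x.numerator, x.denominator
    while num % 3 == 0:
        num //= 3; n += 1
    while den % 3 == 0:
        den //= 3; n -= 1
    return n

def abs3(x):
    """The 3-adic absolute value |x|_3 = 3^(-v(x)), with |0|_3 = 0."""
    x = Fraction(x)
    return Fraction(0) if x == 0 else Fraction(1, 3) ** v3(x)

print(abs3(9), abs3(Fraction(5, 27)), abs3(6))   # 1/9, 27, 1/3

# Spot-check |a + b|_3 <= max(|a|_3, |b|_3) on random rationals:
for _ in range(1000):
    a = Fraction(random.randint(-50, 50), random.randint(1, 50))
    b = Fraction(random.randint(-50, 50), random.randint(1, 50))
    assert abs3(a + b) <= max(abs3(a), abs3(b))
print("strong triangle inequality holds on all samples")
```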

Can you suggest a different absolute value?

When is the zero polynomial the only zero function?

Like all mathematics, the first thing to do is set the landscape in which you wish to work. Let K be a field and let K[x_1,...,x_n] be the commutative polynomial ring in n variables (if you are familiar with this ring I would skip the next paragraph).

What do the elements of  K[x_1,...,x_n] look like? Taking n = 3 it is not hard to come up with elements, for example x_1x_3 +{x_2}^4 + {x_2}^3{x_3}^2 . One can quite easily start conjuring up polynomials for a given n, but a more interesting approach is to ask: can we find a 'generating' set for our polynomials? This brings one to the notion of a monomial, which is simply a product of the form x_1^{\alpha_1} \cdots x_n^{\alpha_n} . We can simplify the notation by starting with an n-tuple \alpha = (\alpha_1,..., \alpha_n) of non-negative integers and letting X^\alpha = x_1^{\alpha_1} \cdots x_n^{\alpha_n} . Clearly a polynomial f \in K[x_1,...,x_n] can be written as a finite linear combination of monomials with the general form f = \sum_{\alpha \in A} C_\alpha X^\alpha where C_\alpha \in K and A \subset {\mathbb{N}_{0}}^n with A finite. Without knowing what a polynomial is a priori, this is the method used to define a polynomial in n variables.

K[x_1,...,x_n] is a ring with addition and multiplication defined how you would expect, and as K is a field the ring is commutative. Given f \in K[x_1,...,x_n] we can think of it as an algebraic element of our ring or as a function f : K^n \longrightarrow K by swapping x_i with some a_i \in K for all i . This gives us two notions of zero: the zero element of K[x_1,...,x_n] and a zero function from K^n \longrightarrow K .

To emphasise the difference let us take K = \mathbb{F}_2 , n = 1 and look at the polynomials f = 0 and g = x(x-1) in K[x]. As algebraic elements clearly f is the zero element and g isn't, but as functions g(0) = 0 and g(1) = 0, so we can conclude both f and g are zero functions. To wrap up, we have two distinct elements that are both zero functions K^n \longrightarrow K.

This raises a big question, and one which we will answer: when is the zero polynomial the only zero function? (I didn't lead you on under false pretenses with the title.) The answer depends on the cardinality of K .

If K is finite of size q, then by Lagrange's Theorem applied to the multiplicative group K\backslash \{0\} one can show x^q = x for all x \in K , so the polynomial f = x^q- x is a zero function but is distinct from the zero element of K[x] . If n > 1 we can take f = x_1^q - x_1 . This tells us that when K is finite we can always come up with elements of K[x_1,...,x_n] that are non-zero yet are zero functions K^n \longrightarrow K , so the zero polynomial is not the only zero function.
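A quick brute-force check in Python (my own code) for the prime fields \mathbb{F}_p: the polynomial x^p - x has non-zero coefficients yet evaluates to 0 at every point of \mathbb{F}_p.

```python
def is_zero_function(coeffs, p):
    """Check whether the polynomial with the given coefficients (constant term first)
    evaluates to 0 at every element of the prime field F_p = Z/pZ."""
    return all(sum(c * pow(a, i, p) for i, c in enumerate(coeffs)) % p == 0
               for a in range(p))

for p in (2, 3, 5, 7):
    # x^p - x: coefficient -1 for x, coefficient 1 for x^p, everything else 0
    coeffs = [0, -1] + [0] * (p - 2) + [1]
    print(p, is_zero_function(coeffs, p))   # True for every prime p
```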

What about the case when K is infinite? Well…

Theorem: Let K be an infinite field, let f \in K[x_1,...,x_n] . Then, f = 0 if and only if f : K^n \longrightarrow K is the zero function.

Proof of Theorem: Let us begin with the 'easy' direction. Suppose f = 0 \in K[x_1,...,x_n] ; trivially we must have f(\alpha) = 0 for all \alpha \in K^n . The other direction requires us to show that if f \in K[x_1,...,x_n] is such that f(\alpha) = 0 for all \alpha \in K^n , then f is the zero element of K[x_1,...,x_n] .

We will use induction on the number of variables n . When n = 1, a non-zero polynomial in K[x] of degree m has at most m roots. Suppose f \in K[x] is such that f(\alpha) = 0 for all \alpha \in K . Since K is infinite, f has infinitely many roots, and as the degree of f is finite we must conclude that f = 0 .

Now, assuming the inductive hypothesis for n - 1, let f \in K[x_1,...,x_n] be a polynomial such that f(\alpha) = 0 for all \alpha \in K^n . By collecting the various powers of x_n we can write f in the form f = \sum_{i=0}^{N} g_i(x_1,...,x_{n-1})x_n^i , where N is the highest power of x_n that appears in f and g_i \in K[x_1,...,x_{n-1} ] . If we show that each g_i is the zero polynomial in n - 1 variables, this will force f = 0 in K[x_1,...,x_n] .

If we fix (\alpha_1,...,\alpha_{n-1}) \in K^{n-1} we get a polynomial f(\alpha_1,...,\alpha_{n-1}, x_n) \in K[x_n] , and by our hypothesis this polynomial vanishes for all \alpha_n \in K . Using our above representation of f we see the coefficients of f(\alpha_1,...,\alpha_{n-1}, x_n) are g_i(\alpha_1,...,\alpha_{n-1}) , and by the n = 1 case these coefficients must all vanish, i.e. g_i(\alpha_1,...,\alpha_{n-1}) = 0 for all i . As (\alpha_1,...,\alpha_{n-1}) \in K^{n-1} is chosen arbitrarily, g_i is the zero function K^{n-1} \longrightarrow K and so by the inductive hypothesis g_i is the zero polynomial in K[x_1,..., x_{n-1}] .

Bringing both of these results together lets us conclude that the zero polynomial is the only zero function K^n \longrightarrow K if and only if the field K is infinite. I find it quite remarkable that knowledge about zero functions gives you an indication of the cardinality of K .

The theorem we have just proved also lets us conclude that when K is infinite, polynomials are determined by their values as functions; more precisely, for f, g \in K[x_1,...,x_n] we have f = g in K[x_1,...,x_n] if and only if f: K^n \longrightarrow K and g : K^n \longrightarrow K are the same function. I leave the proof of this for the reader, and as a hint consider the polynomial f - g.

 

An upper bound for roots of polynomials

A natural question to ask when given a polynomial is: how do the roots and the coefficients interplay with one another? In fact a huge portion of polynomial theory and solving algebraic equations is finding out the details of this relationship. The quadratic formula is a prime example – it gives a way of finding roots by only using the values of the coefficients. An explicit relationship, an algebraic formula in radicals, is in general only possible for polynomials of degree less than 5, a famous result pioneered by the work of Galois.

The interplay we are going to determine for polynomials of any degree is that you only need to look for roots in a neighborhood around 0 with the neighborhood’s size depending only on the size of the polynomial’s coefficients.

Let p \in \mathbb{R}[x] be a polynomial over \mathbb{R} of degree n with the representation p(x) = a_nx^n + \cdots + a_1x + a_0 , where a_n \neq 0 and |a_i| \leq A for all i = 0,\ldots, n-1 and some A \in \mathbb{R} . We will be considering our polynomial as a function p : \mathbb{C} \longrightarrow \mathbb{C} , and we set M \geq 0 and R = 1 + \frac{A + M}{|a_n|} , where our choice of R will become more apparent later.

What we are going to show is that for x of a large enough magnitude our polynomial will have a magnitude as large as we like. We will also establish a simple bound on the magnitude of the roots of our polynomial p .

Suppose |x| \geq R . We have |a_n x^n| = \frac{A + M}{R- 1}|x|^n by the definition of our constants, and estimating |a_{n-1}x^{n-1} + \cdots + a_0| gives | a_{n-1}x^{n-1} + \cdots + a_0 | \leq A (|x|^{n-1} + \cdots + |x| + 1 ) = A\frac{|x|^n - 1}{|x| - 1 } < A \frac{|x|^n}{R-1} , using the triangle inequality, the sum of a geometric series, and |x| \geq R > 1 .

Using the (reverse) triangle inequality allows us to conclude |a_nx^n| - |a_{n-1}x^{n-1} + \cdots + a_0 | \leq |p(x)| , which gives us, by using the previous estimation, |p(x)| > \frac{A+ M}{R-1} |x|^n - \frac{A}{R-1} |x|^n = \frac{M|x|^n}{R-1} \geq \frac{MR^n}{R-1} \geq M , as |x| \geq R and R^n \geq R > R - 1 since R > 1. In particular |p(x)| > M .

Bringing this together we have shown for all M \geq 0 if we look at |x| \geq R where R = 1 + \frac{A + M}{|a_n|} we have |p(x)| > M .

Looking at the contrapositive of this statement lets us conclude that if |p(x)| \leq M for some x \in \mathbb{C} and M \geq 0 , we must have |x| < R , or equivalently x lives in the ball B(0;R). If we take x to be a root of p , then by setting M = 0 we have |x| < R = 1 +\frac{A}{|a_n|} , giving us a bound on the size of our root in terms of the polynomial's coefficients. This means all roots of p will be found in the ball B(0; 1 + \frac{A}{|a_n|} ).
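As a final sanity check, here is a small Python sketch (my own code; numpy.roots expects coefficients with the highest power first) confirming that the computed roots of a sample polynomial all lie inside the ball B(0; 1 + \frac{A}{|a_n|}).

```python
import numpy as np

def root_bound(coeffs):
    """Bound 1 + A/|a_n| for a polynomial with coefficients a_n, ..., a_0
    (highest power first), where A bounds |a_i| for i < n."""
    a_n, lower = coeffs[0], coeffs[1:]
    A = max(abs(c) for c in lower)
    return 1 + A / abs(a_n)

coeffs = [2.0, -3.0, 0.0, 5.0, -1.0]     # 2x^4 - 3x^3 + 5x - 1
R = root_bound(coeffs)
roots = np.roots(coeffs)
print(R, max(abs(r) for r in roots), all(abs(r) < R for r in roots))  # last entry: True
```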