Structure theorem for finitely generated modules over PIDs, Jordan canonical form, and the Cayley-Hamilton theorem.

    Introduction    

The structure theorem for finitely generated modules over PIDs is a nice and elementary example of the usefulness of representation methods (in this case, representations of a ring). Many important results in algebra follow immediately from this theorem: the structure theorem for finite abelian groups, the rational canonical form of a matrix, the Jordan canonical form, and the Cayley-Hamilton theorem. We will discuss the main theorem, with enough preparation that the proof becomes trivial (and is thus omitted and referred to [A]: Artin, Algebra). Several of the consequences mentioned will also be discussed. One slightly technical result is referred to a different exposition, [C]: Conrad, Modules over a PID.
Definition 1 Let $ {R}$ be a commutative ring with identity. A (left) $ {R}$-module $ {M}$ is an abelian group $ {(M,+)}$ together with a scalar multiplication $ {R\times M\rightarrow M}$, $ {(r,m)\mapsto rm}$, such that
  1.  $ {1m=m}$ for all $ {m\in M}$; 
  2. $ {r(m+n)=rm+rn}$; 
  3. $ {(r+s)m=rm+sm}$; 
  4. $ {\left(rs\right)m=r\left(sm\right)}$. 
Note that the above rules are exactly those for a vector space. A submodule $ {N}$ of $ {M}$ is a nonempty subset that is closed under addition and scalar multiplication, hence it inherits the module structure from $ {M}$. A map $ {\varphi:M_{1}\rightarrow M_{2}}$ between two $ {R}$-modules is called an $ {R}$-module homomorphism if it is $ {R}$-linear: $ {\varphi}$ respects the group addition and the scalar multiplication.

In what follows we assume the ring $ {R}$ is a Principal Ideal Domain. Recall that a PID is an integral domain in which every ideal is principal, i.e. generated by one element of the ring. The nice thing about a PID is that the greatest common divisor of the elements of a set $ {B}$ can be written as a linear combination of the elements of $ {B}$. This generalizes a familiar property of the integers.

As a non-example, consider the UFD $ {K[x,y]}$, where $ {K}$ is a field. The greatest common divisor of $ {x}$ and $ {y}$ is clearly $ {1\in K}$; however, the ideal $ {(x,y)}$ generated by $ {x}$ and $ {y}$ is not $ {(1)=K}$, whereas in $ {\mathbb{Z}}$, or any PID, the two would coincide. On the other hand, in general not every element $ {r\in R}$ has a multiplicative inverse. This makes the structure of a module more complicated than that of a vector space.
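To make the Bézout property concrete in the PID $ {\mathbb{Z}}$: the extended Euclidean algorithm computes $ {\gcd(a,b)}$ together with coefficients $ {s,t}$ such that $ {\gcd(a,b)=sa+tb}$. A minimal Python sketch (the function name `extended_gcd` is our own):

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and g = s*a + t*b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        # Each step keeps the invariants old_r = old_s*a + old_t*b
        # and r = s*a + t*b.
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

g, s, t = extended_gcd(240, 46)
print(g, g == 240 * s + 46 * t)
```

The same loop works verbatim for $ {K[x]}$ with polynomial quotient and remainder; this is exactly why both rings are PIDs: they are Euclidean domains.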

The following theorem will be fundamental in understanding the structure theorem for finitely generated modules over PIDs.
Theorem 2 Let $ {\varphi:M\rightarrow N}$ be a surjective $ {R}$-module homomorphism. Then there is a canonical $ {R}$-module isomorphism $ {\tilde{\varphi}:M\big/\ker\varphi\rightarrow N}$, $ {\bar{m}\mapsto\varphi(m)}$, where $ {\bar{m}}$ is the coset $ {m+\ker\varphi}$. Moreover, there is a one-to-one correspondence between the submodules of $ {N}$ and the submodules of $ {M}$ that contain $ {\ker\varphi}$.
Proof: We know that $ {M\big/\ker\varphi}$ is an $ {R}$-module with scalar multiplication $ {r\bar{m}=\overline{rm}}$, and it is clear that $ {\tilde{\varphi}}$ is bijective. We check that it respects the module operations. Take $ {\bar{m},\bar{n}\in M\big/\ker\varphi}$ and $ {r\in R}$. They can be represented by $ {m}$, $ {n}$, and any other representatives differ from these by an element of $ {\ker\varphi}$. Now
$ \displaystyle \tilde{\varphi}(r\bar{m}+\bar{n})=\tilde{\varphi}(\overline{rm}+\bar{n})=\varphi(rm+n)=r\varphi(m)+\varphi(n). $
Moreover, any submodule $ {J\subset N}$ corresponds to the submodule $ {\varphi^{-1}(J)\supset\ker\varphi}$ of $ {M}$. $ \Box$

     The basic example of finitely generated free modules: $ {R^{n}}$     


The set $ {R^{n}=R\times\cdots\times R}$ has a natural $ {R}$-module structure: the elements are $ {n}$-tuples of elements of the ring $ {R}$, and the addition and scalar multiplication are defined as for vector spaces. We have an obvious set of generators, namely the finite set
$ \displaystyle \{(1,0,\dots,0),\dots,(0,\dots,1,\dots,0),\dots(0,0,\dots,1)\}, $
in the sense that taking $ {R}$-linear combinations of these generates $ {R^{n}}$. Moreover, they are $ {R}$-linearly independent, hence form a basis, corresponding to the notions in linear algebra. In this case we say $ {R^{n}}$ is a finitely generated free module. Any finitely generated $ {R}$-module with a basis in the above sense admits an invariant basis number (any two bases have the same cardinality), which is called the rank of the module; such an $ {R}$-module is therefore isomorphic to some $ {R^{n}}$. In fact, this is true over any commutative ring. Moreover, when $ {R}$ is a PID, we know everything about its submodules.
Proposition 3 Let $ {M}$ be a free $ {R}$-module. Then $ {R}$ is a PID if and only if every submodule of $ {M}$ is free. In the case where $ {M}$ is finitely generated of rank $ {n}$, any submodule of $ {M}$ has rank $ {\leq n}$.
We need a slightly adjusted version of Theorem 2:
Lemma 4 Let $ {M}$, $ {N}$ be $ {R}$-modules with $ {N}$ free, and let $ {\varphi:M\rightarrow N}$ be a surjective homomorphism. Then there is a free submodule $ {F}$ of $ {M}$ such that $ {\varphi\big|_{F}:F\rightarrow N}$ is an isomorphism and
$ \displaystyle M\cong F\oplus\ker\varphi. $
Proof of the proposition: We prove the case where $ {M}$ is finitely generated. Suppose $ {R}$ is a PID, and take $ {M=R^{n}}$. We do an induction on $ {n}$. When $ {n=1}$, the submodules of $ {R}$ as an $ {R}$-module are exactly its ideals. Since $ {R}$ is a PID, any nonzero ideal $ {I=(a)=aR}$ is generated by an element $ {a\in R\backslash\{0\}}$. In fact, the map $ {R\rightarrow aR}$, $ {r\mapsto ar}$, is an $ {R}$-module isomorphism, since $ {R}$ is an integral domain. Now suppose the statement is true for some $ {n\geq1}$. Let $ {M\subset R^{n+1}}$ be a submodule, and let $ {\pi:R^{n+1}\rightarrow R^{n}}$ be the projection onto the first $ {n}$ coordinates. Then $ {\pi(M)\subset R^{n}}$ is a free submodule of rank $ {\leq n}$ by the induction hypothesis. We restrict the map to $ {\pi\big|_{M}:M\rightarrow\pi(M)}$. Now that $ {\pi\big|_{M}}$ is surjective onto the free module $ {\pi(M)}$, we have, by Lemma 4,
$ \displaystyle M\cong\ker\pi\big|_{M}\oplus\pi(M), $
where $ {\ker\pi\big|_{M}=M\cap(0^{n}\oplus R)}$, and $ {(0^{n}\oplus R)}$ is a free module of rank $ {1}$. Thus $ {\ker\pi\big|_{M}}$ is free of rank $ {\leq1}$ by the case $ {n=1}$, and hence $ {M}$ is free of rank $ {\leq n+1}$. Conversely, suppose every submodule of $ {R^{n}}$ is free. To show $ {R}$ is a PID, note first that for $ {a\in R\backslash\{0\}}$, $ {aR}$ as a submodule of $ {R^{n}}$ is free, so $ {a}$ is not a zero divisor; hence $ {R}$ is necessarily an integral domain. Furthermore, any ideal $ {I\subset R}$, as a submodule of $ {R^{n}}$, is free. Since $ {R}$ is commutative, any two elements $ {a,b\in I}$ satisfy the relation $ {ba-ab=0}$, so a basis of $ {I}$ cannot contain more than one element. Hence every ideal of $ {R}$ is principal, and $ {R}$ is a PID. $ \Box$

Of course, a finitely generated module need not have a basis.
Example 1 $ {\mathbb{Z}\big/n\mathbb{Z}}$ where $ {n\geq2}$, as a finitely generated $ {\mathbb{Z}}$-module, does not have a basis. Note that $ {\{\bar{1}\}}$ is not a basis: $ {n\cdot\bar{1}=\bar{0}}$ is a nontrivial relation. This shows that quotients of free modules need not be free. More generally, any nontrivial finite abelian group, as a $ {\mathbb{Z}}$-module, does not have a basis over $ {\mathbb{Z}}$.

The above examples are at the other extreme: they contain no nonzero free submodule.
Definition 5 Let $ {M}$ be an $ {R}$-module. An element $ {m\in M}$ is called a torsion element if $ {rm=0}$ for some $ {r\in R\backslash\{0\}}$. $ {M}$ is called a torsion $ {R}$-module if every element is a torsion element. $ {M}$ is said to be torsion-free if it has no nonzero torsion elements.

Proposition 6
The set of all torsion elements in the $ {R}$-module $ {M}$ is a submodule, denoted $ {M_{tor}}$.
Proof: Recall that $ {R}$ is an integral domain: if $ {x,y\in M_{tor}}$, then $ {sx=0}$ and $ {ty=0}$ for some $ {s,t\in R\backslash\{0\}}$. Then since $ {st=ts\in R\backslash\{0\}}$, we have
$ \displaystyle st(x+y)=0. $
So $ {x+y\in M_{tor}}$. The other properties of a submodule are easy to verify. $ \Box$

In general there is a difference between an $ {R}$-module being free and being torsion-free. For example, $ {\mathbb{Q}}$ as a $ {\mathbb{Z}}$-module is torsion-free, but it is not free: any two rationals $ {\frac{a}{b},\frac{c}{d}}$ satisfy the relation $ {(bc)\frac{a}{b}-(ad)\frac{c}{d}=0}$, so a basis could contain at most one element, yet no single rational generates $ {\mathbb{Q}}$ over $ {\mathbb{Z}}$. However, if $ {M}$ is a finitely generated $ {R}$-module which is torsion-free, then it is free: there exists an $ {R}$-module embedding $ {M\hookrightarrow R^{n}}$, and as a submodule of $ {R^{n}}$, $ {M}$ is free thanks to Proposition 3. The details of the embedding can be found in [C].

The starting point to understand the structure of a finitely generated module over a PID is the following theorem:
Theorem 7 Let $ {M}$ be a finitely generated $ {R}$-module, where $ {R}$ is a PID. Then $ {M\big/M_{tor}}$ is free. There exists a free submodule $ {F}$ of $ {M}$ such that $ {M}$ is a direct sum
$ \displaystyle M=M_{tor}\oplus F. $
It is easy to prove that the module $ {M\big/M_{tor}}$ is torsion-free and finitely generated; hence it is free. Consider now the projection $ {M\rightarrow M\big/M_{tor}}$ onto this free module. Its kernel is exactly $ {M_{tor}}$. Hence by Lemma 4 there exists a free submodule $ {F}$ of $ {M}$ such that $ {M}$ is the desired direct sum.
The work now is to understand $ {M_{tor}}$. Thus from now on we assume $ {M}$ is a finitely generated torsion module over a PID $ {R}$.

Since $ {M}$ is finitely generated, choosing $ {n}$ generators gives a surjective homomorphism $ {\varphi:R^{n}\rightarrow M}$, so by Theorem 2, $ {M\cong R^{n}\big/\ker\varphi}$. To understand $ {M}$ is to understand the quotient $ {R^{n}\big/\ker\varphi}$. We have
Theorem 8 A submodule $ {M\subset R^{n}}$ has rank $ {n}$ if and only if $ {R^{n}\big/M}$ is a torsion module.
The $ {(\Rightarrow)}$ direction of the theorem leads to the so-called Smith normal form of an $ {n\times n}$ $ {R}$-matrix. This is nothing but diagonalizing an $ {R}$-linear system of equations.

    The Smith normal form    


Let $ {M\subset R^{n}}$ be a submodule of rank $ {n}$. By Proposition 3, $ {M}$ has an $ {R}$-basis $ {\{y_{1},\dots,y_{n}\}\subset R^{n}}$. Let $ {\{e_{1},\dots,e_{n}\}}$ be an $ {R}$-basis of $ {R^{n}}$. We can write $ {y_{i}=\sum_{k=1}^{n}a_{ki}e_{k}}$ for each $ {i=1,\dots,n}$, that is
$ \displaystyle \left(y_{1},\dots,y_{n}\right)=\left(e_{1},\dots,e_{n}\right)\begin{pmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & \ddots & & a_{2n}\\ \vdots & & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}=\left(e_{1},\dots,e_{n}\right)A. $
Now the claim is that since $ {R}$ is a PID, $ {A}$ can always be brought to diagonal form by elementary row and column operations, or equivalently by changes of basis of $ {R^{n}}$ and $ {M}$. That is, there exist an $ {R}$-basis $ {\{y_{1}',\dots,y_{n}'\}\subset R^{n}}$ of $ {M}$ and an $ {R}$-basis $ {\{e_{1}',\dots,e_{n}'\}}$ of $ {R^{n}}$, such that
$ \displaystyle \left(y_{1}',\dots,y_{n}'\right)=\left(e_{1}',\dots,e_{n}'\right)\begin{pmatrix}d_{1} & 0 & \cdots & 0\\ 0 & d_{2} & & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & d_{n} \end{pmatrix}=\left(d_{1}\cdot e_{1}',\dots,d_{n}\cdot e_{n}'\right), $
and the order of the $ {d_{i}}$'s can be arranged so that $ {d_{1}\big|d_{2}}$, $ {d_{2}\big|d_{3}}$, $ {\dots}$, $ {d_{n-1}\big|d_{n}}$. Of course, we need not constrain ourselves to the case of $ {n\times n}$ matrices (equivalently, the case where $ {M\subset R^{n}}$ is a rank-$ {n}$ submodule). This generalizes comfortably to the following theorem. We recall that the determinant function $ {\det:M_{n}(R)\rightarrow R}$ is defined by the usual formula and is multiplicative. The general linear group $ {GL(n,R)}$ consists of the $ {R}$-matrices whose determinant is a unit of $ {R}$. The elementary row and column operations correspond to the invertible matrices of the form $ {I+rE_{ij}}$ ($ {i\neq j}$) and $ {\tilde{I}_{ij}}$, called elementary matrices, where $ {E_{ij}}$ is the matrix with $ {1}$ in the $ {(i,j)}$-th entry and $ {0}$ elsewhere, and $ {\tilde{I}_{ij}}$ is the matrix obtained by swapping the $ {i}$-th and $ {j}$-th columns (equivalently, rows) of $ {I}$. When $ {R}$ is a Euclidean domain, such as $ {\mathbb{Z}}$ or $ {K[x]}$, the elementary matrices, together with the diagonal matrices $ {\mathrm{diag}(u,1,\dots,1)}$ with $ {u}$ a unit, generate $ {GL(n,R)}$.
Theorem 9 (The Smith Normal Form Theorem over a PID) Let $ {A}$ be an $ {R}$-matrix. Then there exist invertible $ {R}$-matrices $ {P}$ and $ {Q}$ of appropriate sizes, so that $ {A'=Q^{-1}AP}$ is diagonal, say
$ \displaystyle A'=\begin{pmatrix}\begin{pmatrix}d_{1}\\ & \ddots\\ & & d_{k} \end{pmatrix}\\ & 0 \end{pmatrix}, $
where the $ {d_{i}}$'s are nonzero and $ {d_{1}\big|d_{2}\big|\cdots\big|d_{k}}$. Moreover, the $ {d_{i}}$'s are unique up to multiplication by units. Here $ {k}$ is called the rank of $ {A}$, and the $ {d_{i}}$'s are called the invariant factors of $ {A}$.
This theorem is taken from [A], and slightly generalized to fit the context. Artin has proved the theorem when the ring $ {R=\mathbb{Z}}$, but this can be easily generalized to any PID. We have discussed the reason at the beginning: the principal ideal generated by the greatest common divisor of two elements $ {a,b\in R}$ in a PID coincides with the ideal generated by $ {a}$ and $ {b}$, as is the case for $ {\mathbb{Z}}$. We thus refer to the book for the proof of this theorem (and the algorithm producing the matrix).
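For $ {R=\mathbb{Z}}$ the reduction can be carried out by computer; a quick illustration with SymPy's `smith_normal_form` (the matrix is just a sample, and the diagonal entries are only determined up to sign):

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

A = Matrix([[2, 4, 4],
            [-6, 6, 12],
            [10, -4, -16]])
S = smith_normal_form(A, domain=ZZ)
# The diagonal carries the invariant factors 2 | 6 | 12, up to units (signs).
print([S[i, i] for i in range(3)])
```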
We now explain the invariant factors further.

    The invariant factors    


For simplicity let us focus first on the case where $ {A'=\mathrm{diag}(d_{1},d_{2},\dots,d_{k})}$ is a square diagonal matrix with nonzero diagonal entries and $ {d_{1}\big|d_{2}\big|\cdots\big|d_{k}}$. Let $ {D_{i}(A')}$ be the greatest common divisor of the determinants of all $ {i\times i}$ submatrices of $ {A'}$ (the $ {i\times i}$ minors). Since $ {A'}$ is diagonal with $ {d_{1}\big|\cdots\big|d_{k}}$, we have
$ \displaystyle D_{i}(A')=d_{1}\cdots d_{i}. $
Thus, setting $ {D_{0}(A')=1}$, we have $ {d_{i}=D_{i}(A')\big/D_{i-1}(A')}$. Now let $ {A}$ be a matrix that can be brought to $ {A'}$ using elementary operations. Elementary operations change the minors of $ {A}$ only by units and $ {R}$-linear combinations, so each gcd $ {D_{i}}$ is preserved; that is, $ {D_{i}(A)=D_{i}(A')}$ up to units. Thus one way to compute the $ {d_{i}}$'s is the formula
$ \displaystyle d_{i}=D_{i}(A)\big/D_{i-1}(A). $
The argument is essentially the same when $ {A'}$ is not a square matrix.
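The formula $ {d_{i}=D_{i}(A)\big/D_{i-1}(A)}$ can be checked numerically over $ {\mathbb{Z}}$; a small self-contained Python sketch (the sample matrix is our own, with invariant factors $ {2,6,12}$):

```python
from itertools import combinations
from math import gcd
from functools import reduce

def det(M):
    # Laplace expansion along the first row; fine for small matrices.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def D(A, i):
    # Greatest common divisor of all i x i minors of A.
    rows_n, cols_n = len(A), len(A[0])
    minors = [abs(det([[A[r][c] for c in cols] for r in rows]))
              for rows in combinations(range(rows_n), i)
              for cols in combinations(range(cols_n), i)]
    return reduce(gcd, minors)

A = [[2, 4, 4], [-6, 6, 12], [10, -4, -16]]
Ds = [1] + [D(A, i) for i in range(1, 4)]        # D_0 = 1 by convention
d = [Ds[i] // Ds[i - 1] for i in range(1, 4)]    # invariant factors
print(d)  # [2, 6, 12]
```

Here $ {D_{1}=2}$, $ {D_{2}=12}$, $ {D_{3}=144}$, giving $ {d_{1}=2}$, $ {d_{2}=6}$, $ {d_{3}=12}$, and indeed $ {2\big|6\big|12}$.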

    The Structure Theorem    


We have seen that any rank-$ {n}$ submodule $ {M}$ of $ {R^{n}}$ has a basis of the form
$ \displaystyle \left\{ d_{1}\cdot e_{1}',\dots,d_{n}\cdot e_{n}'\right\} $
where $ {\left\{ e_{1}',\dots,e_{n}'\right\} }$ is some basis of $ {R^{n}}$. This allows us to write $ {M}$ as a direct sum
$ \displaystyle M=M_{1}\oplus M_{2}\cdots\oplus M_{n} $
where $ {M_{i}}$ is the rank-$ {1}$ module generated by $ {d_{i}e_{i}'}$, and thus isomorphic to $ {d_{i}R}$. Hence the quotient map $ {R^{n}\rightarrow R^{n}\big/M}$ gives us a torsion module
$ \displaystyle R^{n}\big/M\cong R\big/(d_{1})\oplus R\big/(d_{2})\oplus\cdots\oplus R\big/(d_{n}), $
where summands with $ {d_{i}}$ a unit are zero and may be dropped.
Let $ {R}$ be a PID. An $ {R}$-module $ {M}$ is said to be cyclic if it is isomorphic to $ {R\big/(a)}$ where $ {(a)\subset R}$ is the principal ideal generated by an element $ {a\in R}$. If $ {a=0}$ then we say $ {M}$ is infinite cyclic, and $ {M\cong R}$ as $ {R}$-modules. Note that ``infinite'' does not necessarily mean that $ {M}$ has infinite cardinality: when $ {R}$ is finite, $ {R}$ as an integral domain has to be a field, and then $ {M\cong R}$ is finite. We now have a decomposition of a torsion module into a direct sum of cyclic modules. Combining with Theorem 7, we have
Theorem 10 Let $ {M}$ be a finitely generated $ {R}$-module, where $ {R}$ is a PID. Then we have a direct sum decomposition of $ {M}$:
$ \displaystyle M\cong R\big/(d_{1})\oplus R\big/(d_{2})\oplus\cdots\oplus R\big/(d_{k})\oplus R^{s} $
where $ {d_{1}\big|d_{2}\big|\cdots\big|d_{k}}$ and $ {s\geq0}$ is an integer. Moreover, $ {k}$ and $ {s}$ are uniquely determined, and the $ {d_{i}}$'s are unique up to multiplication by units.
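As a quick illustration with $ {R=\mathbb{Z}}$: the finite abelian group $ {\mathbb{Z}\big/4\mathbb{Z}\oplus\mathbb{Z}\big/6\mathbb{Z}}$ is torsion, so $ {s=0}$, and regrouping its primary pieces $ {4,2,3}$ gives the invariant factor form
$ \displaystyle \mathbb{Z}\big/4\mathbb{Z}\oplus\mathbb{Z}\big/6\mathbb{Z}\cong\mathbb{Z}\big/2\mathbb{Z}\oplus\mathbb{Z}\big/12\mathbb{Z}, $
with $ {d_{1}=2}$ and $ {d_{2}=12}$, and indeed $ {d_{1}\big|d_{2}}$.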


    Variants of the Structure Theorem    


Since elements in a PID admit unique factorization into irreducibles (or equivalently primes, since in a UFD primes and irreducibles coincide), each $ {R\big/(d_{i})}$ admits a primary decomposition. This follows from a more general result for commutative rings:
Theorem 11 (Chinese Remainder Theorem) Let $ {R}$ be a commutative ring with identity and let $ {I_{1},\dots,I_{m}}$ be ideals that are pairwise coprime, meaning that $ {I_{i}+I_{j}=R}$ for $ {i\neq j}$. Then their product $ {I=I_{1}\cdots I_{m}}$, generated by elements of the form $ {a_{1}\cdots a_{m}}$ where each $ {a_{i}\in I_{i}}$, is equal to the intersection $ {\bigcap_{j=1}^{m}I_{j}}$, and the quotient ring $ {R\big/I}$ is isomorphic to the direct sum $ {\bigoplus_{j=1}^{m}R\big/I_{j}}$ via the isomorphism
$ \displaystyle \begin{array}{rcl} \varphi:R\big/I & \rightarrow & \bigoplus_{j=1}^{m}R\big/I_{j}\\ x+I & \mapsto & (x+I_{1},\dots,x+I_{m}). \end{array} $
Proof: Notice that if $ {I_{i},I_{j},I_{k}}$ are pairwise coprime, then $ {I_{i}\cdot I_{j}}$ is coprime to $ {I_{k}}$. We first show that if $ {I}$ and $ {J}$ are coprime, then $ {I\cdot J=I\cap J}$; inductively, $ {I=\bigcap_{j=1}^{m}I_{j}}$. It is clear that for any two ideals $ {I,J\subset R}$, $ {I\cdot J\subset I\cap J}$. Suppose now $ {I+J=R}$. Then there exist $ {a\in I}$ and $ {b\in J}$ such that $ {a+b=1\in R}$. Take $ {x\in I\cap J}$. We have $ {x\cdot(a+b)=xa+xb=x}$. Since $ {xa\in I\cdot J}$ and $ {xb\in I\cdot J}$, we have $ {x=xa+xb\in I\cdot J}$. Therefore $ {I\cdot J\supset I\cap J}$, and hence $ {I\cdot J=I\cap J}$. Now we show the map $ {\varphi}$ is a well-defined isomorphism. Any two elements $ {x,y\in R}$ in the same coset of $ {R\big/I}$ differ by an element of $ {I=\bigcap_{j=1}^{m}I_{j}}$. Hence $ {\varphi(x+I)=\varphi(y+I)}$, so $ {\varphi}$ is well-defined, and it is easily checked to be a ring homomorphism. Now $ {\varphi}$ is injective, since the kernel of the map $ {\bigoplus_{j=1}^{m}\pi_{j}:R\rightarrow\bigoplus_{j=1}^{m}R\big/I_{j}}$ is exactly $ {\bigcap_{j=1}^{m}I_{j}}$. It is also surjective: since $ {I_{i}}$ and $ {\hat{I}_{i}=\bigcap_{j\neq i}I_{j}}$ are coprime, there exist $ {x_{i}\in I_{i}}$ and $ {y_{i}\in\hat{I}_{i}}$ such that $ {x_{i}+y_{i}=1\in R}$; then $ {y_{i}=1-x_{i}\equiv1\pmod{I_{i}}}$ while $ {y_{i}\in I_{j}}$ for $ {j\neq i}$. Hence $ {\varphi(y_{i}+I)=(0,\dots,1,\dots,0)}$, where the $ {1}$ is in the $ {i}$-th position. Given a target $ {(a_{1}+I_{1},\dots,a_{m}+I_{m})}$, the element $ {x=\sum_{i}a_{i}y_{i}}$ then maps to it. $ \Box$

 Now let $ {d=p_{1}^{r_{1}}\cdots p_{m}^{r_{m}}}$ be the factorization of $ {d}$ into powers of distinct irreducibles (unique up to units). Since the ideals $ {(p_{j}^{r_{j}})}$ are pairwise coprime, we have
$ \displaystyle R\big/(d)\cong\bigoplus_{j=1}^{m}R\big/(p_{j}^{r_{j}}). $
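Continuing with $ {R=\mathbb{Z}}$ and $ {d=12=2^{2}\cdot3}$, the isomorphism $ {\mathbb{Z}\big/(12)\cong\mathbb{Z}\big/(4)\oplus\mathbb{Z}\big/(3)}$ is realized by $ {x\mapsto(x\bmod4,x\bmod3)}$; a quick sanity check in Python, with a reminder that coprimality is essential:

```python
# The CRT map Z/12 -> Z/4 x Z/3 should be a bijection of sets of size 12.
images = {(x % 4, x % 3) for x in range(12)}
assert len(images) == 12          # injective, hence bijective by counting

# Without coprimality the map fails: 2 and 6 are not coprime,
# and Z/12 is not isomorphic to Z/2 x Z/6.
bad = {(x % 2, x % 6) for x in range(12)}
print(len(bad))  # 6: the first coordinate is determined by the second
```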
Doing this for each $ {d_{i}}$, we get for each $ {d_{i}}$ a set of powers of irreducibles $ {\{p_{i,j}^{r_{i,j}}\}}$, called the elementary divisors of the module $ {M}$. Combining this with the structure theorem, we have
Theorem 12 (Structure Theorem: Elementary Divisor Version) Let $ {M}$ be a finitely generated $ {R}$-module, where $ {R}$ is a PID. Then we have a direct sum decomposition of $ {M}$:
$ \displaystyle M\cong\left(\bigoplus_{i=1}^{k}\bigoplus_{j=1}^{m_{i}}R\big/(p_{i,j}^{r_{i,j}})\right)\oplus R^{s}. $


    An interpretation of Jordan canonical form    


Let $ {V}$ be an $ {n}$-dimensional vector space over a field $ {\mathbb{K}}$, and let $ {T:V\rightarrow V}$ be a linear transformation. In linear algebra, the basic tools used to understand a linear transformation $ {T}$ are its eigenvalues and eigenvectors. One may ask: how does $ {T}$ differ from a scalar multiple of the identity? This is manifested in the equation $ {\left(T-\lambda I\right)v=0}$. By solving for the roots (assuming they exist) of the characteristic polynomial $ {\det(xI-T)}$, we can obtain the eigenvalues of $ {T}$, and then solve for the eigenvectors. At this point we have the notions of algebraic multiplicity and geometric multiplicity associated to an eigenvalue of $ {T}$. The fact that they don't always agree is a bit annoying: consider the linear map $ {A:\mathbb{R}^{2}\rightarrow\mathbb{R}^{2}}$ with $ {A=\begin{pmatrix}1 & 1\\ 0 & 1 \end{pmatrix}}$. The characteristic polynomial is $ {(x-1)^{2}}$, so $ {1}$ is a repeated eigenvalue; but the system
$ \displaystyle \left(\begin{pmatrix}1 & 1\\ 0 & 1 \end{pmatrix}-\begin{pmatrix}1 & 0\\ 0 & 1 \end{pmatrix}\right)\begin{pmatrix}x\\ y \end{pmatrix}=\begin{pmatrix}0 & 1\\ 0 & 0 \end{pmatrix}\begin{pmatrix}x\\ y \end{pmatrix}=\begin{pmatrix}0\\ 0 \end{pmatrix} $
would only define a $ {1}$-dimensional eigenspace. One would say that the linear transformation is somehow deficient. This motivates the notion of a generalized eigenvector: a vector $ {v\in V}$ is called a generalized eigenvector of rank $ {m}$ associated with the eigenvalue $ {\lambda}$ of the linear transformation $ {T}$ if
$ \displaystyle (T-\lambda I)^{m}v=0\,\,\mbox{but}\,\,(T-\lambda I)^{m-1}v\neq0. $
In the above case, since $ {A-I=\begin{pmatrix}0 & 1\\ 0 & 0 \end{pmatrix}}$ is nilpotent with $ {(A-I)^{2}=0}$, any vector $ {v\in\mathbb{R}^{2}}$ not in the subspace $ {\{c\cdot\begin{pmatrix}1\\ 0 \end{pmatrix}:c\in\mathbb{R}\}}$ will be a generalized eigenvector of rank $ {2}$ associated with the eigenvalue $ {1}$. Let us take $ {v=\begin{pmatrix}0\\ 1 \end{pmatrix}}$. Notice that the set
$ \displaystyle \{v,\left(T-I\right)v\}=\{\begin{pmatrix}0\\ 1 \end{pmatrix},\begin{pmatrix}1\\ 0 \end{pmatrix}\} $
forms a basis for $ {\mathbb{R}^{2}}$. And we see that the rank of the generalized eigenvector coincides with the algebraic multiplicity of its associated eigenvalue; in this case, both equal $ {\dim(\mathbb{R}^{2})=2}$. The matrix $ {A}$ is in fact the $ {2}$-dimensional version of a Jordan block. The theorem of Jordan canonical form says that whenever the characteristic polynomial of $ {T}$ splits over $ {\mathbb{K}}$ (meaning it has $ {n}$ roots in $ {\mathbb{K}}$, counted with multiplicity), then in an appropriate ordered basis of $ {V}$, $ {T}$ can be expressed as a direct sum of Jordan blocks of various dimensions, where each block is associated with an eigenvalue, and the sizes of the blocks associated with an eigenvalue sum to its algebraic multiplicity. One can show that the desired basis consists of chains $ {\{v,(T-\lambda I)v,\dots,(T-\lambda I)^{m-1}v\}_{\lambda}}$, where $ {v}$ is a generalized eigenvector of rank $ {m}$ associated to the eigenvalue $ {\lambda}$. The key is that there exists a subspace $ {V_{\lambda}}$ of $ {V}$ for each $ {\lambda}$ such that $ {T-\lambda I}$ restricted to $ {V_{\lambda}}$ is nilpotent. Thus the theorem of Jordan canonical form can be viewed as a decomposition of $ {V}$ into the $ {V_{\lambda}}$'s. Once we have the right viewpoint, the result is nothing but a version of the structure theorem for a torsion module.
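As a computational aside, the Jordan form of a concrete matrix can be obtained with SymPy's `jordan_form`; the matrix below (our own sample, with characteristic polynomial $ {(x-2)^{2}}$ and a one-dimensional eigenspace) produces a single $ {2\times2}$ Jordan block:

```python
from sympy import Matrix

A = Matrix([[1, 1],
            [-1, 3]])
P, J = A.jordan_form()                 # A = P * J * P**(-1)
assert J == Matrix([[2, 1], [0, 2]])   # one Jordan block for lambda = 2
assert P * J * P.inv() == A            # change of basis back to A
print(J)
```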

Now we explain the last sentence in detail. Let $ {T:V\rightarrow V}$ be a linear transformation. The PID in our consideration is $ {R=\mathbb{K}[x]}$. We define the action of $ {x}$ on $ {V}$ by $ {x.v=Tv\in V}$ for all $ {v\in V}$. Thus for $ {f(x)=\sum_{i=0}^{n}a_{i}x^{i}\in\mathbb{K}[x]}$, we have
$ \displaystyle f.v=\sum_{i=0}^{n}a_{i}\left(T^{i}\right)v, $
where we have used the convention $ {T^{0}=I}$. This gives $ {V}$ a $ {\mathbb{K}[x]}$-module structure. Note that $ {V}$ is by assumption $ {n}$-dimensional, so the set $ {\{v,Tv,\dots,T^{n}v\}}$ of $ {n+1}$ vectors cannot be linearly independent for any $ {v\in V}$. Consequently every element of $ {V}$ is a torsion element, and $ {V}$ is a finitely generated torsion $ {\mathbb{K}[x]}$-module. Let $ {\{v_{1},\dots,v_{n}\}}$ be a basis of $ {V}$. The structure theorem leads us to consider the $ {\mathbb{K}[x]}$-module homomorphism
$ \displaystyle \begin{array}{rcl} \varphi:\mathbb{K}[x]^{n} & \rightarrow & V\\ (f_{1},\dots,f_{n}) & \mapsto & f_{1}.v_{1}+\cdots+f_{n}.v_{n} \end{array} $
which can be thought of as an extension of the isomorphism $ {\mathbb{K}^{n}\cong V}$ of vector spaces. Keeping the notation as above, let $ {A\in M_{n}(\mathbb{K})}$ be the $ {n\times n}$ matrix representing $ {T}$ in this basis, i.e.
$ \displaystyle \begin{pmatrix}Tv_{1}\\ \vdots\\ Tv_{n} \end{pmatrix}=A\begin{pmatrix}v_{1}\\ \vdots\\ v_{n} \end{pmatrix}. $
Let $ {\chi_{A}(x)=xI_{n}-A}$ be the characteristic matrix of $ {A}$. Then the rank-$ {n}$ submodule $ {\ker\left(\varphi:\mathbb{K}[x]^{n}\rightarrow V\right)}$ of $ {\mathbb{K}[x]^{n}}$ is generated by the rows of $ {\chi_{A}(x)}$ (this requires a short argument). Now we may proceed to diagonalize the matrix $ {\chi_{A}(x)}$ by elementary row and column operations, which correspond to choosing appropriate bases for $ {\ker\varphi}$ and $ {\mathbb{K}[x]^{n}}$ respectively. It is clear that $ {\det(\chi_{A}(x))=char_{A}(x)}$, the characteristic polynomial of $ {A}$ (or of $ {T}$, since it is invariant under change of basis). The diagonalization of $ {\chi_{A}(x)}$ gives rise to the invariant factors $ {d_{1}\big|d_{2}\big|\cdots\big|d_{n}\in\mathbb{K}[x]}$. Thus we have
$ \displaystyle char_{A}(x)=\prod_{i=1}^{n}d_{i}(x), $
where each $ {d_{i}}$ may be normalized to be monic. Since the invariant factors are the periods of the cyclic summands (each $ {d_{i}}$ annihilates the summand $ {\mathbb{K}[x]\big/(d_{i})}$) and $ {d_{i}\big|d_{n}}$ for all $ {i}$, it follows immediately that
$ \displaystyle d_{n}(T)=0. $
It is not hard to show that $ {d_{n}(x)}$ is in fact the minimal polynomial of $ {T}$. This implies immediately the classical result of Cayley-Hamilton.
Corollary 13 (Cayley-Hamilton)
$ \displaystyle char_{A}(A)=0. $
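The corollary is easy to test numerically: evaluate the characteristic polynomial at $ {A}$ by Horner's scheme and check that the result is the zero matrix (the $ {3\times3}$ matrix below is an arbitrary sample):

```python
from sympy import Matrix, eye, zeros

A = Matrix([[2, 1, 0],
            [0, 1, -1],
            [1, 0, 3]])
p = A.charpoly()                 # characteristic polynomial of A
coeffs = p.all_coeffs()          # leading coefficient first
# Evaluate p at the matrix A by Horner's scheme.
result = zeros(3, 3)
for c in coeffs:
    result = result * A + c * eye(3)
assert result == zeros(3, 3)     # char_A(A) = 0
print("Cayley-Hamilton verified")
```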
Assume now $ {char_{A}(x)}$ splits over $ {\mathbb{K}}$. Then each invariant factor $ {d_{i}}$ factors into elementary divisors of the form $ {(x-\lambda)^{m}}$ with distinct $ {\lambda}$'s. We leave it to the reader (as if there were one) to verify that the decomposition of $ {V}$ into the $ {V_{\lambda}}$'s corresponds to the elementary divisor version of the structure theorem for finitely generated torsion modules.
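For the running example $ {A=\begin{pmatrix}1 & 1\\ 0 & 1 \end{pmatrix}}$ from the previous section, elementary operations over $ {\mathbb{K}[x]}$ reduce the characteristic matrix as
$ \displaystyle \chi_{A}(x)=\begin{pmatrix}x-1 & -1\\ 0 & x-1 \end{pmatrix}\longrightarrow\begin{pmatrix}-1 & x-1\\ x-1 & 0 \end{pmatrix}\longrightarrow\begin{pmatrix}-1 & 0\\ x-1 & (x-1)^{2} \end{pmatrix}\longrightarrow\begin{pmatrix}1 & 0\\ 0 & (x-1)^{2} \end{pmatrix} $
(swap the two columns; add $ {(x-1)}$ times the first column to the second; finally add $ {(x-1)}$ times the first row to the second and scale the first row by the unit $ {-1}$). The invariant factors are $ {d_{1}=1}$ and $ {d_{2}=(x-1)^{2}}$, so the minimal polynomial equals the characteristic polynomial, and the single elementary divisor $ {(x-1)^{2}}$ corresponds to exactly one $ {2\times2}$ Jordan block with eigenvalue $ {1}$.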


Note: [Update 23/Sep/2016]:
This set of notes was written when I was taking Prof. Lu's undergraduate algebra course. I posted it to explore the MathJax function on Blogger, which is a little bit different from LaTeX. For instance, the theorem-like environments are not directly available on Blogger, so one needs to do something with the CSS settings. (This approach has issues with browsers and is also tedious for keeping the numbering consistent, so it was abandoned. The article has been converted into HTML for better coherence.)
