
Matrix (mathematics): Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
m Reverted 1 edit by 69.67.89.169 identified as vandalism to last revision by Jakob.scholbach. (TW)
Ftbhrygvn (talk | contribs)
Add section row operations
Line 57: Line 57:
|-
|-
| Addition
| Addition
| The ''sum'' '''A'''+'''B''' of two ''m''-by-''n'' matrices '''A''' and '''B''' is calculated entrywise:
| The ''sum 'A'''+'''B''' of two ''m''-by-''n'' matrices '''A''' and '''B''' is calculated entrywise:
:('''A''' + '''B''')<sub>''i'',''j''</sub> = '''A'''<sub>''i'',''j''</sub> + '''B'''<sub>''i'',''j''</sub>, where 1 ≤ ''i'' ≤ ''m'' and 1 ≤ ''j'' ≤ ''n''.
:('''A''' + '''B''')<sub>''i'',''j''</sub> = '''A'''<sub>''i'',''j''</sub> + '''B'''<sub>''i'',''j''</sub>, where 1 ≤ ''i'' ≤ ''m'' and 1 ≤ ''j'' ≤ ''n''.
|
|
Line 84: Line 84:
|-
|-
| Scalar multiplication
| Scalar multiplication
| The ''scalar multiplication'' ''c'''''A''' of a matrix '''A''' and a number ''c'' (also called a [[scalar]] in the parlance of [[abstract algebra]]) is given by multiplying every entry of '''A''' by ''c'':
| The ''scalar multiplication c'''''A''' of a matrix '''A''' and a number ''c'' (also called a [[scalar]] in the parlance of [[abstract algebra]]) is given by multiplying every entry of '''A''' by ''c'':
:{{nowrap begin}}(''c'''''A''')<sub>''i'',''j''</sub> = ''c'' &middot; '''A'''<sub>''i'',''j''</sub>.{{nowrap end}}
:{{nowrap begin}}(''c'''''A''')<sub>''i'',''j''</sub> = ''c'' &middot; '''A'''<sub>''i'',''j''</sub>.{{nowrap end}}
| <math>2 \cdot
| <math>2 \cdot
Line 112: Line 112:
1 & 2 & 3 \\
1 & 2 & 3 \\
0 & -6 & 0
0 & -6 & 0
\end{bmatrix}^T =
\end{bmatrix}^T =

\begin{bmatrix}
\begin{bmatrix}
1 & 0 \\
1 & 0 \\
Line 123: Line 124:
Familiar properties of numbers extend to these operations of matrices: for example, addition is [[commutative]], i.e. the matrix sum does not depend on the order of the summands: '''A'''&nbsp;+&nbsp;'''B'''&nbsp;=&nbsp;'''B'''&nbsp;+&nbsp;'''A'''.<ref>{{Harvard citations|last1 = Brown|year = 1991|nb = yes|loc=Theorem I.2.6}}.</ref> The transpose is compatible with addition and scalar multiplication, as expressed by (''c'''''A''')<sup>''T''</sup> = ''c''('''A'''<sup>''T''</sup>) and ('''A'''&nbsp;+&nbsp;'''B''')<sup>''T''</sup>&nbsp;=&nbsp;'''A'''<sup>''T''</sup>&nbsp;+&nbsp;'''B'''<sup>''T''</sup>. Finally, ('''A'''<sup>''T''</sup>)<sup>''T''</sup>&nbsp;=&nbsp;'''A'''.
Familiar properties of numbers extend to these operations of matrices: for example, addition is [[commutative]], i.e. the matrix sum does not depend on the order of the summands: '''A'''&nbsp;+&nbsp;'''B'''&nbsp;=&nbsp;'''B'''&nbsp;+&nbsp;'''A'''.<ref>{{Harvard citations|last1 = Brown|year = 1991|nb = yes|loc=Theorem I.2.6}}.</ref> The transpose is compatible with addition and scalar multiplication, as expressed by (''c'''''A''')<sup>''T''</sup> = ''c''('''A'''<sup>''T''</sup>) and ('''A'''&nbsp;+&nbsp;'''B''')<sup>''T''</sup>&nbsp;=&nbsp;'''A'''<sup>''T''</sup>&nbsp;+&nbsp;'''B'''<sup>''T''</sup>. Finally, ('''A'''<sup>''T''</sup>)<sup>''T''</sup>&nbsp;=&nbsp;'''A'''.
<!--Another, much less often used notion of matrix addition is the [[direct sum (matrix)|direct sum]].-->
<!--Another, much less often used notion of matrix addition is the [[direct sum (matrix)|direct sum]].-->

===Row operations===
Row operations are ways to change matrices. There are 3 types of row operations:<br />
'''Row switching'''<br />
:Interchanging two rows of a matrix
'''Row multiplication'''<br />
:Multiplying all entries of a row by a non-zero constant
'''Row addition'''<br />
:Adding a multiple of a row to another row
Row operations are used in a number of ways including [[Matrix(mathematics)#Linear equations|solving linear equations]] and finding inverses


==Matrix multiplication, linear equations and linear transformations==
==Matrix multiplication, linear equations and linear transformations==

Revision as of 06:57, 16 July 2009

Organization of a matrix

In mathematics, a matrix (plural matrices, or less commonly matrixes) is a rectangular array of numbers, as shown at the right. Matrices consisting of only one column or row are called vectors, while higher-dimensional, e.g. three-dimensional, arrays of numbers are called tensors. Matrices can be added and subtracted entrywise, and multiplied according to a rule corresponding to composition of linear transformations. These operations satisfy the usual identities, except that matrix multiplication is not commutative: the identity AB=BA can fail. One use of matrices is to represent linear transformations, which are higher-dimensional analogs of linear functions of the form f(x) = cx, where c is a constant. Matrices can also keep track of the coefficients in a system of linear equations. For a square matrix, the determinant and inverse matrix (when it exists) govern the behavior of solutions to the corresponding system of linear equations, and eigenvalues and eigenvectors provide insight into the geometry of the associated linear transformation.

Matrices find many applications. Physics makes use of them in various domains, for example in geometrical optics and matrix mechanics. The latter also led to studying in more detail matrices with an infinite number of rows and columns. Matrices encoding distances of knot points in a graph, such as cities connected by roads, are used in graph theory, and computer graphics use matrices to encode projections of three-dimensional space onto a two-dimensional screen. Matrix calculus generalizes classical analytical notions such as derivatives of functions or exponentials to matrices. The latter is a recurring need in solving ordinary differential equations. Serialism and dodecaphonism are musical movements of the 20th century that utilize a square mathematical matrix to determine the pattern of music intervals.

Due to their widespread use, considerable effort has been made to develop efficient methods of matrix computing, particularly if the matrices are big. To this end, there are several matrix decomposition methods, which express matrices as products of other matrices with particular properties simplifying computations, both theoretically and practically. Sparse matrices, matrices consisting mostly of zeros, which occur, for example, in simulating mechanical experiments using the finite element method, often allow for more specifically tailored algorithms performing these tasks.

The close relationship of matrices with linear transformations makes the former a key notion of linear algebra. Other types of entries, such as elements in more general mathematical fields or even rings are also used.

Definition

A matrix is a rectangular arrangement of numbers.[1] For example,

alternatively denoted using parentheses instead of box brackets:

The horizontal and vertical lines in a matrix are called rows and columns, respectively. The numbers in the matrix are called its entries. To specify a matrix's size, a matrix with m rows and n columns is called an m-by-n matrix or m × n matrix, while m and n are called its dimensions. The above is a 4-by-3 matrix.

A matrix where one of the dimensions equals one is also called a vector, and may be interpreted as an element of real coordinate space. An m × 1 matrix (one column and m rows) is called a column vector and a 1 × n matrix (one row and n columns) is called a row vector. For example, the second row vector of the above matrix is

Most of this article focuses on real and complex matrices, i.e., matrices whose entries are real or complex numbers. More general types of entries are discussed below.

Notation

The entry that lies in the i-th row and the j-th column of a matrix is typically referred to as the i,j, (i,j), or (i,j)th entry of the matrix. For example, the (2,3) entry of the above matrix X is 7.

Matrices are usually denoted using upper-case letters, while the corresponding lower-case letters, with two subscript indices, represent the entries. For example, the (i, j)th entry of a matrix A is most commonly written as a_{i,j}. Alternative notations for that entry are A[i,j] or A_{i,j}. In addition to using upper-case letters to symbolize matrices, many authors use a special typographical style, commonly boldface upright (non-italic), to further distinguish matrices from other variables. An asterisk is commonly used to refer to all of the rows or columns in a matrix. For example, a_{i,∗} refers to the ith row of A, and a_{∗,j} refers to the jth column of A. The set of all m-by-n matrices is denoted M(m, n).

A common shorthand is

A = [a_{i,j}]_{i=1,...,m; j=1,...,n} or more briefly A = [a_{i,j}]_{m×n}

to define an m × n matrix A. In this case, the entries a_{i,j} are defined separately for all integers 1 ≤ i ≤ m and 1 ≤ j ≤ n; for example the 2-by-2 matrix

is specified by A = [ij]i=1,2; j=1,2.

Some programming languages start the numbering of rows and columns at zero, in which case the entries of an m-by-n matrix are indexed by 0 ≤ i ≤ m − 1 and 0 ≤ j ≤ n − 1.[2] This article will follow the enumeration starting from 1.

Basic operations

Several basic operations can be applied to matrices: matrix addition, scalar multiplication and transposition.[3] These form the basic techniques to deal with matrices.

Operation | Definition
Addition | The sum A + B of two m-by-n matrices A and B is calculated entrywise: (A + B)_{i,j} = A_{i,j} + B_{i,j}, where 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Scalar multiplication | The scalar multiplication cA of a matrix A and a number c (also called a scalar in the parlance of abstract algebra) is given by multiplying every entry of A by c: (cA)_{i,j} = c · A_{i,j}.
Transpose | The transpose of an m-by-n matrix A is the n-by-m matrix A^T (also denoted A^tr or ^tA) formed by turning rows into columns and vice versa: (A^T)_{i,j} = A_{j,i}.

Familiar properties of numbers extend to these operations of matrices: for example, addition is commutative, i.e. the matrix sum does not depend on the order of the summands: A + B = B + A.[4] The transpose is compatible with addition and scalar multiplication, as expressed by (cA)^T = c(A^T) and (A + B)^T = A^T + B^T. Finally, (A^T)^T = A.
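The three operations and the identities just listed can be checked on small examples. The following is a minimal Python sketch, assuming matrices are stored as lists of rows; the function names `add`, `scale` and `transpose` and the matrix B are illustrative choices, not part of any standard library.

```python
def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same dimensions")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def scale(c, A):
    """Scalar multiple cA: every entry of A multiplied by c."""
    return [[c * a for a in row] for row in A]


def transpose(A):
    """Transpose: the rows of A become the columns of the result."""
    return [list(column) for column in zip(*A)]


A = [[1, 2, 3], [0, -6, 0]]  # a 2-by-3 matrix
B = [[0, 1, 0], [5, 0, 2]]   # another 2-by-3 matrix (illustrative values)

# Each comparison below checks one of the identities stated above.
print(add(A, B) == add(B, A))                                   # A + B = B + A
print(transpose(scale(2, A)) == scale(2, transpose(A)))         # (cA)^T = c(A^T)
print(transpose(add(A, B)) == add(transpose(A), transpose(B)))  # (A + B)^T = A^T + B^T
print(transpose(transpose(A)) == A)                             # (A^T)^T = A
```

A check on one pair of matrices is of course not a proof; it only illustrates what the identities assert.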

Row operations

Row operations transform a matrix by acting on its rows. There are three types of row operations:

Row switching: interchanging two rows of a matrix.

Row multiplication: multiplying all entries of a row by a non-zero constant.

Row addition: adding a multiple of one row to another row.

Row operations are used in a number of ways, including solving linear equations and finding inverses.
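As a minimal Python sketch of the three row operations, assuming matrices are lists of rows and rows are numbered from 0 as is usual in Python (the function names and the small linear system are illustrative assumptions):

```python
from fractions import Fraction


def swap_rows(A, i, j):
    """Row switching: interchange rows i and j."""
    B = [row[:] for row in A]
    B[i], B[j] = B[j], B[i]
    return B


def scale_row(A, i, c):
    """Row multiplication: multiply every entry of row i by a non-zero constant c."""
    if c == 0:
        raise ValueError("the constant must be non-zero")
    B = [row[:] for row in A]
    B[i] = [c * x for x in B[i]]
    return B


def add_multiple(A, src, dst, c):
    """Row addition: add c times row src to row dst."""
    B = [row[:] for row in A]
    B[dst] = [y + c * x for x, y in zip(B[src], B[dst])]
    return B


# Augmented matrix of the system  x + 2y = 5,  3x + 4y = 6.
M = [[1, 2, 5], [3, 4, 6]]
M = add_multiple(M, 0, 1, -3)          # row 1 <- row 1 - 3 * row 0
M = scale_row(M, 1, Fraction(-1, 2))   # row 1 <- -1/2 * row 1
M = add_multiple(M, 1, 0, -2)          # row 0 <- row 0 - 2 * row 1
print(M)  # the last column now holds the solution x = -4, y = 9/2
```

Each function returns a new matrix rather than modifying its argument, which keeps the intermediate steps easy to inspect.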

Matrix multiplication, linear equations and linear transformations

Schematic depiction of the matrix product AB of two matrices A and B.

Multiplication of two matrices is defined only if the number of columns of the left matrix is the same as the number of rows of the right matrix. If A is an m-by-n matrix and B is an n-by-p matrix, then their matrix product AB is the m-by-p matrix whose entries are given by

(AB)_{i,j} = A_{i,1}B_{1,j} + A_{i,2}B_{2,j} + ... + A_{i,n}B_{n,j} = \sum_{r=1}^{n} A_{i,r}B_{r,j},

where 1 ≤ i ≤ m and 1 ≤ j ≤ p.[5] For example (the underlined entry 1 in the product is calculated as 1 · 1 + 0 · 1 + 2 · 0 = 1):
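A minimal Python sketch of this definition follows. The matrices are illustrative assumptions, chosen so that the first row of A and the second column of B produce exactly the sum 1 · 1 + 0 · 1 + 2 · 0 = 1 quoted above.

```python
def matmul(A, B):
    """Product of an m-by-n matrix A and an n-by-p matrix B, by the definition."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match rows of B")
    p = len(B[0])
    # Entry (i, j) is the sum over r of A[i][r] * B[r][j].
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[1, 0, 2], [-1, 3, 1]]     # 2-by-3
B = [[3, 1], [2, 1], [1, 0]]    # 3-by-2
AB = matmul(A, B)               # 2-by-2
print(AB)  # → [[5, 1], [4, 2]]
```

The entry in the first row and second column of the result is the one computed as 1 · 1 + 0 · 1 + 2 · 0.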

Matrix multiplication satisfies the rules (AB)C = A(BC) (associativity), and (A+B)C = AC+BC as well as C(A+B) = CA+CB (left and right distributivity), whenever the size of the matrices is such that the various products are defined.[6] The product AB may be defined without BA being defined, namely if A and B are m-by-n and n-by-k matrices, respectively, and m ≠ k. Even if both products are defined, they need not be equal, i.e. generally one has

AB ≠ BA,

i.e., matrix multiplication is not commutative, in marked contrast to (rational, real, or complex) numbers whose product is independent of the order of the factors. An example of two matrices not commuting with each other is:

whereas

The identity matrix In of size n is the n-by-n matrix in which all the elements on the main diagonal are equal to 1 and all other elements are equal to 0, e.g.

It is called the identity matrix because multiplication with it leaves a matrix unchanged: M I_n = I_m M = M for any m-by-n matrix M.
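Both facts, the failure of commutativity and the role of the identity matrix, can be illustrated with a short Python sketch. The matrices and the helper names `matmul` and `identity` are illustrative assumptions.

```python
def matmul(A, B):
    """Matrix product by the definition (sum over the shared index)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def identity(n):
    """The n-by-n identity matrix: ones on the main diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [0, 0]]
print(matmul(A, B))  # → [[0, 1], [0, 3]]
print(matmul(B, A))  # → [[3, 4], [0, 0]]

M = [[1, 2, 3], [4, 5, 6]]  # a 2-by-3 matrix
print(matmul(M, identity(3)) == M and matmul(identity(2), M) == M)  # → True
```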

Besides the ordinary matrix multiplication just described, there exist other less frequently used operations on matrices that can be considered forms of multiplication, such as the Hadamard product and the Kronecker product.[7] They arise in solving matrix equations such as the Sylvester equation.

Linear equations

A particular case of matrix multiplication is tightly linked to linear equations: if x designates a column vector (i.e. n×1-matrix) of n variables x_1, x_2, ..., x_n, and A is an m-by-n matrix, then the matrix equation

Ax = b,

where b is some m×1-column vector, is equivalent to the system of linear equations

A_{1,1}x_1 + A_{1,2}x_2 + ... + A_{1,n}x_n = b_1
...
A_{m,1}x_1 + A_{m,2}x_2 + ... + A_{m,n}x_n = b_m.[8]

This way, matrices can be used to compactly write and deal with multiple linear equations, i.e. systems of linear equations.

Linear transformations

Matrices and matrix multiplication reveal their essential features when related to linear transformations, also known as linear maps. A real m-by-n matrix A gives rise to a linear transformation R^n → R^m mapping each vector x in R^n to the (matrix) product Ax, which is a vector in R^m. Conversely, each linear transformation f: R^n → R^m arises from a unique m-by-n matrix A: explicitly, the (i, j)-entry of A is the ith coordinate of f(e_j), where e_j = (0,...,0,1,0,...,0) is the unit vector with 1 in the jth position and 0 elsewhere. The matrix A is said to represent the linear map f, and A is called the transformation matrix of f.

The following table shows a number of 2-by-2 matrices with the associated linear maps of R^2. The blue original is mapped to the green grid and shapes; the origin (0,0) is marked with a black point.

Vertical shear with m = 1.25 | Horizontal flip | Squeeze mapping with r = 3/2 | Scaling by a factor of 3/2 | Rotation by π/6 = 30°

Under the 1-to-1 correspondence between matrices and linear maps, matrix multiplication corresponds to composition of maps:[9] if a k-by-m matrix B represents another linear map g: R^m → R^k, then the composition g ∘ f is represented by BA since

(g ∘ f)(x) = g(f(x)) = g(Ax) = B(Ax) = (BA)x.

The last equality follows from the above-mentioned associativity of matrix multiplication.
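The correspondence between composition and the product BA can be checked numerically. The following Python sketch assumes illustrative matrices A (2-by-3) and B (2-by-2) and an arbitrary test vector x; the helper names are hypothetical.

```python
def matmul(A, B):
    """Matrix product by the definition."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matvec(A, x):
    """Product Ax of a matrix with a vector given as a flat list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = [[1, 0, 2], [-1, 3, 1]]   # represents f: R^3 -> R^2
B = [[0, -1], [1, 0]]         # represents g: R^2 -> R^2 (rotation by 90 degrees)
x = [1, 2, 3]

g_of_f = matvec(B, matvec(A, x))    # g(f(x)) = B(Ax)
via_product = matvec(matmul(B, A), x)  # (BA)x
print(g_of_f, via_product)  # → [-8, 7] [-8, 7]
```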

The rank of a matrix A is the maximum number of linearly independent row vectors of the matrix, which is the same as the maximum number of linearly independent column vectors.[10] Equivalently it is the dimension of the image of the linear map represented by A.[11] The rank-nullity theorem states that the dimension of the kernel of a matrix plus the rank equals the number of columns of the matrix.[12]

Square matrices

A square matrix is a matrix which has the same number of rows and columns. Because of this, any two square matrices of the same order can be added and multiplied. An n-by-n matrix A, also known as a square matrix of order n, is called invertible or non-singular if there exists a matrix B such that

AB = I_n.[13]

This is equivalent to BA = I_n.[14] Moreover, if B exists, it is unique and is called the inverse matrix of A, denoted A^{-1}.

The entries A_{i,i} form the main diagonal of a matrix. The trace, tr(A), of a square matrix A is the sum of its diagonal entries. While, as mentioned above, matrix multiplication is not commutative, the trace of the product of two matrices is independent of the order of the factors: tr(AB) = tr(BA).[citation needed]

If all entries outside the main diagonal are zero, A is called a diagonal matrix. If all entries above (below) the main diagonal are zero, A is called a lower triangular matrix (upper triangular matrix, respectively). For example, if n = 3, they look like

(diagonal), (lower) and (upper triangular matrix).

Determinant

A linear transformation on R2 given by the indicated matrix. The determinant of this matrix is −1, as the area of the green parallelogram at the right is 1, but the map reverses the orientation, since it turns the counterclockwise orientation of the vectors to a clockwise one.

The determinant det(A) or |A| of a square matrix A is a number encoding certain properties of the matrix. A matrix is invertible if and only if its determinant is nonzero. Its absolute value equals the area (in R^2) or volume (in R^3) of the image of the unit square (or cube), while its sign corresponds to the orientation of the corresponding linear map: the determinant is positive if and only if the orientation is preserved.

The determinant of 2-by-2 matrices is given by

\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc,

while the determinant of 3-by-3 matrices involves 6 terms (rule of Sarrus). The more lengthy Leibniz formula generalises these two formulae to all dimensions.[15]

The determinant of a product of matrices equals the product of the determinants: det(AB) = det(A) · det(B).[16] Adding multiples of rows or columns to other rows or columns does not change the determinant. Exchanging rows or columns alters the sign of the determinant.[17] Using these two operations, any matrix can be transformed to a lower (or upper) triangular matrix, whose determinant equals the product of the entries on the main diagonal; therefore the determinant of the original matrix can be calculated. Finally, the Laplace expansion expresses the determinant in terms of minors, i.e., determinants of smaller matrices.[18] Determinants can be used to solve linear systems using Cramer's rule, where the value of each of the system's variables is the quotient of the determinants of two related square matrices.[19]
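The triangularization procedure described above can be sketched in Python as follows. This is a minimal illustration under the assumption of exact rational arithmetic (via `fractions.Fraction`), not an optimized or numerically tuned routine; the function name `det` and the test matrices are illustrative.

```python
from fractions import Fraction


def det(A):
    """Determinant via reduction to upper triangular form.

    Uses only the two operations named in the text: adding multiples of rows
    (determinant unchanged) and exchanging rows (sign flips).
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    sign = 1
    for k in range(n):
        # Find a row at or below k with a nonzero entry in column k.
        pivot = next((r for r in range(k, n) if M[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: determinant is zero
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            sign = -sign
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            M[r] = [a - factor * b for a, b in zip(M[r], M[k])]
    result = Fraction(sign)
    for k in range(n):
        result *= M[k][k]  # product of the diagonal of the triangular matrix
    return result


print(det([[1, 2], [3, 4]]))                    # → -2
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))   # → 18
```

For the 2-by-2 case the result agrees with the formula ad − bc: 1 · 4 − 2 · 3 = −2.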

Eigenvalues and eigenvectors

A number λ and a non-zero vector v satisfying

Av = λv

are called an eigenvalue and an eigenvector of A, respectively.[nb 1][20] The number λ is an eigenvalue of an n×n-matrix A if and only if A − λI_n is not invertible, which is equivalent to

det(A−λI) = 0.[21]

The function p_A(t) = det(A − tI) is called the characteristic polynomial of A; its degree is n. Therefore p_A(t) has at most n different roots, i.e., eigenvalues of the matrix.[22] They may be complex even if the entries of A are real. According to the Cayley–Hamilton theorem, p_A(A) = 0, that is to say, the characteristic polynomial applied to the matrix itself yields the zero matrix.
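For a 2-by-2 matrix with rows (a, b) and (c, d), expanding det(A − tI) = (a − t)(d − t) − bc gives t^2 − (a + d)t + (ad − bc). The following Python sketch, with illustrative function names and an illustrative matrix, uses this to check the Cayley–Hamilton theorem in that special case.

```python
def char_poly_2x2(A):
    """Coefficients (c2, c1, c0) of det(A - tI) = c2*t^2 + c1*t + c0 for 2-by-2 A."""
    (a, b), (c, d) = A
    return (1, -(a + d), a * d - b * c)


def evaluate_at_matrix(coeffs, A):
    """Evaluate c2*A^2 + c1*A + c0*I for a 2-by-2 matrix A."""
    c2, c1, c0 = coeffs
    (a, b), (c, d) = A
    A2 = [[a * a + b * c, a * b + b * d],
          [c * a + d * c, c * b + d * d]]
    I = [[1, 0], [0, 1]]
    return [[c2 * A2[i][j] + c1 * A[i][j] + c0 * I[i][j] for j in range(2)]
            for i in range(2)]


A = [[2, 1], [1, 2]]
p = char_poly_2x2(A)
print(p)                         # → (1, -4, 3), i.e. t^2 - 4t + 3 with roots 1 and 3
print(evaluate_at_matrix(p, A))  # → [[0, 0], [0, 0]]
```

The roots 1 and 3 of the polynomial are the eigenvalues of this particular matrix.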

Symmetry

A square matrix A that is equal to its transpose, i.e. A = A^T, is a symmetric matrix; if it is equal to the negative of its transpose, i.e. A = −A^T, then it is a skew-symmetric matrix. In complex matrices, symmetry is often replaced by the concept of Hermitian matrices, which satisfy A^* = A, where the star denotes the conjugate transpose of the matrix, i.e. the transpose of the complex conjugate of A.

By the spectral theorem, real symmetric matrices and complex Hermitian matrices have an eigenbasis; i.e., every vector is expressible as a linear combination of eigenvectors. In both cases, all eigenvalues are real.[23] This theorem can be generalized to infinite-dimensional situations related to matrices with infinitely many rows and columns, see below.

Definiteness

Matrix A | Definiteness | Associated quadratic form Q_A(x, y) | Set of vectors (x, y) such that Q_A(x, y) = 1
[1/4 0; 0 1] | positive definite | 1/4 x^2 + y^2 | Ellipse
[1/4 0; 0 −1/4] | indefinite | 1/4 x^2 − 1/4 y^2 | Hyperbola

A symmetric n×n-matrix is called positive definite (negative definite, indefinite, resp.), if for all nonzero vectors x ∈ R^n the associated quadratic form given by

Q(x) = x^T A x

takes only positive values (negative, both negative and positive values, respectively).[24] Allowing as input two different vectors instead yields the bilinear form associated to A:

B_A(x, y) = x^T A y.[25]

A symmetric matrix is positive definite if and only if all its eigenvalues are positive.[26] The table at the right shows two possibilities for 2-by-2 matrices.
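The eigenvalue criterion can be sketched in Python for symmetric 2-by-2 matrices, whose two eigenvalues have a closed form. The function names are illustrative, the "semidefinite" label for a zero eigenvalue is an addition not discussed in the text, and the two test matrices are the diagonal matrices belonging to the quadratic forms 1/4 x^2 + y^2 and 1/4 x^2 − 1/4 y^2.

```python
import math


def eigenvalues_symmetric_2x2(A):
    """Both (real) eigenvalues of a symmetric 2-by-2 matrix, smallest first."""
    (a, b), (c, d) = A
    if b != c:
        raise ValueError("matrix must be symmetric")
    mean = (a + d) / 2
    radius = math.sqrt((a - d) ** 2 + 4 * b * b) / 2
    return mean - radius, mean + radius


def classify(A):
    """Classify a symmetric 2-by-2 matrix by the signs of its eigenvalues."""
    low, high = eigenvalues_symmetric_2x2(A)
    if low > 0:
        return "positive definite"
    if high < 0:
        return "negative definite"
    if low < 0 < high:
        return "indefinite"
    return "semidefinite"  # a zero eigenvalue; case not covered in the text


def Q(A, x, y):
    """Quadratic form x^T A x for the vector (x, y) and symmetric 2-by-2 A."""
    (a, b), (_, d) = A
    return a * x * x + 2 * b * x * y + d * y * y


print(classify([[0.25, 0], [0, 1]]))       # → positive definite
print(classify([[0.25, 0], [0, -0.25]]))   # → indefinite
```

Because floating-point arithmetic is used, eigenvalues very close to zero should be treated with care in practice.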

Computational aspects

In addition to theoretical knowledge of properties of matrices and their relation to other fields, it is important for practical purposes to perform matrix calculations effectively and precisely. The domain studying these matters is called numerical linear algebra.[27] As with other numerical situations, two main aspects are the complexity of algorithms and their numerical stability. Many problems can be solved by either direct algorithms or iterative approaches. For example, finding eigenvectors can be done by finding a sequence of vectors x_n converging to an eigenvector when n tends to infinity.[28]

Determining the complexity of an algorithm means finding upper bounds or estimates of how many elementary operations such as additions and multiplications of scalars are necessary to perform some algorithm, e.g. multiplication of matrices. For example, calculating the matrix product of two n-by-n matrices using the definition given above needs n^3 multiplications, since for each of the n^2 entries of the product, n multiplications are necessary. The Strassen algorithm outperforms this "naive" algorithm; it needs only about n^{2.807} multiplications.[29] A refined approach also incorporates specific features of the computing devices.

In many practical situations additional information about the matrices involved is known. An important case is that of sparse matrices, i.e. matrices most of whose entries are zero. There are specifically adapted algorithms for, say, solving linear systems Ax = b for sparse matrices A, such as the conjugate gradient method.[30]

An algorithm is, roughly speaking, numerically stable if small deviations (such as rounding errors) do not lead to big deviations in the result. For example, calculating the inverse of a matrix via Laplace's formula (Adj(A) denotes the adjugate matrix of A)

A^{-1} = Adj(A) / det(A)

may lead to significant rounding errors if the determinant of the matrix is very small. The norm of a matrix can be used to capture the conditioning of linear algebraic problems, such as computing a matrix's inverse.[31]

Although most computer languages are not designed with commands or libraries for matrices, as early as the 1970s, some engineering desktop computers such as the HP 9830 had ROM cartridges to add BASIC commands for matrices. Some computer languages such as APL were designed to manipulate matrices, and various mathematical programs can be used to aid computing with matrices.[32]

Matrix decomposition methods

There are several methods to render matrices into a more easily accessible form. They are generally referred to as matrix transformation or matrix decomposition techniques. The interest of all these decomposition techniques is that they preserve certain properties of the matrices in question, such as determinant, rank or inverse, so that these quantities can be calculated after applying the transformation, or that certain matrix operations are algorithmically easier to carry out for some types of matrices.

The LU decomposition factors matrices as a product of a lower (L) and an upper (U) triangular matrix.[33] Once this decomposition is calculated, linear systems can be solved more efficiently, by a simple technique called forward and back substitution. Likewise, inverses of triangular matrices are algorithmically easier to calculate. Gaussian elimination is a similar algorithm; it transforms any matrix to row echelon form.[34] Both methods proceed by multiplying the matrix by suitable elementary matrices, which correspond to permuting rows or columns and adding multiples of one row to another row. Singular value decomposition expresses any matrix A as a product UDV^*, where U and V are unitary matrices and D is a diagonal matrix.

A matrix in Jordan normal form. The grey blocks are called Jordan blocks.

The eigendecomposition or diagonalization expresses A as a product VDV^{-1}, where D is a diagonal matrix and V is a suitable invertible matrix.[35] If A can be written in this form, it is called diagonalizable. More generally, and applicable to all matrices, the Jordan decomposition transforms a matrix into Jordan normal form, that is to say matrices whose only nonzero entries are the eigenvalues λ_1 to λ_n of A, placed on the main diagonal, and possibly entries equal to one directly above the main diagonal, as shown at the right.[36] Given the eigendecomposition, the nth power of A (i.e. n-fold iterated matrix multiplication) can be calculated via

A^n = (VDV^{-1})^n = VDV^{-1}VDV^{-1}...VDV^{-1} = VD^nV^{-1}

and the power of a diagonal matrix can be calculated by taking the corresponding powers of the diagonal entries, which is much easier than doing the exponentiation for A instead. This can be used to compute the matrix exponential e^A, a need frequently arising in solving linear differential equations, matrix logarithms and square roots of matrices.[37] To avoid numerically ill-conditioned situations, further algorithms such as the Schur decomposition can be employed.[38]
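The power formula can be checked on a small example. The Python sketch below assumes an illustrative symmetric matrix A with eigenvalues 3 and 1 and eigenvectors (1, 1) and (1, −1), which are written into V by hand rather than computed; exact fractions avoid rounding.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product by the definition."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def power_by_repeated_multiplication(A, n):
    """A^n by n-fold iterated multiplication, starting from the identity."""
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, A)
    return result


A = [[2, 1], [1, 2]]
V = [[1, 1], [1, -1]]                      # columns are eigenvectors of A
V_inv = [[Fraction(1, 2), Fraction(1, 2)],
         [Fraction(1, 2), Fraction(-1, 2)]]
eigenvalues = [3, 1]                       # diagonal of D, in the order of V's columns


def power_by_eigendecomposition(n):
    """A^n = V D^n V^{-1}; D^n only needs powers of the diagonal entries."""
    D_n = [[eigenvalues[0] ** n, 0], [0, eigenvalues[1] ** n]]
    return matmul(matmul(V, D_n), V_inv)


print(power_by_eigendecomposition(5) == power_by_repeated_multiplication(A, 5))  # → True
```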

Abstract algebraic aspects and generalizations

Matrices can be generalized in different ways. Abstract algebra uses matrices with entries in more general fields or even rings, while linear algebra codifies properties of matrices in the notion of linear maps. It is possible to consider matrices with infinitely many columns and rows. Another extension is tensors, which can be seen as higher-dimensional arrays of numbers, as opposed to vectors, which can often be realised as sequences of numbers, and matrices, which are rectangular or two-dimensional arrays of numbers.[39] Matrices subject to certain requirements tend to form groups known as matrix groups.

Matrices with more general entries

This article focuses on matrices whose entries are real or complex numbers. However, matrices can be considered with much more general types of entries than real or complex numbers. As a first step of generalization, any field, i.e. a set where addition, subtraction, multiplication and division operations are defined and well-behaved, may be used instead of R or C, for example rational numbers or finite fields. For example, coding theory makes use of matrices over finite fields. Wherever eigenvalues are considered, the choice of the field usually matters insofar as the characteristic polynomial of a real matrix, despite having real coefficients, may have complex roots. Therefore, the field is often required to be C or any algebraically closed field when such issues arise.

More generally, abstract algebra makes great use of matrices with entries in a ring R.[40] Rings are a more general notion than fields in that no division operation need exist. The very same addition and multiplication operations of matrices extend to this setting, too. The set M(n, R) of all square n-by-n matrices over R is a ring called the matrix ring, isomorphic to the endomorphism ring of the left R-module R^n.[41] If the ring R is commutative, i.e., its multiplication is commutative, then M(n, R) is a unitary noncommutative (unless n = 1) associative algebra over R. The determinant of square matrices can still be defined using the Leibniz formula; a matrix is invertible if and only if its determinant is invertible in R, generalising the situation over a field F, where every nonzero element is invertible.[42] Matrices over superrings are called supermatrices.[43]

Relationship to linear maps

Linear maps R^n → R^m are equivalent to m-by-n matrices, as described above. More generally, any linear map f: V → W between finite-dimensional vector spaces can be described by a matrix A = (a_{ij}), by choosing bases v_1, ..., v_m of V and w_1, ..., w_n of W, where m and n are the dimensions of V and W, respectively, and requiring

f(v_j) = \sum_{i=1}^{n} a_{ij} w_i for j = 1, ..., m.

This uniquely determines the entries of the matrix A, but the matrix depends on the choice of the bases: different choices of bases give rise to different, but similar matrices.[44] Many of the above concrete notions can be reinterpreted in this light, for example, the transpose matrix AT describes the transpose of the linear map given by A, with respect to the dual bases.[45]

Matrix groups

A group is a mathematical structure consisting of a set of objects together with a binary operation, i.e. an operation combining any two objects to a third, subject to certain requirements.[46] A group in which the objects are matrices and the group operation is matrix multiplication is called a matrix group.[nb 2][47] Since in a group every element has to be invertible, the most general matrix groups are the groups of all invertible matrices of a given size, called the general linear groups.

Any property of matrices that is preserved under matrix products and inverses can be used to define further matrix groups. For example, matrices with a given size and with a determinant of 1 form a subgroup of (i.e. a smaller group contained in) their general linear group, called a special linear group.[48] Orthogonal matrices, determined by the condition

M^T M = I,

form the orthogonal group.[49] They are called orthogonal since the associated linear transformations of R^n preserve angles in the sense that the scalar product of two vectors is unchanged after applying M to them:

(Mv) · (Mw) = v · w.[50]
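Both the defining condition and the preservation of the scalar product can be verified exactly for a small example. The Python sketch below assumes an illustrative rotation matrix with rational entries 3/5 and 4/5 and two arbitrary vectors; the helper names are hypothetical.

```python
from fractions import Fraction

# A rotation matrix with rational entries (cos = 3/5, sin = 4/5).
M = [[Fraction(3, 5), Fraction(-4, 5)],
     [Fraction(4, 5), Fraction(3, 5)]]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(v, w):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(v, w))


v, w = [3, 4], [-2, 5]
print(matmul(transpose(M), M) == [[1, 0], [0, 1]])       # M^T M = I → True
print(dot(matvec(M, v), matvec(M, w)) == dot(v, w))      # (Mv)·(Mw) = v·w → True
```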

Every finite group is isomorphic to a matrix group, as one can see by considering the regular representation of the symmetric group.[51] General groups can be studied using matrix groups, which are comparatively well-understood, by means of representation theory.[52]

Infinite matrices

It is also possible to consider matrices with infinitely many rows and/or columns.[53] The basic operations introduced above are defined the same way in this case. Matrix multiplication, however, and all operations stemming therefrom are only meaningful when restricted to certain matrices, since the sum featuring in the above definition of the matrix product will contain an infinity of summands. An easy way to circumvent this issue is to restrict to matrices all of whose rows (or columns) contain only finitely many nonzero terms. As in the finite case (see above), where matrices describe linear maps, infinite matrices can be used to describe operators on Hilbert spaces, where convergence and continuity questions arise. However, the explicit point of view of matrices tends to obfuscate the matter,[nb 3] and the abstract and more powerful tools of functional analysis are used instead, by relating matrices to linear maps (as in the finite case above), but imposing additional convergence and continuity constraints.

Applications

There are numerous applications of matrices, both in mathematics and other sciences. Some of them merely take advantage of the compact representation of a set of numbers in a matrix. For example, in game theory and economics, the payoff matrix encodes the payoff for two players, depending on which out of a given (finite) set of alternatives the players choose.[54] Text mining and automated thesaurus compilation make use of document-term matrices such as tf-idf in order to keep track of frequencies of certain words in several documents.[55]

Complex numbers can be represented by particular real 2-by-2 matrices via

a + ib \leftrightarrow \begin{bmatrix} a & -b \\ b & a \end{bmatrix},

under which addition and multiplication of complex numbers and matrices correspond to each other. For example, 2-by-2 rotation matrices represent the multiplication with some complex number of absolute value 1, as above. A similar interpretation is possible for quaternions.[56]
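This correspondence can be checked in Python, assuming the standard representation of a + ib by the matrix with rows (a, −b) and (b, a). The function names and the sample numbers are illustrative.

```python
def to_matrix(z):
    """Real 2-by-2 matrix representing the complex number z = a + ib."""
    return [[z.real, -z.imag], [z.imag, z.real]]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def add(A, B):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


z, w = complex(1, 2), complex(3, -1)

# Multiplying (adding) the matrices gives the matrix of the product (sum).
print(matmul(to_matrix(z), to_matrix(w)) == to_matrix(z * w))  # → True
print(add(to_matrix(z), to_matrix(w)) == to_matrix(z + w))     # → True

# The number i has absolute value 1 and corresponds to rotation by 90 degrees.
print(to_matrix(1j))  # → [[0.0, -1.0], [1.0, 0.0]]
```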

Early encryption techniques such as the Hill cipher also used matrices. However, due to the linear nature of matrices, these codes are comparatively easy to break.[57] Computer graphics uses matrices both to represent objects and to calculate transformations of objects using affine rotation matrices to accomplish tasks such as projecting a three-dimensional object onto a two-dimensional screen, corresponding to a theoretical camera observation.[58] Matrices over a polynomial ring are important in the study of control theory.

Chemistry makes use of matrices in various ways, particularly since the use of quantum theory to discuss molecular bonding and spectroscopy. Examples are the overlap matrix and the Fock matrix used in solving the Roothaan equations to obtain the molecular orbitals of the Hartree–Fock method.

Graph theory

An undirected graph with adjacency matrix

The adjacency matrix of a finite graph is a basic notion of graph theory.[59] It records which vertices of the graph are connected by an edge. Matrices containing just two different values (0 and 1 meaning for example "yes" and "no") are called logical matrices. The distance (or cost) matrix contains information about distances of the edges.[60] These concepts can be applied to websites connected by hyperlinks or cities connected by roads etc., in which case (unless the road network is extremely dense) the matrices tend to be sparse, i.e. contain few nonzero entries. Therefore, specifically tailored matrix algorithms can be used in network theory.

Analysis and geometry

The Hessian matrix of a differentiable function f: Rⁿ → R consists of the second derivatives of f with respect to the several coordinate directions, i.e.

\[ H(f) = \left[ \frac{\partial^2 f}{\partial x_i \, \partial x_j} \right]. \][61]

It encodes information about the local growth behaviour of the function: given a critical point x = (x_1, ..., x_n), i.e., a point where the first partial derivatives of f vanish, the function has a local minimum at x if the Hessian matrix at x is positive definite. Quadratic programming can be used to find global minima or maxima of quadratic functions closely related to the ones attached to matrices (see above).[62]
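For functions of two variables the test can be sketched in a few lines. The snippet below classifies a symmetric 2-by-2 Hessian using its determinant and leading entry (Sylvester's criterion). The function name and the two example functions, x² + y² and x² − y² at the origin, are chosen for illustration.

```python
def classify_symmetric_2x2(h):
    """Classify a symmetric 2-by-2 matrix [[a, b], [b, d]] by definiteness."""
    (a, b), (_, d) = h
    det = a * d - b * b
    if det > 0:
        # det > 0 forces a != 0, so the sign of a decides the case.
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "semidefinite (test inconclusive)"


# Hessian of x^2 + y^2 at the origin: a local minimum.
print(classify_symmetric_2x2([[2, 0], [0, 2]]))
# Hessian of x^2 - y^2 at the origin: a saddle point.
print(classify_symmetric_2x2([[2, 0], [0, -2]]))
```

A positive definite Hessian at a critical point indicates a local minimum, a negative definite one a local maximum, and an indefinite one a saddle point.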

At the saddle point (x = 0, y = 0) (red) of the function f(x, y) = x² − y², the Hessian matrix is indefinite.

Another matrix frequently used in geometrical situations is the Jacobi matrix of a differentiable map f: Rⁿ → Rᵐ. If f_1, ..., f_m denote the components of f, then the Jacobi matrix is defined as

\[ J(f) = \left[ \frac{\partial f_i}{\partial x_j} \right]_{1 \le i \le m,\ 1 \le j \le n}. \][63]

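If n > m, and if the rank of the Jacobi matrix attains its maximal value m, then by the implicit function theorem the equation f(x) = c can locally be solved for m of the variables as functions of the remaining n − m. In the case n = m, maximal rank means that f is locally invertible at that point.[64]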
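The Jacobi matrix can be approximated numerically. The sketch below uses central differences on an example map from polar to Cartesian coordinates. The function names, the step size `h`, and the choice of map are illustrative assumptions. In this square case (n = m = 2), maximal rank is equivalent to a nonzero determinant.

```python
import math


def jacobian(f, x, h=1e-6):
    """Approximate the m-by-n Jacobi matrix of f at x by central differences."""
    n = len(x)
    columns = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        # Column j holds the partial derivatives of all components w.r.t. x_j.
        columns.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    m = len(columns[0])
    # Transpose so that entry [i][j] is d f_i / d x_j.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def polar_to_cartesian(p):
    r, theta = p
    return [r * math.cos(theta), r * math.sin(theta)]


J = jacobian(polar_to_cartesian, [2.0, math.pi / 6])
det_J = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(det_J)
```

For this map the exact determinant equals r, so it is nonzero away from the origin, where the map is locally invertible.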

Partial differential equations can be classified by considering the matrix of coefficients of the highest-order differential operators of the equation. For elliptic partial differential equations this matrix is positive definite, which has decisive influence on the set of possible solutions of the equation in question.[65]

The finite element method is an important numerical method to solve partial differential equations, widely applied in simulating complex physical systems. It attempts to approximate the solution to some equation by piecewise linear functions, where the pieces are chosen with respect to a sufficiently fine grid, which in turn can be recast as a matrix equation.[66]

Probability theory and statistics

Two different Markov chains. The chart depicts the number of particles (of a total of 1000) in state "2". Both limiting values can be determined from the transition matrices of the two chains (plotted in red and black).

Stochastic matrices are square matrices whose rows are probability vectors, i.e., whose entries are non-negative and sum up to one. Stochastic matrices are used to define Markov chains with finitely many states.[67] A row of the stochastic matrix gives the probability distribution for the next position of some particle which is currently in the state corresponding to the row. Properties of the Markov chain like absorbing states, i.e., states that any particle attains eventually, can be read off the eigenvectors of the transition matrices.[68]
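The following sketch iterates a two-state Markov chain with a row-stochastic transition matrix. The matrix entries are made up for illustration, with the second state chosen to be absorbing, and `step` is a hypothetical helper name.

```python
def step(dist, p):
    """Advance a row-vector distribution one step: new[j] = sum_i dist[i] * p[i][j]."""
    n = len(p)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]


# Illustrative row-stochastic matrix: state 1 is absorbing (it only maps to itself).
P = [[0.7, 0.3],
     [0.0, 1.0]]
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)  # each row sums to one

dist = [1.0, 0.0]  # all particles start in state 0
for _ in range(200):
    dist = step(dist, P)
print(dist)
```

After many steps essentially all of the probability mass sits in the absorbing state. The limiting distribution (0, 1) is a left eigenvector of P with eigenvalue 1, which is the sense in which such properties can be read off the eigenvectors.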

Statistics also makes use of matrices in many different forms. Descriptive statistics is concerned with describing data sets, which can often be represented in matrix form, by reducing the amount of data. The covariance matrix encodes the mutual variance of several random variables.[69] Another technique using matrices is linear least squares, a method that approximates a finite set of pairs (x_1, y_1), (x_2, y_2), ..., (x_N, y_N) by a linear function

y_i ≈ a x_i + b, i = 1, ..., N

which can be formulated in terms of matrices and is related to the singular value decomposition of matrices.[70]
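One way to see the matrix formulation is to write the data as a matrix X with rows (x_i, 1) and solve the 2-by-2 normal equations (XᵀX)(a, b)ᵀ = Xᵀy. The sketch below does this by Cramer's rule rather than by a singular value decomposition, which is numerically more robust but longer to spell out. The function name `fit_line` and the sample data are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares fit y ≈ a*x + b via the normal equations (X^T X)[a, b]^T = X^T y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[sxx, sx], [sx, n]] and X^T y = [sxy, sy]
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x values coincide; the line is not determined")
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


# Made-up, slightly noisy data scattered around y = 2x + 1.
a, b = fit_line([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
print(a, b)
```

On data lying exactly on a line, such as (0, 1), (1, 3), (2, 5), (3, 7), the fit recovers the slope 2 and intercept 1.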

Random matrices are matrices whose entries are random numbers, subject to suitable probability distributions, such as the matrix normal distribution. Beyond probability theory, they are applied in domains ranging from number theory to physics.[71][72]

Symmetries and transformations in physics

Linear transformations and the associated symmetries play a key role in modern physics. For example, elementary particles in quantum field theory are classified as representations of the Lorentz group of special relativity and, more specifically, by their behavior under the spin group. Concrete representations involving the Pauli matrices and more general gamma matrices are an integral part of the physical description of fermions, which behave as spinors.[73] For the three lightest quarks, there is a group-theoretical representation involving the special unitary group SU(3). For their calculations, physicists use a convenient matrix representation known as the Gell-Mann matrices, which are also used for the SU(3) gauge group that forms the basis of the modern description of strong nuclear interactions, quantum chromodynamics. The Cabibbo–Kobayashi–Maskawa matrix, in turn, expresses the fact that the basic quark states that are important for weak interactions are not the same as, but linearly related to, the basic quark states that define particles with specific and distinct masses.[74]

Linear combinations of quantum states

The first model of quantum mechanics (Heisenberg, 1925) represented the theory's operators by infinite-dimensional matrices acting on quantum states.[75] This is also referred to as matrix mechanics. One particular example is the density matrix that characterizes the "mixed" state of a quantum system as a linear combination of elementary, "pure" eigenstates.[76]

Another matrix serves as a key tool for describing the scattering experiments that form the cornerstone of experimental particle physics. In collision reactions such as those occurring in particle accelerators, non-interacting particles head towards each other and collide in a small interaction zone, producing a new set of non-interacting particles. Such reactions can be described as the scalar product of outgoing particle states and a linear combination of ingoing particle states. The linear combination is given by a matrix known as the S-matrix, which encodes all information about the possible interactions between particles.[77]

Normal modes

A general application of matrices in physics is to the description of linearly coupled harmonic systems. The equations of motion of such systems can be described in matrix form, with a mass matrix multiplying a generalized velocity to give the kinetic term, and a force matrix multiplying a displacement vector to characterize the interactions. A standard way to obtain solutions is to determine the system's eigenvectors, its normal modes, by diagonalizing the matrix equation. Techniques like this are crucial when it comes to describing the internal dynamics of molecules: the internal vibrations of systems consisting of mutually bound component atoms.[78] They are also needed for describing mechanical vibrations and oscillations in electrical circuits.[79]
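As a small worked case, consider two equal masses m joined to each other and to two walls by three identical springs of stiffness k. This setup is an illustrative assumption, not taken from the text. The force matrix is then K = [[2k, −k], [−k, 2k]], and the squared normal-mode frequencies are the eigenvalues of K/m. The sketch computes them with the closed-form eigenvalue formula for symmetric 2-by-2 matrices.

```python
import math


def symmetric_2x2_eigenvalues(a):
    """Return the two real eigenvalues (smaller first) of a symmetric 2-by-2 matrix."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # For a symmetric matrix the discriminant tr^2 - 4*det is never negative.
    disc = math.sqrt(tr * tr - 4 * det)
    return ((tr - disc) / 2, (tr + disc) / 2)


m, k = 1.0, 1.0                       # illustrative mass and spring constant
D = [[2 * k / m, -k / m],
     [-k / m, 2 * k / m]]             # force matrix divided by the mass

omega_squared = symmetric_2x2_eigenvalues(D)
frequencies = [math.sqrt(w2) for w2 in omega_squared]
print(omega_squared, frequencies)
```

The smaller eigenvalue k/m belongs to the mode in which both masses move together, with eigenvector (1, 1). The larger eigenvalue 3k/m belongs to the mode in which they move in opposite directions, with eigenvector (1, −1).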

Geometrical optics

Geometrical optics provides further matrix applications. In this approximative theory, the wave nature of light is neglected. The result is a model in which light rays are indeed geometrical rays. If the deflection of light rays by optical elements is small, the action of a lens or reflective element on a given light ray can be expressed as multiplication of a two-component vector with a two-by-two matrix called a ray transfer matrix: the vector's components are the light ray's slope and its distance from the optical axis, while the matrix encodes the properties of the optical element. The matrix characterizing an optical system consisting of a combination of lenses or reflective elements is simply the product of the components' matrices.[80]
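The composition rule can be illustrated with two standard elements. The sketch below assumes the common convention that a ray is the column vector (height, slope), with the usual paraxial matrices for free-space propagation and a thin lens. The helper names and numerical values are illustrative.

```python
def matmul2(p, q):
    """Multiply two 2-by-2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(m, ray):
    """Apply a ray transfer matrix to a ray [height, slope]."""
    return [m[0][0] * ray[0] + m[0][1] * ray[1],
            m[1][0] * ray[0] + m[1][1] * ray[1]]


def free_space(d):
    """Propagation over a distance d: the height changes by d times the slope."""
    return [[1.0, d], [0.0, 1.0]]


def thin_lens(f):
    """Thin lens of focal length f: the slope changes by -height / f."""
    return [[1.0, 0.0], [-1.0 / f, 1.0]]


f = 0.5  # focal length (illustrative units)
# The rightmost matrix acts first: the lens, then propagation over one focal length.
system = matmul2(free_space(f), thin_lens(f))

ray_in = [0.01, 0.0]  # ray parallel to the axis at height 0.01
ray_out = apply(system, ray_in)
print(ray_out)
```

A ray entering parallel to the axis ends at height zero after travelling one focal length behind the lens, which is the defining property of the focal point. Note that the matrices are multiplied in the reverse of the order in which the ray meets the elements.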

History

Matrices have a long history of application in solving linear equations. The Chinese text The Nine Chapters on the Mathematical Art (Jiu Zhang Suan Shu), written between 300 BC and AD 200, is the first example of the use of matrix methods to solve simultaneous equations,[81] including the concept of determinants, well over a thousand years before that concept was published by the Japanese mathematician Seki in 1683 and the German mathematician Leibniz in 1693. Cramer presented Cramer's rule in 1750.

Early matrix theory emphasized determinants more strongly than matrices and an independent matrix concept akin to the modern notion emerged only in 1858, with Cayley's Memoir on the theory of matrices.[82][83] The term "matrix" was coined by Sylvester, who understood a matrix as an object giving rise to a number of determinants today called minors, that is to say, determinants of smaller matrices which derive from the original one by removing columns and rows. Etymologically, matrix derives from Latin mater (mother).[84]

The study of determinants sprang from several sources.[85] Number-theoretical problems led Gauss to relate coefficients of quadratic forms, i.e., expressions such as x² + xy − 2y², and linear maps in three dimensions to matrices. Eisenstein further developed these notions, including the remark that, in modern parlance, matrix products are non-commutative. Cauchy was the first to prove general statements about determinants, using as definition of the determinant of a matrix A = [a_{i,j}] the following: replace the powers a_j^k by a_{jk} in the polynomial

\[ a_1 a_2 \cdots a_n \prod_{i < j} (a_j - a_i) \]

(∏ denotes the product of the indicated terms). He also showed, in 1829, that the eigenvalues of symmetric matrices are real.[86] Jacobi studied "functional determinants" (later called Jacobi determinants by Sylvester), which can be used to describe geometric transformations at a local (or infinitesimal) level, see above. Kronecker's Vorlesungen über die Theorie der Determinanten[87] and Weierstrass' Zur Determinantentheorie,[88] both published in 1903, first treated determinants axiomatically, as opposed to previous more concrete approaches such as the mentioned formula of Cauchy. At that point, determinants were firmly established.

Many theorems were first established for small matrices only. For example, the Cayley–Hamilton theorem was proved for 2×2 matrices by Cayley in the aforementioned memoir, and by Hamilton for 4×4 matrices. Frobenius, working on bilinear forms, generalized the theorem to all dimensions (1898). Also at the end of the 19th century, Gauss–Jordan elimination (generalizing a special case now known as Gauss elimination) was established by Jordan. In the early 20th century, matrices attained a central role in linear algebra.[89]

The inception of matrix mechanics by Heisenberg, Born and Jordan led to studying matrices with infinitely many rows and columns.[90] Later, von Neumann carried out the mathematical formulation of quantum mechanics, by further developing functional analytic notions such as linear operators on Hilbert spaces, which, very roughly speaking, correspond to Euclidean space, but with an infinity of independent directions.

See also

Notes

  1. ^ Brown 1991, Chapter I.1. Alternative references for this book include Lang 1987b and Greub 1975.
  2. ^ Oualline 2003, Ch. 5.
  3. ^ Brown 1991, Definition I.2.1 (addition), Definition I.2.4 (scalar multiplication), and Definition I.2.33 (transpose)
  4. ^ Brown 1991, Theorem I.2.6.
  5. ^ Brown 1991, Definition I.2.20.
  6. ^ Brown 1991, Theorem I.2.24.
  7. ^ Horn & Johnson 1985, Ch. 4 and 5.
  8. ^ Brown 1991, I.2.21 and 22.
  9. ^ Greub 1975, Section III.2.
  10. ^ Brown 1991, Definition II.3.3.
  11. ^ Greub 1975, Section III.1.
  12. ^ Brown 1991, Theorem II.3.22.
  13. ^ Brown 1991, Definition I.2.28.
  14. ^ Brown 1991, Definition I.5.13.
  15. ^ Brown 1991, Definition III.2.1.
  16. ^ Brown 1991, Theorem III.2.12.
  17. ^ Brown 1991, Corollary III.2.16.
  18. ^ Mirsky 1990, Theorem 1.4.1.
  19. ^ Brown 1991, Theorem III.3.18.
  20. ^ Brown 1991, Definition III.4.1.
  21. ^ Brown 1991, Definition III.4.9.
  22. ^ Brown 1991, Corollary III.4.10.
  23. ^ Horn & Johnson 1985, Theorem 2.5.6.
  24. ^ Horn & Johnson 1985, Chapter 7.
  25. ^ Horn & Johnson 1985, Example 4.0.6, p. 169.
  26. ^ Horn & Johnson 1985, Theorem 7.2.1.
  27. ^ Bau III & Trefethen 1997.
  28. ^ Householder 1975, Ch. 7.
  29. ^ Golub & Van Loan 1996, Algorithm 1.3.1.
  30. ^ Golub & Van Loan 1996, Chapters 9 and 10, esp. section 10.2.
  31. ^ Golub & Van Loan 1996, Chapter 2.3.
  32. ^ For example, Mathematica, see Wolfram 2003, Ch. 3.7.
  33. ^ Press, Flannery & Teukolsky 1992.
  34. ^ Stoer & Bulirsch 2002, Section 4.1.
  35. ^ Horn & Johnson 1985, Theorem 2.5.4.
  36. ^ Horn & Johnson 1985, Ch. 3.1, 3.2.
  37. ^ Arnold & Cooke 1992, Sections 14.5, 7, 8.
  38. ^ Bronson 1989, Ch. 15.
  39. ^ Coburn 1955, Ch. V.
  40. ^ Lang 2002, Chapter XIII.
  41. ^ Lang 2002, XVII.1, p. 643.
  42. ^ Lang 2002, Proposition XIII.4.16.
  43. ^ Reichl 2004, Section L.2.
  44. ^ Greub 1975, Section III.3.
  45. ^ Greub 1975, Section III.3.13.
  46. ^ See any standard reference in group.
  47. ^ Baker 2003, Def. 1.30.
  48. ^ Baker 2003, Theorem 1.2.
  49. ^ Artin 1991, Chapter 4.5.
  50. ^ Artin 1991, Theorem 4.5.13.
  51. ^ Rowen 2008, Example 19.2, p. 198.
  52. ^ See any reference in representation theory or group representation.
  53. ^ See the item "Matrix" in Itõ, ed. 1987.
  54. ^ Fudenberg & Tirole 1983, Section 1.1.1.
  55. ^ Manning 1999, Section 15.3.4.
  56. ^ Ward 1997, Ch. 2.8.
  57. ^ Stinson 2005, Ch. 1.1.5 and 1.2.4.
  58. ^ Association for Computing Machinery 1979, Ch. 7.
  59. ^ Godsil & Royle 2004, Ch. 8.1.
  60. ^ Punnen 2002.
  61. ^ Lang 1987a, Ch. XVI.6.
  62. ^ Nocedal 2006, Ch. 16.
  63. ^ Lang 1987a, Ch. XVI.1.
  64. ^ Lang 1987a, Ch. XVI.5. For a more advanced, and more general statement see Lang 1969, Ch. VI.2.
  65. ^ Gilbarg & Trudinger 2001.
  66. ^ Šolin 2005, Ch. 2.5. See also stiffness method.
  67. ^ Latouche & Ramaswami 1999.
  68. ^ Mehata & Srinivasan 1978, Ch. 2.8.
  69. ^ Krzanowski 1988, Ch. 2.2., p. 60.
  70. ^ Krzanowski 1988, Ch. 4.1.
  71. ^ Conrey 2007.
  72. ^ Zabrodin, Brezin & Kazakov et al. 2006.
  73. ^ Itzykson & Zuber 1980, Ch. 2.
  74. ^ see Burgess & Moore 2007, section 1.6.3. (SU(3)), section 2.4.3.2. (Kobayashi-Maskawa matrix).
  75. ^ Schiff 1968, Ch. 6.
  76. ^ Bohm 2001, sections II.4 and II.8.
  77. ^ Weinberg 1995, Ch. 3.
  78. ^ Wherrett 1987, part II.
  79. ^ Riley, Hobson & Bence 1997, 7.17.
  80. ^ Guenther 1990, Ch. 5.
  81. ^ Shen, Crossley & Lun 1999 cited by Bretscher 2005, p. 1.
  82. ^ Cayley 1889, vol. II, pp. 475–496.
  83. ^ Dieudonné, ed. 1978, Vol. 1, Ch. III, p. 96
  84. ^ Merriam-Webster dictionary, retrieved April 20, 2009.
  85. ^ Knobloch 1994.
  86. ^ Hawkins 1975.
  87. ^ Kronecker 1897.
  88. ^ Weierstrass 1915, pp. 271–286.
  89. ^ Bôcher 2004.
  90. ^ Mehra & Rechenberg 1987.
  1. ^ Eigen means "own" in German and in Dutch.
  2. ^ Additionally, the group is required to be closed in the general linear group.
  3. ^ "Not much of matrix theory carries over to infinite-dimensional spaces, and what does is not so useful, but it sometimes helps." Halmos 1982, p. 23, Chapter 5

References

Physics references

  • Bohm, Arno (2001), Quantum Mechanics: Foundations and Applications, Springer, ISBN 0-387-95330-2
  • Burgess, Cliff; Moore, Guy (2007), The Standard Model. A Primer, Cambridge University Press, ISBN 0-521-86036-9
  • Guenther, Robert D. (1990), Modern Optics, John Wiley, ISBN 0-471-60538-7
  • Itzykson, Claude; Zuber, Jean-Bernard (1980), Quantum Field Theory, McGraw-Hill, ISBN 0-07-032071-3
  • Riley, K. F.; Hobson, M. P.; Bence, S. J. (1997), Mathematical methods for physics and engineering, Cambridge University Press, ISBN 0-521-55506-X
  • Schiff, Leonard I. (1968), Quantum Mechanics (3rd ed.), McGraw-Hill
  • Weinberg, Steven (1995), The Quantum Theory of Fields. Volume I: Foundations, Cambridge University Press, ISBN 0-521-55001-7
  • Wherrett, Brian S. (1987), Group Theory for Atoms, Molecules and Solids, Prentice-Hall International, ISBN 0-13-365461-3
  • Zabrodin, Anton; Brezin, Édouard; Kazakov, Vladimir; Serban, Didina; Wiegmann, Paul (2006), Applications of Random Matrices in Physics (NATO Science Series II: Mathematics, Physics and Chemistry), Berlin, New York: Springer-Verlag, ISBN 978-1-4020-4530-1

Historical references

External links


Online Matrix Calculators
  • Oehlert, Gary W.; Bingham, Christopher, MacAnova, University of Minnesota, School of Statistics, retrieved 12/10/2008, a freeware package for matrix algebra and statistics
