# User:BenFrantzDale/Linear Algebra and Functional Analysis

This is a draft of some ideas I may scatter in appropriate places around Wikipedia or may just blog about.

## Background

I think the college math curriculum for scientists and engineers could really be improved. The usual college math curriculum begins with multivariable calculus, linear algebra, and differential equations. From there, curricula go off in their own directions. Since college, I have become deeply familiar with linear algebra and functional analysis. With these tools, topics including differential equations, statistics, signal processing, control systems, computer vision, Fourier transforms, and many more have become much clearer.

## Vectors

A vector is a mathematical construct that is generally introduced to quantify position and velocity in space. For example, a baseball's velocity at a particular time could be described by its velocity in the x, y, and z directions. This is a very useful application of vectors, but vectors can be much more than this and are indispensable in higher mathematics.

A vector space (aka a "linear space") is a collection of objects (called vectors) that, informally speaking, may be scaled and added. For example, if you throw a ball with velocity ${\displaystyle \mathbf {v} _{1}}$ (with respect to you) from a car moving at velocity ${\displaystyle \mathbf {v} _{2}}$, the ball's velocity with respect to the ground is ${\displaystyle \mathbf {v} _{1}+\mathbf {v} _{2}}$; if you throw the ball twice as fast (with respect to you), the ball would have a velocity of ${\displaystyle 2\mathbf {v} _{1}+\mathbf {v} _{2}}$ with respect to the ground. While it is common to write vectors with respect to a particular coordinate system (aka a "basis"), this is not required, and it obscures the simplicity of a vector. A velocity vector simply means "that way, that fast".

We can do other useful things with vectors. We can measure the length of a vector. For example, it might be useful to know how fast a ball is moving, that is, its speed. We write this ${\displaystyle s=\|\mathbf {v} \|}$. We can also project one vector onto another vector. For example, if a baseball is flying through the air with velocity v, we might want to know how fast the ball is moving across the ground. Given a vector, g, pointing along the ground, there is a function, ${\displaystyle \mathrm {proj} (\mathbf {v} ,\mathbf {g} )}$, that tells us how fast the ball is moving in the direction of g.

### Implementation details

#### Basis

For practical applications, we need to be able to compute ${\displaystyle \mathrm {proj} (\mathbf {v} ,\mathbf {g} )}$ and ${\displaystyle \|\mathbf {v} \|}$. These are critical details, but require that we pick a representation for our vectors. Vectors can be represented in any number of ways; we will pick one representation that is very useful. We will describe a vector, v, in terms of three orthogonal vectors of length 1, x, y, and z. This is a Cartesian representation:

${\displaystyle \mathbf {v} =(v_{x},v_{y},v_{z})}$.

In this notation, the list of scalars in parentheses gives the components of a vector in an established basis. Since we have operations to project one vector onto another, we can also say

${\displaystyle \mathbf {v} =(v_{x},v_{y},v_{z})=(\mathrm {proj} (\mathbf {v} ,\mathbf {x} ),\mathrm {proj} (\mathbf {v} ,\mathbf {y} ),\mathrm {proj} (\mathbf {v} ,\mathbf {z} ))}$.

(Note: this only works when the basis vectors are orthogonal.) Also, since we have scalar multiplication and addition, we can write this as

${\displaystyle \mathbf {v} =(v_{x},v_{y},v_{z})=(\mathrm {proj} (\mathbf {v} ,\mathbf {x} ),\mathrm {proj} (\mathbf {v} ,\mathbf {y} ),\mathrm {proj} (\mathbf {v} ,\mathbf {z} ))=\mathrm {proj} (\mathbf {v} ,\mathbf {x} )\mathbf {x} +\mathrm {proj} (\mathbf {v} ,\mathbf {y} )\mathbf {y} +\mathrm {proj} (\mathbf {v} ,\mathbf {z} )\mathbf {z} }$.
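As a rough numerical check of this identity, here is a minimal Python sketch. It borrows the Cartesian dot-product formula given under "Projection" below to implement `proj`, and it stores every vector as a triple of standard coordinates purely so the computer has something to work with. The rotated basis and the example vector are made-up values, not anything from the text above.

```python
import math

def dot(u, v):
    """Cartesian inner product: sum of products of matching components."""
    return sum(a * b for a, b in zip(u, v))

def proj(v, g):
    """Scalar projection of v onto g (how much of v points along g)."""
    return dot(v, g) / math.sqrt(dot(g, g))

# A hypothetical orthonormal basis: the standard axes rotated 30 degrees about z.
t = math.radians(30.0)
x = (math.cos(t), math.sin(t), 0.0)
y = (-math.sin(t), math.cos(t), 0.0)
z = (0.0, 0.0, 1.0)

v = (3.0, -2.0, 5.0)  # arbitrary example vector

# Components of v along each basis vector.
vx, vy, vz = proj(v, x), proj(v, y), proj(v, z)

# Rebuild v as proj(v,x) x + proj(v,y) y + proj(v,z) z.
rebuilt = tuple(vx * a + vy * b + vz * c for a, b, c in zip(x, y, z))
```

Up to floating-point rounding, `rebuilt` matches `v`. If the basis vectors were not orthogonal, the same recipe would generally not give `v` back.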

For some problems, a good choice of basis can make the problem much easier to solve. For example, if a car is driving up a ramp, we could describe the car's position and velocity in terms of horizontal and vertical components (East–West, North–South, and up–down), but it could be easier to use a coordinate system aligned with the ramp so the car's position always looks like ${\displaystyle (p,0,0)}$. This is a simple but important idea that we will come back to.

#### Length

Using a Cartesian representation, the length of v can then be determined by the Pythagorean theorem:

${\displaystyle \|\mathbf {v} \|={\sqrt {v_{x}^{2}+v_{y}^{2}+v_{z}^{2}}}}$.

#### Projection

The projection operator, ${\displaystyle \mathrm {proj} (\mathbf {v} _{1},\mathbf {v} _{2})}$, is a bit trickier to implement [diagrams needed]. We will first define another operation called an "inner product", also known as a "dot product". In different disciplines it is written in different ways (e.g., ${\displaystyle \mathbf {u} \cdot \mathbf {v} }$, ${\displaystyle \langle \mathbf {u} ,\mathbf {v} \rangle }$, etc.), but it is easy to compute and its result means "the amount that two vectors point in the same direction times the lengths of the two vectors". This seems a bit odd at first, but is the first step toward a lot of useful results. For example, if we divide that by the length of the second vector, we get "the amount that the first vector points in the same direction as the second times the length of the first vector", which is the projection of the first vector onto the second. It seems roundabout, but it's the easiest way to compute what we want. The inner product of two vectors in Cartesian coordinates can be written as

${\displaystyle \mathbf {v} _{1}\cdot \mathbf {v} _{2}=\langle \mathbf {v} _{1},\mathbf {v} _{2}\rangle =v_{1x}v_{2x}+v_{1y}v_{2y}+v_{1z}v_{2z}}$.

With the dot product defined, it is easy to define projection:

${\displaystyle \mathrm {proj} (\mathbf {v} _{1},\mathbf {v} _{2})={\frac {\mathbf {v} _{1}\cdot \mathbf {v} _{2}}{\|\mathbf {v} _{2}\|}}}$.

NEED MORE EXPLANATION AND DIAGRAMS. Note that the inner product provides a concise way to compute the magnitude of a vector:

${\displaystyle \|\mathbf {v} \|={\sqrt {v_{x}^{2}+v_{y}^{2}+v_{z}^{2}}}={\sqrt {\mathbf {v} \cdot \mathbf {v} }}}$.
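The three formulas above (dot product, length, and projection) translate almost directly into code. The following is a minimal Python sketch; the baseball velocity and the ground vector are made-up numbers chosen so the arithmetic comes out round.

```python
import math

def dot(u, v):
    """Inner product in Cartesian coordinates."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Length of v, computed as the square root of v . v."""
    return math.sqrt(dot(v, v))

def proj(v, g):
    """Scalar projection of v onto g: (v . g) / ||g||."""
    return dot(v, g) / norm(g)

velocity = (30.0, 0.0, 40.0)  # hypothetical baseball velocity (x, y, z)
ground = (2.0, 0.0, 0.0)      # any vector along the ground; its length does not matter

speed = norm(velocity)                  # 50.0
ground_speed = proj(velocity, ground)   # 30.0
```

Note that `ground` does not need to be a unit vector: dividing by its length inside `proj` removes its scale.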

Keep in mind that these details are just that: details. The important thing is that we have ways to compute the length of a vector and to project one vector onto another.

## Change of basis

As mentioned above, it is often useful to think about vectors in different coordinate systems. One way to think about this operation is by figuring out how the basis in one coordinate system can be represented in the other coordinate system. For the sake of simplicity, consider a two-dimensional vector, ${\displaystyle \mathbf {v} =(v_{x},v_{y})=v_{x}\mathbf {x} +v_{y}\mathbf {y} }$. Suppose we want to represent this in a second coordinate system with basis vectors x' and y'. (FYI: these get called the "primed" coordinate system; it's just another system, and we could call the vectors q and n if we wanted to, but this language is customary.)

So we have

${\displaystyle \mathbf {v} =(v_{x},v_{y})}$

and we want to compute

${\displaystyle \mathbf {v'} =(v_{x'},v_{y'})}$

given x, y, x' , and y' .

We can do this by transforming the basis vectors. That is, we know that x and y are (1,0) and (0,1) in their own coordinate system, but how can we describe x and y in the primed coordinate system? We project them. And, assuming the primed system consists of unit vectors, we simply take dot products:

${\displaystyle \mathbf {x} =1\mathbf {x} +0\mathbf {y} =(\mathbf {x} \cdot \mathbf {x'} )\mathbf {x'} +(\mathbf {x} \cdot \mathbf {y'} )\mathbf {y'} }$.
${\displaystyle \mathbf {y} =0\mathbf {x} +1\mathbf {y} =(\mathbf {y} \cdot \mathbf {x'} )\mathbf {x'} +(\mathbf {y} \cdot \mathbf {y'} )\mathbf {y'} }$.

We can represent v in another coordinate system by expanding it in the xy system and then transforming that basis to the other basis:

${\displaystyle \mathbf {v} =(v_{x},v_{y})=v_{x}\mathbf {x} +v_{y}\mathbf {y} }$
${\displaystyle =v_{x}((\mathbf {x} \cdot \mathbf {x'} )\mathbf {x'} +(\mathbf {x} \cdot \mathbf {y'} )\mathbf {y'} )+v_{y}((\mathbf {y} \cdot \mathbf {x'} )\mathbf {x'} +(\mathbf {y} \cdot \mathbf {y'} )\mathbf {y'} )}$
${\displaystyle =(v_{x}(\mathbf {x} \cdot \mathbf {x'} )+v_{y}(\mathbf {y} \cdot \mathbf {x'} ))\mathbf {x'} +(v_{x}(\mathbf {x} \cdot \mathbf {y'} )+v_{y}(\mathbf {y} \cdot \mathbf {y'} ))\mathbf {y'} }$

Being loose with our notation (TODO: clarify vector- versus scalar-valued expressions), we have

${\displaystyle v_{x'}=(v_{x},v_{y})\cdot (\mathbf {x} \cdot \mathbf {x'} ,\mathbf {y} \cdot \mathbf {x'} )}$

and

${\displaystyle v_{y'}=(v_{x},v_{y})\cdot (\mathbf {x} \cdot \mathbf {y'} ,\mathbf {y} \cdot \mathbf {y'} )}$.

The algebra gets tedious, but it all works out. And each step has geometric meaning (diagrams needed).
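To see the two component formulas at work, here is a small Python sketch. It assumes, for illustration only, that the primed basis is the xy basis rotated by 30 degrees; the names `x_primed`, `y_primed`, and `to_primed` are made up for this example.

```python
import math

def dot(u, v):
    """Inner product in Cartesian coordinates."""
    return sum(a * b for a, b in zip(u, v))

# The unprimed basis, written in its own coordinates.
x, y = (1.0, 0.0), (0.0, 1.0)

# Hypothetical primed basis: unit vectors rotated 30 degrees counterclockwise.
t = math.radians(30.0)
x_primed = (math.cos(t), math.sin(t))
y_primed = (-math.sin(t), math.cos(t))

def to_primed(v):
    """Components of v in the primed basis, following the two formulas above."""
    v_x_primed = dot(v, (dot(x, x_primed), dot(y, x_primed)))
    v_y_primed = dot(v, (dot(x, y_primed), dot(y, y_primed)))
    return (v_x_primed, v_y_primed)

v = (2.0, 1.0)
v_primed = to_primed(v)

# Sanity check: v_x' x' + v_y' y' should give back the original vector.
back = tuple(v_primed[0] * a + v_primed[1] * b for a, b in zip(x_primed, y_primed))
```

As expected, `to_primed(x_primed)` comes out as (1, 0) up to rounding, and the length of `v_primed` equals the length of `v`: the vector has not changed, only its description.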

## Extending vectors: Functions are vectors too

The properties described above for three-dimensional vectors can be applied to other things, including functions. (Using the tools of vectors, we can simplify all sorts of problems involving functions.)

First, note that functions have addition and scalar multiplication, just like vectors; that is, we can scale a function, ${\displaystyle f(x)}$:

${\displaystyle (\alpha f)(x)=\alpha f(x)}$,

and we can add two functions:

${\displaystyle (f+g)(x)=f(x)+g(x)}$.
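In a programming language with first-class functions, these two operations can be written down directly. A minimal Python sketch (the helper names `scale` and `add` are just illustrative):

```python
import math

def scale(alpha, f):
    """Return the function (alpha f), defined by (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)

def add(f, g):
    """Return the function (f + g), defined by (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

# A linear combination of two functions is itself a function.
h = add(scale(2.0, math.sin), math.cos)  # h(x) = 2 sin(x) + cos(x)
```

Here `h` can be called like any other function, for example `h(0.0)` evaluates to `1.0`.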

DIAGRAMS. Conversely, you can think of a three-dimensional vector, represented in a chosen basis, as a function:

${\displaystyle v(x)={\begin{cases}v_{x}&0<x\leq 1\\v_{y}&1<x\leq 2\\v_{z}&2<x\leq 3\\0&{\text{otherwise}}\end{cases}}}$

that's like a bar graph of the components of v.

This is relatively simple, but not yet particularly useful.

### Inner product

Recall from above that the inner product is computed by adding up the products of the components,

${\displaystyle \langle u,v\rangle =u_{x}v_{x}+u_{y}v_{y}+u_{z}v_{z}}$

and if they have more components, we just keep adding:

${\displaystyle \langle u,v\rangle =\sum _{i=1}^{n}u_{i}v_{i}}$.

Now consider u and v to be functions, as described above. We can change the summation to integration and get the same answer:

${\displaystyle \langle u,v\rangle =\int _{-\infty }^{\infty }u(x)v(x)\,dx}$.

We can have bounds of infinity because we defined the vectors to have a value of zero outside of the range (0,3]. We can use the same definition of an inner product to define an inner product between functions:

${\displaystyle \langle f(x),g(x)\rangle =\int _{-\infty }^{\infty }f(x)g(x)\,dx}$.

Note that functions are often written without their argument list, e.g., ${\displaystyle f}$ rather than ${\displaystyle f(x)}$, to describe them more abstractly, just as you would write a vector as v rather than as ${\displaystyle (v_{x},v_{y},v_{z})}$.
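The claim that swapping the sum for an integral gives the same answer for "bar graph" functions can be checked numerically. The sketch below assumes each component occupies a unit-width interval, (0,1], (1,2], (2,3], with zero elsewhere, and uses a simple midpoint rule; `as_function` and `integrate` are illustrative helpers, and the finite range [-1, 4] stands in for the infinite bounds since the functions vanish outside (0,3].

```python
import math

def as_function(components):
    """Bar-graph view of a vector: component i on (i-1, i], zero elsewhere."""
    def f(x):
        if 0 < x <= len(components):
            return components[math.ceil(x) - 1]
        return 0.0
    return f

def integrate(f, a, b, steps):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

u = (1.0, 2.0, 3.0)
v = (4.0, -5.0, 6.0)

# Inner product as a sum of component products: 4 - 10 + 18 = 12.
as_sum = sum(a * b for a, b in zip(u, v))

# Inner product as an integral of the product of the bar-graph functions.
fu, fv = as_function(u), as_function(v)
as_integral = integrate(lambda x: fu(x) * fv(x), -1.0, 4.0, 500)
```

Both `as_sum` and `as_integral` come out to 12 (the integral only up to floating-point rounding).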

### Norms (length)

When the ideas of vectors are extended beyond three-dimensional arrows, mathematicians like to call the "length" operation a "norm". (A norm also gives a "metric", or notion of distance: the distance between two vectors is the norm of their difference.) The familiar norm is the "Pythagorean norm", also known as the Euclidean or L2 norm: the square root of the inner product of a vector with itself. Applied to a function, this is:

${\displaystyle \|f\|={\sqrt {\langle f,f\rangle }}={\sqrt {\int _{-\infty }^{\infty }f(x)f(x)\,dx}}}$.

What this means depends on the function. In electrical engineering, it might correspond to the effective (root-mean-square) voltage of a signal over time; in statistics, it might be a measure of variability.
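For a concrete feel, here is a rough numerical sketch of the inner product and norm of functions. It approximates the integrals with a midpoint rule and truncates the infinite range to [-10, 10], which is an assumption that only works for functions that are negligible outside that range; the two example functions are made up for illustration.

```python
import math

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def inner(f, g, a=-10.0, b=10.0):
    """Approximate <f, g>, assuming f and g are negligible outside [a, b]."""
    return integrate(lambda x: f(x) * g(x), a, b)

def norm(f):
    """L2 norm: square root of the inner product of f with itself."""
    return math.sqrt(inner(f, f))

def gaussian(x):
    return math.exp(-x * x)

def odd_bump(x):
    return x * math.exp(-x * x)

gaussian_norm = norm(gaussian)           # exact value is (pi / 2) ** 0.25
overlap = inner(gaussian, odd_bump)      # essentially zero: the product is an odd function
```

The second result is the function-space analogue of two perpendicular arrows: an even function and an odd function have zero inner product.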

### Projection

With an inner product and a norm, we can define projection of functions onto one another. This seems abstract at first, but is very useful because projection is how we decompose a vector into components. By decomposing functions into a sum of simpler functions, we will be able to simplify problems to make them easier to solve.

EXAMPLE HERE: DECOMPOSE POLYNOMIAL

EXAMPLE HERE: DECOMPOSE SIN INTO POLYNOMIAL
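Until those examples are written, here is one possible sketch of the polynomial case. It makes two assumptions that are not in the text above: the functions are only considered on the interval [-1, 1] (equivalently, treated as zero elsewhere), and the basis is the first three Legendre polynomials, which are mutually orthogonal on that interval. Because these basis functions do not have unit length, each coefficient is the inner product divided by the squared norm of the basis function, that is, the projection divided by the norm once more.

```python
def integrate(f, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def inner(f, g):
    """Inner product of functions, restricted to [-1, 1] (an assumption)."""
    return integrate(lambda x: f(x) * g(x), -1.0, 1.0)

# First three Legendre polynomials: orthogonal on [-1, 1], but not unit length.
basis = [
    lambda x: 1.0,
    lambda x: x,
    lambda x: 0.5 * (3.0 * x * x - 1.0),
]

def f(x):
    return x * x

# Coefficient of each basis function: <f, p> / <p, p>.
coeffs = [inner(f, p) / inner(p, p) for p in basis]

def approx(x):
    """Rebuild f as a weighted sum of the basis functions."""
    return sum(c * p(x) for c, p in zip(coeffs, basis))
```

The coefficients come out close to 1/3, 0, and 2/3, and `approx` agrees with `f` to within the accuracy of the numerical integration.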

## More on inner products

At first, inner products may seem a bit obscure. They are a step in implementing the projection operator, but other operations can also be thought of as inner products. For example, if I have a finite-dimensional vector, ${\displaystyle x=(x_{1},\dots ,x_{n})}$, the average value is

${\displaystyle {\overline {x}}={\frac {\sum _{i=1}^{n}x_{i}}{n}}={\frac {x_{1}}{n}}+\cdots +{\frac {x_{n}}{n}}=\left\langle x,\left(\overbrace {{\frac {1}{n}},\dots ,{\frac {1}{n}}} ^{\mbox{n times}}\right)\right\rangle }$.

Similarly, a weighted mean is an inner product with a particular chosen weight vector. Suppose you wanted to know Bill Gates's net worth (or at least the equities portion of it). Given a vector, p, describing his portfolio (the number of shares of stock in Microsoft, Apple, etc.) and a vector, m, describing the present state of the market (the share price of every company), the value of all of the assets in p is simply ${\displaystyle \langle p,m\rangle }$.
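Both examples reduce to the same one-line computation. A minimal Python sketch, with entirely made-up numbers for the data, the share counts, and the prices:

```python
import math

def dot(u, v):
    """Inner product: sum of products of matching components."""
    return sum(a * b for a, b in zip(u, v))

# The mean as an inner product with the vector (1/n, ..., 1/n).
x = (2.0, 4.0, 9.0)
n = len(x)
mean = dot(x, (1.0 / n,) * n)  # approximately 5.0

# Portfolio value as an inner product of holdings and prices (hypothetical data).
shares = (100.0, 20.0, 5.0)    # number of shares held in three companies
prices = (10.0, 50.0, 200.0)   # current price per share of each company
value = dot(shares, prices)    # 1000 + 1000 + 1000 = 3000.0
```

Changing the weight vector changes the question being asked, but the operation stays the same.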

## Applications thus far

Differential equations: Show 2nd-order DE.

## Core ideas [to explain]

• Vectors, linear transformations, and tensors are first-class mathematical objects that exist free of chosen basis.
• This means that reasoning about linear operations in any number of dimensions should be independent of chosen basis.
• This means that the exact numbers in a matrix aren't particularly meaningful as they depend on the basis.
• This means that operations such as determinant, trace, singular value decomposition, and others all have geometric meaning (e.g., determinant is the n-dimensional volume scale factor).
• This means a tensor, such as a stress tensor, is a first-class object just like a vector. A stress tensor doesn't tell you the stress in the x, y, z, xy, xz, and yz directions.
• This also relates to information hiding in software engineering.
• Functions are vectors. Look at the definition of a vector space; note that functions have all the usual vector operations.
• Integral transforms are really just infinite-dimensional forms of a change of basis.
• There is no number that is the square root of negative one. When you see i, it's really just the two-by-two matrix that performs a ninety-degree counterclockwise rotation.
• Linearity is almost always an approximation (e.g., assuming infinitesimal deformation), but it is tremendously useful. That's why we do it.
• Sine and cosine have the property that linear combinations of the two correspond to shifting the functions. That is, you can always solve ${\displaystyle a\sin(x)+b\cos(x)=c\sin(x+\Delta )}$.
• There are at least two or three different notations for linear algebra. Nobody ever clarifies which is which.
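Two of the bullet points above are easy to check numerically. The sketch below uses the standard identities ${\displaystyle c={\sqrt {a^{2}+b^{2}}}}$ and ${\displaystyle \Delta =\operatorname {atan2} (b,a)}$ for the sine and cosine claim, and squares the ninety-degree rotation matrix to show that it behaves like i. The values of `a` and `b` are arbitrary, and `matmul2` is just an illustrative helper.

```python
import math

# Claim: a sin(x) + b cos(x) = c sin(x + delta) can always be solved for c and delta.
a, b = 3.0, 4.0
c = math.hypot(a, b)        # amplitude: sqrt(a^2 + b^2) = 5.0
delta = math.atan2(b, a)    # phase shift, so that a = c cos(delta), b = c sin(delta)

def lhs(x):
    return a * math.sin(x) + b * math.cos(x)

def rhs(x):
    return c * math.sin(x + delta)

# Claim: the 90-degree counterclockwise rotation matrix squares to minus the identity.
def matmul2(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

J = [[0, -1],
     [1, 0]]
J_squared = matmul2(J, J)   # [[-1, 0], [0, -1]]
```

Evaluating `lhs` and `rhs` at any `x` gives the same number up to rounding, and `J_squared` is exactly minus the identity matrix.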

## Results

The above observations make a lot of things simple that seemed nonsensical to me when I first saw them.

• Linear differential equations are solved by taking a linear combination of eigenfunctions. This is what you are doing when you assume your solution is made of sines and cosines and solve for the coefficients. It seemed arbitrary at the time.
• The Fourier transform is tremendously useful for signal processing because (a) in a sinusoidal basis, differentiation is trivial and (b) convolution is easy (the equivalent of diagonal for matrices) in a sinusoidal basis, by the convolution theorem.
• Solving linear differential equations is really a task of finding solutions to linear systems that just happen to be infinite dimensional. You can approximate the functions through a variety of methods such as finite element analysis.
• Because functions are vectors, nonlinear function optimization makes sense geometrically as hill climbing in infinite dimensions. You just pick a direction to move and a distance to go and repeat.
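The convolution claim above can be illustrated in a finite-dimensional setting. The sketch below is only a stand-in for the continuous Fourier transform: it uses a naive discrete Fourier transform and circular convolution on two short made-up signals, and checks that convolving and then transforming gives the same result as transforming and then multiplying component by component.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform (a change to a sinusoidal basis)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolve(f, g):
    """Circular convolution of two equal-length sequences."""
    n = len(f)
    return [sum(f[m] * g[(t - m) % n] for m in range(n)) for t in range(n)]

f = [1.0, 2.0, 0.0, -1.0]   # arbitrary example signals
g = [0.5, 0.0, 0.25, 0.0]

# Convolve first, then change basis...
lhs = dft(circular_convolve(f, g))

# ...or change basis first, then multiply component by component.
rhs = [a * b for a, b in zip(dft(f), dft(g))]

max_difference = max(abs(p - q) for p, q in zip(lhs, rhs))
```

The two results agree to within rounding error, which is the sense in which convolution is "diagonal" in a sinusoidal basis.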

## To-do

1. Show matrices aren't just arrays of numbers...
2. Show matrices as representation of linear transformation...
3. Discuss covariance and contravariance in the context of how to deal with non-orthogonal coordinate systems.