Friday 6 November 2020

Introduction to Linear Algebra - 4. Orthogonality

 4. Orthogonality

This article is a summary and a set of notes from the book 'Introduction to Linear Algebra' by Gilbert Strang. 

https://math.mit.edu/~gs/linearalgebra/

4.1. Orthogonality of the Four Subspaces


https://www.semanticscholar.org/paper/The-Fundamental-Theorem-of-Linear-Algebra-Strang/9cf7601489c7c0b4ca6bdcf63d07cd6e016ec2b7/figure/0

The linked page also shows the other parts of the fundamental theorem of linear algebra, so it is worth a look. 



Those four fundamental subspaces reveal what a matrix really does. 


Def.

Orthogonal subspaces

Two subspaces V and W of a vector space are orthogonal if every vector v in V is perpendicular to every vector w in W.

vt w=0 for all v in V and all w in W. 


Observations. 

  1. Orthogonality is impossible when dim V + dim W > dim whole space.

  2. The zero vector is the only vector that the nullspace and the row space have in common, and the two subspaces meet at a 90° angle. The same holds for the left nullspace and the column space. 


Proposition.

The nullspace N(A) and the row space C(At) are orthogonal subspaces of Rn. 

The left nullspace N(At) and the column space C(A) are orthogonal in Rm. 

Proof.

  • Practical: Ax = 0 says that x has a zero dot product with every row of A, so x is perpendicular to every row and to every combination of the rows. 

  • Mathematical: if x is in N(A) and At y is in the row space, then xt (At y) = (Ax)t y = 0. (A numerical check follows this list.) 
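
A quick numerical check of the proposition, as a minimal sketch with NumPy (the matrix A below is just an assumed example): every basis vector of N(A) should have a zero dot product with every row of A.

    import numpy as np

    # An assumed 3x4 example of rank 2 (the third row is the sum of the first two).
    A = np.array([[1., 2., 0., 1.],
                  [2., 4., 1., 3.],
                  [3., 6., 1., 4.]])

    # A basis for the nullspace N(A): right singular vectors with zero singular value.
    U, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10))
    null_basis = Vt[rank:].T                  # columns span N(A), dimension n - r = 2

    # Every row of A is perpendicular to every nullspace vector, so A @ null_basis = 0.
    print(np.allclose(A @ null_basis, 0))     # True: row space is orthogonal to nullspace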


Orthogonal Complements. 


Those fundamental subspaces are not only orthogonal, they are orthogonal complements. 


Def.

The orthogonal complement of a subspace V contains every vector that is perpendicular to V. This orthogonal subspace is denoted by V perp. 

In the term orthogonal complement, the meaning carried by the word "complement" is very important.  



Fundamental Theorem of Linear Algebra, Part 2.

N(A) is the orthogonal complement of the row space C(At) in Rn.

N(At) is the orthogonal complement of the column space C(A) in Rm. 


Part 1 gave the dimensions of the subspaces. Part 2 gives the 90° angles between them. 


The point of “complements” is that every x can be split into a row space component xr and a nullspace component xn. When A multiplies x = xr + xn, two things happen (a numerical sketch follows this list): 

  • The nullspace component goes to zero: Axn=0

  • The row space component goes to the column space: Axr=Ax.
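
A small numerical illustration of this split, as a minimal sketch (the matrix A and vector x are assumed examples; the pseudoinverse is used here because pinv(A) @ A is the orthogonal projector onto the row space):

    import numpy as np

    A = np.array([[1., 2., 0., 1.],
                  [2., 4., 1., 3.],
                  [3., 6., 1., 4.]])
    x = np.array([1., -1., 2., 0.5])

    # pinv(A) @ A projects orthogonally onto the row space C(At).
    xr = np.linalg.pinv(A) @ (A @ x)          # row space component
    xn = x - xr                               # nullspace component

    print(np.allclose(A @ xn, 0))             # the nullspace component goes to zero
    print(np.allclose(A @ xr, A @ x))         # the row space component carries all of Ax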


Combining Bases from Subspaces 


Some valuable facts about bases. 

  • Any n independent vectors in Rn must span Rn. So they are a basis.

  • Any n vectors that span Rn must be independent. So they are a basis.


When the vectors go into the columns of an n by n square matrix, here are the same two facts. 

  • If the n columns of A are independent, they span Rn. So Ax = b is solvable. 

  • If the n columns span Rn, they are independent. So Ax = b has only one solution.

Proof.

Uniqueness implies existence and existence implies uniqueness. 


Proposition.

Given a matrix with bases for the fundamental subspaces, 

  • The r basis vectors of the row space together with the n − r basis vectors of the nullspace are n independent vectors in Rn. 

  • The r basis vectors of the column space together with the m − r basis vectors of the left nullspace are m independent vectors in Rm. 

Proof.

If a combination of all n vectors gives xr + xn = 0, then xr = −xn is in both subspaces. Those subspaces meet only at the zero vector, so xr = xn = 0, which proves that the n vectors together are independent. 



4.2. Projections


Two goals.

  • Show that projections are easy to visualize.

  • Understand projection matrices: symmetric matrices with P^2 = P. The projection of b is Pb. 


Find the combination p = x^hat_1 a_1 + … + x^hat_n a_n closest to a given vector b. ⇔ We are projecting each b in R^m onto the subspace spanned by the a’s, to get p. 


The problem of projecting a vector onto a subspace comes from this question. The key idea for solving it comes from geometry: we use the fact that the error vector is orthogonal to the given subspace.


Projection Onto a Line

Projection Onto a Subspace 


A^t A x^hat = A^t b

p = A x^hat = A(A^tA)^-1 A^t b = Pb



Since At(b − Ax^hat) = 0, the error vector e = b − Ax^hat lies in the left nullspace N(At). 
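
A minimal sketch of these formulas in NumPy (the matrix A and vector b are an assumed small example with independent columns):

    import numpy as np

    A = np.array([[1., 0.],
                  [1., 1.],
                  [1., 2.]])
    b = np.array([6., 0., 0.])

    x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # A^t A x^hat = A^t b
    p = A @ x_hat                               # projection p = A x^hat
    P = A @ np.linalg.inv(A.T @ A) @ A.T        # projection matrix P

    e = b - p                                   # error vector
    print(np.allclose(P @ b, p))                # p = Pb
    print(np.allclose(P @ P, P))                # P^2 = P
    print(np.allclose(P, P.T))                  # P is symmetric
    print(np.allclose(A.T @ e, 0))              # A^t e = 0: e lies in N(A^t)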


Theorem.

AtA is invertible if and only if A has linearly independent columns.

Proof. 

We will show that AtA has the same null space as A. 

  • N(A) is contained in N(AtA): if Ax = 0, then AtAx = At 0 = 0. 

  • N(AtA) is contained in N(A): if AtAx = 0, then xtAtAx = (Ax)t(Ax) = norm(Ax)^2 = 0, so Ax = 0. 

So N(AtA) = N(A). The columns of A are independent exactly when N(A) contains only the zero vector, which happens exactly when N(AtA) contains only the zero vector, i.e. when the square matrix AtA is invertible. 
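
A quick numerical sanity check (both matrices below are assumed examples): a rank-deficient A and a full-column-rank B should show that AtA has the same rank as A, so the two nullspaces have the same dimension.

    import numpy as np

    A = np.array([[1., 2., 3.],       # rank 2: the second row is twice the first
                  [2., 4., 6.],
                  [1., 0., 1.]])
    B = np.array([[1., 0.],           # independent columns
                  [1., 1.],
                  [1., 2.]])

    print(np.linalg.matrix_rank(A.T @ A) == np.linalg.matrix_rank(A))   # True
    print(np.linalg.matrix_rank(B.T @ B) == 2)                          # BtB (2x2) is invertible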


4.3. Least Squares Approximations


https://en.wikipedia.org/wiki/Least_squares

The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the residuals made in the results of every single equation.


https://en.wikipedia.org/wiki/Regression_analysis

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates', or 'features'). The most common form of regression analysis is linear regression, in which a researcher finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion.


There are cases where Ax = b has no solution. ⇔ too many equations ⇔ an overdetermined system ⇔ the error e = b − Ax cannot always be made zero. 

When e is as small as possible, x^hat is the least squares solution. 


Here is a short, unofficial summary. 

When Ax = b has no solution, multiply by At and solve AtA x^hat = At b. To justify this, we must connect projections to least squares. 
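
A minimal numerical sketch (the overdetermined system below is an assumed example): solving the normal equations agrees with NumPy's least squares solver.

    import numpy as np

    A = np.array([[1., 1.],                       # 3 equations, 2 unknowns
                  [1., 2.],
                  [1., 3.]])
    b = np.array([1., 2., 2.])                    # no exact solution

    x_hat = np.linalg.solve(A.T @ A, A.T @ b)     # A^t A x^hat = A^t b
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)  # library least squares solver

    print(np.allclose(x_hat, x_ls))               # True: same least squares solution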


Minimizing the Error 

  • By geometry

https://ccrma.stanford.edu/~jos/sasp/Geometric_Interpretation_Least_Squares.html

https://www.youtube.com/watch?v=oWuhZuLOEFY

The nearest point is the projection p. 

  • By algebra

Every vector b splits into p (in the column space) + e (in the nullspace of At).

Ax = b = p + e is impossible; A x^hat = p is solvable. 

E = norm(Ax-b)^2=norm(Ax-p)^2+norm(e)^2.


  • By calculus 

Most functions are minimized by calculus. The graph bottoms out and the derivative in every direction is zero. ⇔ A continuous multivariable function attains its minimum (or a local optimum) at such a point. 

In least squares approximation the error E is a function of x^hat_1, x^hat_2, …, so we can minimize it by setting the partial derivative with respect to each variable to zero. ⇔ AtA x^hat = At b.


Why do the partial derivatives give AtA x^hat = At b? (not proven in the book; a short derivation is sketched below)
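
A short sketch of that missing step (my own derivation, not from the book): expand the squared error and set its gradient to zero.

$$E(\hat{x}) = \|A\hat{x} - b\|^2 = \hat{x}^T A^T A \hat{x} - 2\,\hat{x}^T A^T b + b^T b$$

$$\nabla E = 2A^T A\hat{x} - 2A^T b = 0 \quad\Longrightarrow\quad A^T A\hat{x} = A^T b$$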



The Big Picture


In Figure 4.3, x was split into xr + xn. There were many solutions to Ax = b. In this section the situation is just the opposite: there are no solutions to Ax = b.

Instead of Ax = b, we solve A x^hat = p.

With independent columns, N(A) contains only the zero vector and AtA is invertible. The equation AtA x^hat = At b fully determines the best vector x^hat. 


Chapter 7 will have the complete picture - all four subspaces included. x = xr + xn, b = p + e. 


Fitting a Straight Line 

Fitting a line is the clearest application of least squares. 
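
A minimal sketch of fitting a line C + Dt with NumPy (the data points t = 0, 1, 2 and b = 6, 0, 0 are an assumed example; for this data the best line works out to 5 − 3t):

    import numpy as np

    t = np.array([0., 1., 2.])                   # measurement times
    b = np.array([6., 0., 0.])                   # measured values

    A = np.column_stack([np.ones_like(t), t])    # columns: 1 and t
    C, D = np.linalg.solve(A.T @ A, A.T @ b)     # normal equations for C + D t
    print(C, D)                                  # 5.0 -3.0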

Fitting a Parabola 

The model C + Dt + Et^2 is nonlinear in t, but it is linear in the three unknowns, so we can still solve a least squares approximation for the three variables (a sketch follows).
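
A sketch of the parabola fit under the same assumptions (the data below are made up): the design matrix simply gains a t^2 column.

    import numpy as np

    t = np.array([0., 1., 2., 3.])
    b = np.array([0., 1., 4., 8.])

    A = np.column_stack([np.ones_like(t), t, t**2])   # columns: 1, t, t^2
    C, D, E = np.linalg.solve(A.T @ A, A.T @ b)       # three normal equations
    print(C, D, E)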

 

4.4. Orthogonal Bases and Gram-Schmidt


https://en.wikipedia.org/wiki/QR_decomposition

http://www.seas.ucla.edu/~vandenbe/133A/lectures/qr.pdf

https://m.blog.naver.com/PostList.nhn?blogId=ldj1725&categoryNo=6&logCode=0

https://web.calpoly.edu/~brichert/teaching/oldclass/f2002217/solutions/solutions11.pdf


This section has two goals.

  • See how orthogonality makes it easy to find x hat and p and P.

  • Construct orthogonal vectors.


Def.

The vectors q1, …, qn are orthonormal if 

qit qj = 0 when i ≠ j and 1 when i = j.


A matrix with orthonormal columns is assigned to the special letter Q. 


The matrix Q is easy to work with because QtQ=I.

  • When Q is rectangular, Qt is only an inverse from the left. 

  • When Q is square, QtQ = I means that Qt = Q^-1: transpose = inverse. 


Examples of orthogonal bases. 

Matrices for rotation, permutation, reflection. 


Theorem.

If Q has orthonormal columns QtQ=I, it leaves lengths unchanged:

Same length: norm(Qx)=norm(x) for every vector x.

Q also preserves dot products (Qx)t(Qy)=xtQtQy=xty. 
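
A small numerical check of this theorem (the rotation matrix and vectors are assumed examples):

    import numpy as np

    theta = 0.3
    Q = np.array([[np.cos(theta), -np.sin(theta)],   # rotation: orthonormal columns
                  [np.sin(theta),  np.cos(theta)]])
    x = np.array([3., 4.])
    y = np.array([-1., 2.])

    print(np.allclose(Q.T @ Q, np.eye(2)))                        # QtQ = I
    print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # same length
    print(np.isclose((Q @ x) @ (Q @ y), x @ y))                   # same dot product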


Projections Using Orthogonal Bases: Q Replaces A.


In the projection formulas, if A has orthonormal columns (A = Q), then AtA = QtQ = I and the formulas become simple.


Important case: When Q is square and m=n, the subspace is the whole space. Then Qt=Q^-1 and x hat = Qtb. 
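
A minimal sketch of the rectangular case (the orthonormal columns below are an assumed example): with QtQ = I the normal equations collapse to x^hat = Qt b and p = Q Qt b, with no inverse to compute.

    import numpy as np

    # Orthonormal basis for a 2-D subspace of R^3 (assumed example).
    Q = np.array([[1., 0.],
                  [0., 1.],
                  [0., 0.]])
    b = np.array([1., 2., 3.])

    x_hat = Q.T @ b                 # QtQ = I, so x^hat = Qt b
    p = Q @ x_hat                   # projection p = Q Qt b
    print(p)                        # [1. 2. 0.]: b is projected onto the xy-plane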


The Gram-Schmidt Process


The projection and least squares formulas seen earlier involve the computation AtA, so there is a computational advantage when A can be replaced by a matrix Q with orthonormal columns.

For this to be true, we had to say “If the vectors are orthonormal”. Now we find a way to create orthonormal vectors. 


(Gram-Schmidt formulas) From independent vectors a, b, c: A = a; B = b − (At b / At A) A; C = c − (At c / At A) A − (Bt c / Bt B) B; then q1 = A/norm(A), q2 = B/norm(B), q3 = C/norm(C). 

Subtract from every new vector its projections in the directions already set. 
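
A minimal sketch of the classical Gram-Schmidt process in NumPy (assuming independent columns; the matrix below is an assumed example):

    import numpy as np

    def gram_schmidt(A):
        # Orthonormalize the columns of A (classical Gram-Schmidt; columns assumed independent).
        m, n = A.shape
        Q = np.zeros((m, n))
        for j in range(n):
            v = A[:, j].copy()
            for i in range(j):                       # subtract projections onto earlier q's
                v -= (Q[:, i] @ A[:, j]) * Q[:, i]
            Q[:, j] = v / np.linalg.norm(v)          # normalize
        return Q

    A = np.array([[1., 1., 2.],
                  [0., 0., 1.],
                  [1., 0., 0.]])
    Q = gram_schmidt(A)
    print(np.allclose(Q.T @ Q, np.eye(3)))           # orthonormal columns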


The Factorization A=QR

Since the vectors a, b, c are combinations of the q’s (and vice versa), there must be a third matrix connecting A to Q. This third matrix is the triangular R in A=QR. 


(The matrix R is upper triangular: its (i, j) entry is qit aj, where aj is the j-th column of A, and the entries below the diagonal are zero because later q's are orthogonal to earlier columns.)

Also R=QtA. 


This non-involvement of later vectors is the key point of Gram-Schmidt:

  • The vectors a and A and q1 are all along a single line.

  • The vectors a, b and A, B and q1, q2 are all in the same plane. 

  • The vectors a, b, c and A, B, C and q1, q2, q3 are in one subspace (dimension 3). 


Theorem.

R is invertible, since the diagonal entries of R are nonzero. 


Finally, through A = QR the least squares approximation can be computed simply (a sketch follows). The modified Gram-Schmidt algorithm introduced in the book also computes Q and R in about mn^2 operations. 
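
A short sketch of least squares via A = QR (assumed example data; NumPy's qr is used instead of hand-rolled Gram-Schmidt): AtA x^hat = At b becomes the triangular system R x^hat = Qt b.

    import numpy as np

    A = np.array([[1., 1.],
                  [1., 2.],
                  [1., 3.]])
    b = np.array([1., 2., 2.])

    Q, R = np.linalg.qr(A)                        # A = QR with orthonormal Q, triangular R
    x_hat = np.linalg.solve(R, Q.T @ b)           # solve the triangular system R x^hat = Qt b

    print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches lstsq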

