Linear Algebra Chapter 6: Orthogonality

Course Overview

6.1 Geometry of Vectors

6.1.1 Distances

DEF. norm

$$\lVert \mathbf{v} \rVert = \sqrt{ v_{1}^{2} + v_{2}^{2} + \dots + v_{k}^{2} }$$

DEF. distance

$$\text{distance of } \mathbf{u}, \mathbf{v} = \lVert \mathbf{u} - \mathbf{v} \rVert$$

6.1.2 Dot Product

DEF. dot product

$$\mathbf{u} \cdot \mathbf{v} = u_{1} v_{1} + u_{2} v_{2} + \dots + u_{k} v_{k}$$

Matrix representation of the dot product

$$\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^{T} \mathbf{v}$$

THR. basic properties of the dot product

For all vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $\mathbb{R}^{n}$ and every scalar $c$,

  1. $\mathbf{u} \cdot \mathbf{u} = \lVert \mathbf{u} \rVert^{2}$
  2. $\mathbf{u} \cdot \mathbf{u} = 0$ if and only if $\mathbf{u} = \mathbf{0}$
  3. $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$
  4. $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$
  5. $(\mathbf{v} + \mathbf{w}) \cdot \mathbf{u} = \mathbf{v} \cdot \mathbf{u} + \mathbf{w} \cdot \mathbf{u}$
  6. $(c\mathbf{u}) \cdot \mathbf{v} = c(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot (c\mathbf{v})$
  7. $\lVert c\mathbf{u} \rVert = \left| c \right| \, \lVert \mathbf{u} \rVert$

Note: (3) shows the commutativity of the dot product; (4) and (5) together show that the dot product distributes over vector addition.

THR. Pythagorean Theorem in $\mathbb{R}^{n}$

Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $\mathbb{R}^{n}$. Then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if and only if

$$\lVert \mathbf{u} + \mathbf{v} \rVert^{2} = \lVert \mathbf{u} \rVert^{2} + \lVert \mathbf{v} \rVert^{2}$$

6.1.3 Orthogonal Projection of a Vector on a Line

DEF. orthogonal projection

The orthogonal projection from $P(\mathbf{u})$ to $\mathcal{L}(\mathbf{v})$ is given by

$$\mathbf{w} = \left( \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{v} \rVert^{2}} \right) \mathbf{v}$$

DEF. distance from a point to a line

$$\text{distance from } P(\mathbf{u}) \text{ to } \mathcal{L}(\mathbf{v}) = \lVert \mathbf{u} - \mathbf{w} \rVert$$
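As a quick numerical check of the two formulas above, here is a minimal NumPy sketch; the vectors `u` and `v` are hypothetical examples, not from the notes:

```python
import numpy as np

# Hypothetical example vectors in R^2 (not from the notes).
u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])

# w = ((u . v) / ||v||^2) v: the orthogonal projection of u on the line L(v)
w = (u @ v) / (v @ v) * v

# distance from the point P(u) to the line L(v)
dist = np.linalg.norm(u - w)
```

Here the projection of $(3, 4)$ onto the $x$-axis is $(3, 0)$, and the distance to the line is $4$.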

6.1.4 Cauchy-Schwarz and Triangle Inequalities

THR. Cauchy-Schwarz Inequality

For any vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^{n}$,

$$\left| \mathbf{u} \cdot \mathbf{v} \right| \le \lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert.$$

6.2 Orthogonal Vectors

DEF. orthogonal set

  • every pair of distinct vectors in the set is orthogonal

DEF. orthonormal set

  • the set is an orthogonal one consisting entirely of unit vectors

THR.

  • Any orthogonal set of nonzero vectors is linearly independent.

DEF. orthogonal basis

  • an orthogonal set that is also a basis for a subspace of $\mathbb{R}^{n}$ is called an orthogonal basis for the subspace

DEF. orthonormal basis

  • an orthonormal set that is also a basis

THR. representation of a vector in terms of an orthogonal (or orthonormal) basis

  • Let $\{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k} \}$ be an orthogonal basis for a subspace $V$ of $\mathbb{R}^{n}$.
  • Let $\mathbf{u}$ be a vector in $V$.

Then

$$\mathbf{u} = \frac{\mathbf{u} \cdot \mathbf{v}_{1}}{\lVert \mathbf{v}_{1} \rVert^{2}} \mathbf{v}_{1} + \frac{\mathbf{u} \cdot \mathbf{v}_{2}}{\lVert \mathbf{v}_{2} \rVert^{2}} \mathbf{v}_{2} + \dots + \frac{\mathbf{u} \cdot \mathbf{v}_{k}}{\lVert \mathbf{v}_{k} \rVert^{2}} \mathbf{v}_{k}$$

Furthermore, if the orthogonal basis is an orthonormal basis for $V$, then

$$\mathbf{u} = (\mathbf{u} \cdot \mathbf{v}_{1}) \mathbf{v}_{1} + (\mathbf{u} \cdot \mathbf{v}_{2}) \mathbf{v}_{2} + \dots + (\mathbf{u} \cdot \mathbf{v}_{k}) \mathbf{v}_{k}$$
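The coefficient formula above can be verified numerically; a minimal NumPy sketch, with a hypothetical orthogonal basis and vector:

```python
import numpy as np

# Hypothetical orthogonal basis of R^2 and a vector u (not from the notes).
v1 = np.array([2.0, 0.0])
v2 = np.array([0.0, 1.0])
u = np.array([6.0, 5.0])

# coefficients c_i = (u . v_i) / ||v_i||^2 from the theorem
coeffs = [(u @ v) / (v @ v) for v in (v1, v2)]

# the theorem says these coefficients reconstruct u exactly
u_rebuilt = coeffs[0] * v1 + coeffs[1] * v2
```

For this basis the coefficients are $3$ and $5$, and the reconstruction returns $\mathbf{u}$ exactly.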

THR.

Every subspace of $\mathbb{R}^{n}$ has an orthogonal, and hence an orthonormal, basis.


DEF. The Gram-Schmidt Process

The process converts a basis into an orthogonal basis for the same subspace of $\mathbb{R}^{n}$.

  • input: basis $S = \{ \mathbf{u}_{1}, \mathbf{u}_{2}, \dots, \mathbf{u}_{k} \}$
  • output: orthogonal basis $S' = \{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k} \}$
$$\begin{aligned}
\mathbf{v}_{1} &= \mathbf{u}_{1} \\
\mathbf{v}_{2} &= \mathbf{u}_{2} - \frac{\mathbf{u}_{2} \cdot \mathbf{v}_{1}}{\lVert \mathbf{v}_{1} \rVert^{2}} \mathbf{v}_{1} \\
\mathbf{v}_{3} &= \mathbf{u}_{3} - \frac{\mathbf{u}_{3} \cdot \mathbf{v}_{1}}{\lVert \mathbf{v}_{1} \rVert^{2}} \mathbf{v}_{1} - \frac{\mathbf{u}_{3} \cdot \mathbf{v}_{2}}{\lVert \mathbf{v}_{2} \rVert^{2}} \mathbf{v}_{2} \\
&\;\;\vdots \\
\mathbf{v}_{k} &= \mathbf{u}_{k} - \frac{\mathbf{u}_{k} \cdot \mathbf{v}_{1}}{\lVert \mathbf{v}_{1} \rVert^{2}} \mathbf{v}_{1} - \frac{\mathbf{u}_{k} \cdot \mathbf{v}_{2}}{\lVert \mathbf{v}_{2} \rVert^{2}} \mathbf{v}_{2} - \dots - \frac{\mathbf{u}_{k} \cdot \mathbf{v}_{k-1}}{\lVert \mathbf{v}_{k-1} \rVert^{2}} \mathbf{v}_{k-1}
\end{aligned}$$
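The steps above can be sketched directly in code; a minimal NumPy implementation of classical Gram-Schmidt (an illustration, not from the original notes):

```python
import numpy as np

def gram_schmidt(U):
    """Classical Gram-Schmidt: turn the columns of U (a basis) into an
    orthogonal basis for the same subspace.

    U: (n, k) array with linearly independent columns u_1, ..., u_k.
    Returns an (n, k) array with orthogonal columns v_1, ..., v_k.
    """
    n, k = U.shape
    V = np.zeros((n, k))
    for j in range(k):
        v = U[:, j].astype(float)
        # v_j = u_j minus its projections onto the earlier v_i
        for i in range(j):
            vi = V[:, i]
            v -= (U[:, j] @ vi) / (vi @ vi) * vi
        V[:, j] = v
    return V
```

Normalizing each output column afterwards would give an orthonormal basis.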

THR.

  • Let $\{ \mathbf{u}_{1}, \mathbf{u}_{2}, \dots, \mathbf{u}_{k} \}$ be a basis for a subspace $W$ of $\mathbb{R}^{n}$.

Then the output of the Gram-Schmidt Process, $\{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k} \}$, is an orthogonal set of nonzero vectors such that

$$\text{span}\{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{i} \} = \text{span}\{ \mathbf{u}_{1}, \mathbf{u}_{2}, \dots, \mathbf{u}_{i} \}$$

for each $1 \le i \le k$. So $\{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k} \}$ is an orthogonal basis for $W$.

DEF. the QR factorization of a matrix

If $A$ is an $n \times k$ matrix with linearly independent columns, then $A = QR$, where $Q$ is an $n \times k$ matrix whose columns form an orthonormal basis for $\text{col}(A)$ and $R$ is a $k \times k$ invertible upper triangular matrix.

6.3 Orthogonal Projections

DEF. orthogonal complement

The orthogonal complement of a nonempty subset $\mathcal{S}$ of $\mathbb{R}^{n}$, denoted by $\mathcal{S}^{\perp}$, is the set of all vectors in $\mathbb{R}^{n}$ that are orthogonal to every vector in $\mathcal{S}$. That is,

$$\mathcal{S}^{\perp} = \{ \mathbf{v} \in \mathbb{R}^{n} \mid \mathbf{v} \cdot \mathbf{u} = 0 \ \forall\, \mathbf{u} \in \mathcal{S} \}$$

THR. orthogonal complement = subspace

The orthogonal complement of any nonempty subset of $\mathbb{R}^{n}$ is a subspace of $\mathbb{R}^{n}$.

THR. complement = complement of span

For any nonempty subset $\mathcal{S}$ of $\mathbb{R}^{n}$, we have

$$\mathcal{S}^{\perp} = (\text{span}(\mathcal{S}))^{\perp}.$$

In particular, the orthogonal complement of a basis for a subspace is the same as the orthogonal complement of the subspace.

THR. complement of row = null space

For any matrix $A$, the orthogonal complement of the row space of $A$ is the null space of $A$; that is,

$$\text{row}(A)^{\perp} = \text{null}(A).$$

  • COR. $\text{col}(A)^{\perp} = \text{row}(A^{T})^{\perp} = \text{null}(A^{T})$

THR. orthogonal decomposition theorem

Let $W$ be a subspace of $\mathbb{R}^{n}$. Then, for any vector $\mathbf{u}$ in $\mathbb{R}^{n}$, there exist unique vectors $\mathbf{w}$ in $W$ and $\mathbf{z}$ in $W^{\perp}$ such that

$$\mathbf{u} = \mathbf{w} + \mathbf{z}.$$

In addition, if $\{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k} \}$ is an orthonormal basis for $W$, then

$$\mathbf{w} = (\mathbf{u} \cdot \mathbf{v}_{1}) \mathbf{v}_{1} + (\mathbf{u} \cdot \mathbf{v}_{2}) \mathbf{v}_{2} + \dots + (\mathbf{u} \cdot \mathbf{v}_{k}) \mathbf{v}_{k}.$$

THR.

For any subspace $W$ of $\mathbb{R}^{n}$,

$$\dim(W) + \dim(W^{\perp}) = n.$$

DEF. orthogonal projection

Let $W$ be a subspace of $\mathbb{R}^{n}$ and $\mathbf{u}$ be a vector in $\mathbb{R}^{n}$. The orthogonal projection of $\mathbf{u}$ on $W$ is the unique vector $\mathbf{w}$ in $W$ such that $\mathbf{u} - \mathbf{w}$ is in $W^{\perp}$.

Furthermore, the function $U_{W}: \mathbb{R}^{n} \to \mathbb{R}^{n}$ such that $U_{W}(\mathbf{u})$ is the orthogonal projection of $\mathbf{u}$ on $W$ for every $\mathbf{u}$ in $\mathbb{R}^{n}$ is called the orthogonal projection operator on $W$.

THR.

For any subspace $W$ of $\mathbb{R}^{n}$, the orthogonal projection operator $U_{W}$ is linear.

DEF. orthogonal projection matrix

The standard matrix of the orthogonal projection operator $U_{W}$ on a subspace $W$ of $\mathbb{R}^{n}$ is called the orthogonal projection matrix for $W$ and is denoted $P_{W}$.

LEM. linear independence => invertibility

Let $C$ be a matrix whose columns are linearly independent. Then $C^{T}C$ is invertible.

THR. a way to obtain orthogonal projection matrix

Let $C$ be an $n \times k$ matrix whose columns form a basis for a subspace $W$ of $\mathbb{R}^{n}$. Then

$$P_{W} = C(C^{T}C)^{-1}C^{T}.$$
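The formula for $P_{W}$ translates directly into code; a minimal NumPy sketch, with a hypothetical basis matrix `C`:

```python
import numpy as np

def projection_matrix(C):
    """P_W = C (C^T C)^{-1} C^T, where the columns of C form a basis for W."""
    return C @ np.linalg.inv(C.T @ C) @ C.T

# Hypothetical example: W is the xy-plane in R^3.
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
P = projection_matrix(C)

u = np.array([1.0, 2.0, 3.0])
w = P @ u  # orthogonal projection of u on W
```

Note that $P_{W}$ is idempotent ($P_{W}^{2} = P_{W}$), as a projection matrix must be.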

DEF. distance from a vector to a subspace

The distance from a vector $\mathbf{u}$ in $\mathbb{R}^{n}$ to a subspace $W$ of $\mathbb{R}^{n}$ is defined to be the distance between $\mathbf{u}$ and the orthogonal projection of $\mathbf{u}$ on $W$.

THR. closest vector property

Let $W$ be a subspace of $\mathbb{R}^{n}$ and $\mathbf{u}$ be a vector in $\mathbb{R}^{n}$. Among all vectors in $W$, the vector closest to $\mathbf{u}$ is the orthogonal projection $U_{W}(\mathbf{u})$ of $\mathbf{u}$ on $W$.

6.4 Least-Squares Approximations and Orthogonal Projection Matrices

Least-Squares Approximations

To fit a line $y = a_{0} + a_{1}x$ to the data points $(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{n}, y_{n})$, define

$$\mathbf{v}_{1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \ \mathbf{v}_{2} = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}, \ \mathbf{y} = \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix}, \ C = [\mathbf{v}_{1} \ \mathbf{v}_{2}]$$

Error sum of squares (written term by term)

$$E = [y_{1} - (a_{0} + a_{1}x_{1})]^{2} + [y_{2} - (a_{0} + a_{1}x_{2})]^{2} + \dots + [y_{n} - (a_{0} + a_{1}x_{n})]^{2}$$

(in terms of vectors)

$$E = \lVert \mathbf{y} - (a_{0}\mathbf{v}_{1} + a_{1}\mathbf{v}_{2}) \rVert^{2}$$

Then note that

$$\sqrt{E} = \text{dist}(\mathbf{y},\, a_{0}\mathbf{v}_{1} + a_{1}\mathbf{v}_{2})$$

So we need to make

$$\mathbf{w} = a_{0}\mathbf{v}_{1} + a_{1}\mathbf{v}_{2}$$

as close to $\mathbf{y}$ as possible.

And note that

$$\mathbf{w} \in W = \text{span}\{ \mathbf{v}_{1}, \mathbf{v}_{2} \}$$

Then the closest $\mathbf{w}$ to $\mathbf{y}$ is the orthogonal projection of $\mathbf{y}$ onto $W$.

Note that the columns of $C$ form a basis for $W$. Then

$$\begin{aligned} P_{W}\mathbf{y} &= C(C^{T}C)^{-1}C^{T}\mathbf{y} \\ &= \mathbf{w} = a_{0}\mathbf{v}_{1} + a_{1}\mathbf{v}_{2} = C \begin{bmatrix} a_{0} \\ a_{1} \end{bmatrix} \end{aligned}$$

so that

$$C \begin{bmatrix} a_{0} \\ a_{1} \end{bmatrix} = C(C^{T}C)^{-1}C^{T}\mathbf{y}$$

and then

$$C^{T}C \begin{bmatrix} a_{0} \\ a_{1} \end{bmatrix} = C^{T}C(C^{T}C)^{-1}C^{T}\mathbf{y} = C^{T}\mathbf{y}$$

(normal equations)

And since the columns of $C$ are linearly independent, $C^{T}C$ is invertible (LEM. linear independence => invertibility).

Then

$$\begin{bmatrix} a_{0} \\ a_{1} \end{bmatrix} = (C^{T}C)^{-1}C^{T}\mathbf{y}.$$
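The derivation above can be checked end to end; a minimal NumPy sketch, with hypothetical data points chosen to lie exactly on a line:

```python
import numpy as np

# Hypothetical data points (x_i, y_i); here they lie exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# C = [v1 v2] with v1 = all-ones and v2 = the x-coordinates
C = np.column_stack([np.ones_like(x), x])

# solve the normal equations C^T C [a0 a1]^T = C^T y
a0, a1 = np.linalg.solve(C.T @ C, C.T @ y)
```

Solving the normal equations directly is preferable to forming $(C^{T}C)^{-1}$ explicitly.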

Inconsistent Systems of Linear Equations

A possibly inconsistent equation

$$A\mathbf{x} = \mathbf{b}$$

We try to obtain $\mathbf{z}$ for which

$$\lVert A\mathbf{z} - \mathbf{b} \rVert$$

is a minimum.

Define $W$:

$$W := \{ A\mathbf{u} \mid \mathbf{u} \in \mathbb{R}^{n} \}$$

Then $W$ is the column space of $A$.

The vector in $W$ that is closest to $\mathbf{b}$ is the orthogonal projection of $\mathbf{b}$ onto $W$ (THR. closest vector property).

Then the vector $\mathbf{z}$ minimizes $\lVert A\mathbf{z} - \mathbf{b} \rVert$ if and only if $\mathbf{z}$ is a solution of

$$A\mathbf{x} = P_{W}\mathbf{b}.$$
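Such a minimizer can be computed either with NumPy's least-squares routine or via the normal equations $A^{T}A\mathbf{x} = A^{T}\mathbf{b}$ (which are equivalent when the columns of $A$ are linearly independent); a minimal sketch with a hypothetical inconsistent system:

```python
import numpy as np

# Hypothetical inconsistent system: three equations, two unknowns.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# z minimizing ||Az - b||, computed by NumPy's least-squares routine
z, *_ = np.linalg.lstsq(A, b, rcond=None)

# equivalently, solve the normal equations A^T A x = A^T b
z_normal = np.linalg.solve(A.T @ A, A.T @ b)
```

Both approaches return the same $\mathbf{z}$, and $A\mathbf{z}$ is exactly $P_{W}\mathbf{b}$ for $W = \text{col}(A)$.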

Solutions of Least Norm

https://blade520.com/posts/linear-algebra/ch6/
Author: Blade/磯江
Published on: 2025-12-14
License: CC BY-NC-SA 4.0