## 6.1 Geometry of Vectors

### 6.1.1 Distances

DEF. norm

$$\lVert \mathbf{v} \rVert = \sqrt{ v_{1}^{2} + v_{2}^{2} + \dots + v_{k}^{2} }$$

DEF. distance

$$\text{distance of } \mathbf{u}, \mathbf{v} = \lVert \mathbf{u} - \mathbf{v} \rVert$$

### 6.1.2 Dot Product

DEF. dot product

$$\mathbf{u} \cdot \mathbf{v} = u_{1} v_{1} + u_{2} v_{2} + \dots + u_{k} v_{k}$$

Matrix representation of the dot product:

$$\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^{T} \mathbf{v}$$

THR. basic properties of the dot product

For all vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $\mathbb{R}^{n}$ and every scalar $c$,

1. $\mathbf{u} \cdot \mathbf{u} = \lVert \mathbf{u} \rVert^{2}$
2. $\mathbf{u} \cdot \mathbf{u} = 0$ if and only if $\mathbf{u} = \mathbf{0}$
3. $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$
4. $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$
5. $(\mathbf{v} + \mathbf{w}) \cdot \mathbf{u} = \mathbf{v} \cdot \mathbf{u} + \mathbf{w} \cdot \mathbf{u}$
6. $(c\mathbf{u}) \cdot \mathbf{v} = c(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot (c\mathbf{v})$
7. $\lVert c\mathbf{u} \rVert = \lvert c \rvert \, \lVert \mathbf{u} \rVert$

Note: (3) is the commutative property of the dot product; (4) and (5) together show that the dot product distributes over addition.
THR. Pythagorean Theorem in $\mathbb{R}^{n}$

Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $\mathbb{R}^{n}$.
Then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if and only if

$$\lVert \mathbf{u} + \mathbf{v} \rVert^{2} = \lVert \mathbf{u} \rVert^{2} + \lVert \mathbf{v} \rVert^{2}$$

### 6.1.3 Orthogonal Projection of a Vector on a Line

DEF. orthogonal projection

The orthogonal projection from $P(\mathbf{u})$ to $\mathcal{L}(\mathbf{v})$ is given by

$$\mathbf{w} = \left( \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{v} \rVert^{2}} \right) \mathbf{v}$$

DEF. distance from a point to a line

$$\text{distance from } P(\mathbf{u}) \text{ to } \mathcal{L}(\mathbf{v}) = \lVert \mathbf{u} - \mathbf{w} \rVert$$

### 6.1.4 Cauchy-Schwarz and Triangle Inequalities

THR. Cauchy-Schwarz Inequality

For any vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^{n}$,

$$\lvert \mathbf{u} \cdot \mathbf{v} \rvert \le \lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert.$$
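As a quick numerical sketch of the definitions above (the vectors are hypothetical; `numpy` is assumed):

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 2.0])

norm_u = np.sqrt(np.sum(u**2))   # DEF. norm: sqrt(v1^2 + ... + vk^2)
dist_uv = np.linalg.norm(u - v)  # DEF. distance: ||u - v||
dot_uv = u @ v                   # DEF. dot product: u1 v1 + ... + uk vk

# property (1): u . u = ||u||^2
print(np.isclose(u @ u, norm_u**2))  # True
# Cauchy-Schwarz: |u . v| <= ||u|| ||v||
print(abs(dot_uv) <= np.linalg.norm(u) * np.linalg.norm(v))  # True
```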
## 6.2 Orthogonal Vectors

DEF. orthogonal set

A set in which every pair of distinct vectors is orthogonal.

DEF. orthonormal set

An orthogonal set consisting entirely of unit vectors.

THR.

Any orthogonal set of nonzero vectors is linearly independent.

DEF. orthogonal basis

An orthogonal set that is also a basis for a subspace of $\mathbb{R}^{n}$ is called an orthogonal basis for the subspace.

DEF. orthonormal basis

An orthonormal set that is also a basis.
THR. representation of a vector in terms of an orthogonal (or orthonormal) basis

Let $\{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k} \}$ be an orthogonal basis for a subspace $V$ of $\mathbb{R}^{n}$.
Let $\mathbf{u}$ be a vector in $V$.
Then

$$\mathbf{u} = \frac{\mathbf{u} \cdot \mathbf{v}_{1}}{\lVert \mathbf{v}_{1} \rVert^{2}} \mathbf{v}_{1} + \frac{\mathbf{u} \cdot \mathbf{v}_{2}}{\lVert \mathbf{v}_{2} \rVert^{2}} \mathbf{v}_{2} + \dots + \frac{\mathbf{u} \cdot \mathbf{v}_{k}}{\lVert \mathbf{v}_{k} \rVert^{2}} \mathbf{v}_{k}$$

Furthermore, if the orthogonal basis is an orthonormal basis for $V$, then

$$\mathbf{u} = (\mathbf{u} \cdot \mathbf{v}_{1}) \mathbf{v}_{1} + (\mathbf{u} \cdot \mathbf{v}_{2}) \mathbf{v}_{2} + \dots + (\mathbf{u} \cdot \mathbf{v}_{k}) \mathbf{v}_{k}$$

THR.

Every subspace of $\mathbb{R}^{n}$ has an orthogonal, and hence an orthonormal, basis.
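The representation theorem can be checked numerically. Below is a minimal sketch with a hypothetical orthogonal basis $\{v_1, v_2\}$ of $\mathbb{R}^2$:

```python
import numpy as np

# a hypothetical orthogonal (not orthonormal) basis of R^2
v1 = np.array([1.0, 1.0])
v2 = np.array([1.0, -1.0])
print(np.isclose(v1 @ v2, 0.0))  # True: the basis is orthogonal

u = np.array([3.0, 5.0])
# coefficients from the theorem: (u . v_i) / ||v_i||^2
c1 = (u @ v1) / (v1 @ v1)
c2 = (u @ v2) / (v2 @ v2)
reconstructed = c1 * v1 + c2 * v2
print(reconstructed)  # [3. 5.]
```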
DEF. The Gram-Schmidt Process

The process converts a basis into an orthogonal basis for the same subspace of $\mathbb{R}^{n}$.

input: basis $S = \{ \mathbf{u}_{1}, \mathbf{u}_{2}, \dots, \mathbf{u}_{k} \}$

output: orthogonal basis $S' = \{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k} \}$

$$\begin{aligned}
\mathbf{v}_{1} & = \mathbf{u}_{1} \\
\mathbf{v}_{2} & = \mathbf{u}_{2} - \frac{\mathbf{u}_{2} \cdot \mathbf{v}_{1}}{\lVert \mathbf{v}_{1} \rVert^{2}} \mathbf{v}_{1} \\
\mathbf{v}_{3} & = \mathbf{u}_{3} - \frac{\mathbf{u}_{3} \cdot \mathbf{v}_{1}}{\lVert \mathbf{v}_{1} \rVert^{2}} \mathbf{v}_{1} - \frac{\mathbf{u}_{3} \cdot \mathbf{v}_{2}}{\lVert \mathbf{v}_{2} \rVert^{2}} \mathbf{v}_{2} \\
& \ \ \vdots \\
\mathbf{v}_{k} & = \mathbf{u}_{k} - \frac{\mathbf{u}_{k} \cdot \mathbf{v}_{1}}{\lVert \mathbf{v}_{1} \rVert^{2}} \mathbf{v}_{1} - \frac{\mathbf{u}_{k} \cdot \mathbf{v}_{2}}{\lVert \mathbf{v}_{2} \rVert^{2}} \mathbf{v}_{2} - \dots - \frac{\mathbf{u}_{k} \cdot \mathbf{v}_{k-1}}{\lVert \mathbf{v}_{k-1} \rVert^{2}} \mathbf{v}_{k-1}
\end{aligned}$$

THR.
Let $\{ \mathbf{u}_{1}, \mathbf{u}_{2}, \dots, \mathbf{u}_{k} \}$ be a basis for a subspace $W$ of $\mathbb{R}^{n}$.
Then the output of the Gram-Schmidt Process $\{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k} \}$ is an orthogonal set of nonzero vectors such that

$$\text{span}\{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{i} \} = \text{span}\{ \mathbf{u}_{1}, \mathbf{u}_{2}, \dots, \mathbf{u}_{i} \} \quad \text{for each } 1 \le i \le k.$$

So $\{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k} \}$ is an orthogonal basis for $W$.
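The Gram-Schmidt formulas translate directly into code. A sketch (the input basis is hypothetical; classical Gram-Schmidt, exactly as written above):

```python
import numpy as np

def gram_schmidt(basis):
    """Convert a basis (list of 1-D arrays) into an orthogonal basis
    for the same subspace, following the formulas above."""
    ortho = []
    for u in basis:
        v = u.astype(float)
        for w in ortho:
            # subtract the projection of u_k on each earlier v_i
            v = v - ((u @ w) / (w @ w)) * w
        ortho.append(v)
    return ortho

# hypothetical basis of a plane in R^3
u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, 0.0, 1.0])
v1, v2 = gram_schmidt([u1, u2])
print(np.isclose(v1 @ v2, 0.0))  # True: the output is orthogonal
```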
DEF. the QR factorization of a matrix
## 6.3 Orthogonal Projections

DEF. orthogonal complement

The orthogonal complement of a nonempty subset $\mathcal{S}$ of $\mathbb{R}^{n}$, denoted by $\mathcal{S}^{\perp}$, is the set of all vectors in $\mathbb{R}^{n}$ that are orthogonal to every vector in $\mathcal{S}$.
That is,

$$\mathcal{S}^{\perp} = \{ \mathbf{v} \in \mathbb{R}^{n} \mid \mathbf{v} \cdot \mathbf{u} = 0 \ \forall \mathbf{u} \in \mathcal{S} \}$$

THR. orthogonal complement = subspace

The orthogonal complement of any nonempty subset of $\mathbb{R}^{n}$ is a subspace of $\mathbb{R}^{n}$.

THR. complement = complement of span

For any nonempty subset $\mathcal{S}$ of $\mathbb{R}^{n}$, we have

$$\mathcal{S}^{\perp} = (\text{span}(\mathcal{S}))^{\perp}.$$

In particular, the orthogonal complement of a basis for a subspace is the same as the orthogonal complement of the subspace.

THR. complement of row = null space

For any matrix $A$, the orthogonal complement of the row space of $A$ is the null space of $A$; that is,

$$\text{row}(A)^{\perp} = \text{null}(A).$$

COR. $\text{col}(A)^{\perp} = \text{row}(A^{T})^{\perp} = \text{null}(A^{T})$
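A small check of $\text{row}(A)^{\perp} = \text{null}(A)$ on a hypothetical matrix: a vector in the null space is orthogonal to every row of $A$.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([1.0, -2.0, 1.0])  # A v = 0, so v is in null(A)

print(np.allclose(A @ v, 0.0))  # True: v is in the null space
# v is orthogonal to each row of A, as row(A)^perp = null(A) predicts:
print(np.isclose(A[0] @ v, 0.0) and np.isclose(A[1] @ v, 0.0))  # True
```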
THR. orthogonal decomposition theorem

Let $W$ be a subspace of $\mathbb{R}^{n}$.
Then, for any vector $\mathbf{u}$ in $\mathbb{R}^{n}$, there exist unique vectors $\mathbf{w}$ in $W$ and $\mathbf{z}$ in $W^{\perp}$ such that

$$\mathbf{u} = \mathbf{w} + \mathbf{z}.$$

In addition, if $\{ \mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k} \}$ is an orthonormal basis for $W$, then

$$\mathbf{w} = (\mathbf{u} \cdot \mathbf{v}_{1}) \mathbf{v}_{1} + (\mathbf{u} \cdot \mathbf{v}_{2}) \mathbf{v}_{2} + \dots + (\mathbf{u} \cdot \mathbf{v}_{k}) \mathbf{v}_{k}.$$

THR.

For any subspace $W$ of $\mathbb{R}^{n}$,

$$\dim(W) + \dim(W^{\perp}) = n.$$

DEF. orthogonal projection

Let $W$ be a subspace of $\mathbb{R}^{n}$ and $\mathbf{u}$ be a vector in $\mathbb{R}^{n}$. The orthogonal projection of $\mathbf{u}$ on $W$ is the unique vector $\mathbf{w}$ in $W$ such that $\mathbf{u} - \mathbf{w}$ is in $W^{\perp}$.
Furthermore, the function $U_W : \mathbb{R}^{n} \to \mathbb{R}^{n}$ such that $U_W(\mathbf{u})$ is the orthogonal projection of $\mathbf{u}$ on $W$ for every $\mathbf{u}$ in $\mathbb{R}^{n}$ is called the orthogonal projection operator on $W$.
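The decomposition $\mathbf{u} = \mathbf{w} + \mathbf{z}$ can be computed directly from an orthonormal basis. A sketch with a hypothetical plane $W$ in $\mathbb{R}^3$:

```python
import numpy as np

# a hypothetical orthonormal basis {v1, v2} for a plane W in R^3
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
u = np.array([2.0, 3.0, 4.0])

w = (u @ v1) * v1 + (u @ v2) * v2  # orthogonal projection of u on W
z = u - w                          # the component in W^perp

print(w)  # [2. 3. 0.]
print(np.isclose(z @ v1, 0.0) and np.isclose(z @ v2, 0.0))  # True: z in W^perp
```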
THR.

For any subspace $W$ of $\mathbb{R}^{n}$, the orthogonal projection $U_{W}$ is linear.

DEF. orthogonal projection matrix

The standard matrix of an orthogonal projection operator $U_{W}$ on a subspace $W$ of $\mathbb{R}^{n}$ is called the orthogonal projection matrix for $W$ and is denoted $P_{W}$.

LEM. linear independence => invertibility

Let $C$ be a matrix whose columns are linearly independent.
Then $C^{T}C$ is invertible.

THR. a way to obtain the orthogonal projection matrix

Let $C$ be an $n \times k$ matrix whose columns form a basis for a subspace $W$ of $\mathbb{R}^{n}$.
Then

$$P_{W} = C(C^{T}C)^{-1}C^{T}.$$

DEF.

The distance from a vector $\mathbf{u}$ in $\mathbb{R}^{n}$ to a subspace $W$ of $\mathbb{R}^{n}$ is the distance between $\mathbf{u}$ and the orthogonal projection of $\mathbf{u}$ on $W$.

THR. closest vector property

Let $W$ be a subspace of $\mathbb{R}^{n}$ and $\mathbf{u}$ be a vector in $\mathbb{R}^{n}$. Among all vectors in $W$, the vector closest to $\mathbf{u}$ is the orthogonal projection $U_{W}(\mathbf{u})$ of $\mathbf{u}$ on $W$.
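The formula $P_W = C(C^TC)^{-1}C^T$ is easy to sketch numerically. The basis matrix $C$ below is hypothetical; the checks are standard properties of an orthogonal projection matrix:

```python
import numpy as np

# columns of C form a (hypothetical) basis for a plane W in R^3
C = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
P = C @ np.linalg.inv(C.T @ C) @ C.T  # P_W = C (C^T C)^{-1} C^T

print(np.allclose(P @ P, P))  # True: projecting twice changes nothing
print(np.allclose(P, P.T))    # True: P_W is symmetric
print(np.allclose(P @ C, C))  # True: vectors already in W are fixed
```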
## 6.4 Least-Squares Approximations and Orthogonal Projection Matrices

### Least-Squares Approximations

$$\mathbf{v}_{1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \ \mathbf{v}_{2} = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}, \ \mathbf{y} = \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix}, \ C = [\mathbf{v}_{1} \ \mathbf{v}_{2}]$$

Error sum of squares (naturally):

$$E = [y_{1} - (a_{0} + a_{1} x_{1})]^{2} + [y_{2} - (a_{0} + a_{1} x_{2})]^{2} + \dots + [y_{n} - (a_{0} + a_{1} x_{n})]^{2}$$

In terms of vectors:

$$E = \lVert \mathbf{y} - (a_{0} \mathbf{v}_{1} + a_{1} \mathbf{v}_{2}) \rVert^{2}$$

Then note that

$$\sqrt{E} = \text{dist}(\mathbf{y},\ a_{0} \mathbf{v}_{1} + a_{1} \mathbf{v}_{2})$$

So we need to make

$$\mathbf{w} = a_{0} \mathbf{v}_{1} + a_{1} \mathbf{v}_{2}$$

as close to $\mathbf{y}$ as possible.
And note that

$$\mathbf{w} \in W = \text{span}\{ \mathbf{v}_{1}, \mathbf{v}_{2} \}$$

Then the closest $\mathbf{w}$ to $\mathbf{y}$ is the orthogonal projection of $\mathbf{y}$ onto $W$ (THR. closest vector property).
Note that the columns of $C$ form a basis for $W$.
Then
$$\begin{aligned}
P_{W} \mathbf{y} & = C(C^{T}C)^{-1}C^{T} \mathbf{y} \\
& = \mathbf{w} = a_{0} \mathbf{v}_{1} + a_{1} \mathbf{v}_{2} = C \begin{bmatrix} a_{0} \\ a_{1} \end{bmatrix}
\end{aligned}$$

So

$$C \begin{bmatrix} a_{0} \\ a_{1} \end{bmatrix} = C(C^{T}C)^{-1}C^{T} \mathbf{y}$$

and then

$$C^{T}C \begin{bmatrix} a_{0} \\ a_{1} \end{bmatrix} = C^{T}C(C^{T}C)^{-1}C^{T} \mathbf{y} = C^{T} \mathbf{y}$$

(the normal equations)
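As a sketch, the normal equations $C^{T}C \, [a_0 \ a_1]^T = C^{T}\mathbf{y}$ can be solved numerically. The data below is hypothetical; `numpy.linalg.lstsq` is used only as a cross-check:

```python
import numpy as np

# hypothetical data points (x_i, y_i)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0, 4.0])

C = np.column_stack([np.ones_like(x), x])  # C = [v1 v2]
a = np.linalg.solve(C.T @ C, C.T @ y)      # solve the normal equations for [a0, a1]

# cross-check against numpy's built-in least-squares solver
a_ref, *_ = np.linalg.lstsq(C, y, rcond=None)
print(np.allclose(a, a_ref))  # True
```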
And since the columns of $C$ are linearly independent, $C^{T}C$ is invertible (LEM. linear independence => invertibility).
Then

$$\begin{bmatrix} a_{0} \\ a_{1} \end{bmatrix} = (C^{T}C)^{-1}C^{T} \mathbf{y}.$$

### Inconsistent Systems of Linear Equations

A possibly inconsistent equation:
$$A\mathbf{x} = \mathbf{b}$$

We try to obtain $\mathbf{z}$ such that

$$\lVert A\mathbf{z} - \mathbf{b} \rVert$$

is a minimum.
Define $W$:

$$W := \{ A\mathbf{u} \mid \mathbf{u} \in \mathbb{R}^{n} \}$$

Then $W$ is the column space of $A$.
The vector in $W$ that is closest to $\mathbf{b}$ is the orthogonal projection of $\mathbf{b}$ onto $W$ (THR. closest vector property).
Then the vector $\mathbf{z}$ minimizes $\lVert A\mathbf{z} - \mathbf{b} \rVert$ if and only if $\mathbf{z}$ is a solution of

$$A\mathbf{x} = P_{W} \mathbf{b}.$$
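A sketch of this on a hypothetical inconsistent system: when the columns of $A$ are linearly independent, $\mathbf{z}$ comes from the normal equations, and the residual $A\mathbf{z} - \mathbf{b}$ is orthogonal to $\text{col}(A)$ (since $\mathbf{b} - P_W\mathbf{b} \in W^{\perp}$):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])  # not in col(A): 0 != 1 + 1, so no exact solution

z = np.linalg.solve(A.T @ A, A.T @ b)  # normal equations A^T A z = A^T b
residual = A @ z - b

print(np.allclose(A.T @ residual, 0.0))  # True: residual ⊥ every column of A
```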