Your browser does not fully support modern features. Please upgrade for a smoother experience.

Version	Summary	Created by	Modification	Content Size	Created at	Operation
1	handwiki	Camila Xu	--	3655	2022-12-01 01:33:20

Video Upload Options

We provide professional Academic Video Service to translate complex research into visually appealing presentations. Would you like to try it?

No, upload directly Yes

Cite

If you have any further questions, please contact Encyclopedia Editorial Office.

Select a Style

HandWiki. Proofs Involving the Moore–Penrose Inverse. Encyclopedia. Available online: https://encyclopedia.pub/entry/37456 (accessed on 07 February 2026).

HandWiki. Proofs Involving the Moore–Penrose Inverse. Encyclopedia. Available at: https://encyclopedia.pub/entry/37456. Accessed February 07, 2026.

HandWiki. "Proofs Involving the Moore–Penrose Inverse" Encyclopedia, https://encyclopedia.pub/entry/37456 (accessed February 07, 2026).

HandWiki. (2022, December 01). Proofs Involving the Moore–Penrose Inverse. In Encyclopedia. https://encyclopedia.pub/entry/37456

HandWiki. "Proofs Involving the Moore–Penrose Inverse." Encyclopedia. Web. 01 December, 2022.

Proofs Involving the Moore–Penrose Inverse

Edit

The content is sourced from: https://handwiki.org/wiki/Proofs_involving_the_Moore%E2%80%93Penrose_inverse

In linear algebra, the Moore–Penrose inverse is a matrix that satisfies some but not necessarily all of the properties of an inverse matrix. This article collects together a variety of proofs involving the Moore–Penrose inverse.

moore–penrose

1. Definition

Let [math]\displaystyle{ A }[/math] be an m-by-n matrix over a field [math]\displaystyle{ \mathbb{K} }[/math] and [math]\displaystyle{ A^+ }[/math] be an n-by-m matrix over [math]\displaystyle{ \mathbb{K} }[/math], where [math]\displaystyle{ \mathbb{K} }[/math] is either [math]\displaystyle{ \mathbb{R} }[/math], the real numbers, or [math]\displaystyle{ \mathbb{C} }[/math], the complex numbers. The following four criteria are called the Moore–Penrose conditions:

[math]\displaystyle{ AA^+A = A }[/math],
[math]\displaystyle{ A^+AA^+ = A^+ }[/math],
[math]\displaystyle{ \left(AA^+\right)^* = AA^+ }[/math],
[math]\displaystyle{ \left(A^+A\right)^* = A^+A }[/math].

Given a matrix [math]\displaystyle{ A }[/math], there exists a unique matrix [math]\displaystyle{ A^+ }[/math] that satisfies all four of the Moore–Penrose conditions, which is called the Moore–Penrose inverse of [math]\displaystyle{ A }[/math].^[1]^[2]^[3]^[4] Notice that [math]\displaystyle{ A }[/math] is also the Moore–Penrose inverse of [math]\displaystyle{ A^+ }[/math]. That is, [math]\displaystyle{ \left(A^+\right)^+ = A }[/math].

2. Useful Lemmas

These results are used in the proofs below. In the following lemmas, A is a matrix with complex elements and n columns, B is a matrix with complex elements and n rows.

2.1. Lemma 1: A^*A = 0 ⇒ A = 0

The assumption says that all elements of A*A are zero. Therefore,

[math]\displaystyle{ 0 = \operatorname{Tr}\left(A^*A\right) = \sum_{j=1}^n \left(A^*A\right)_{jj} = \sum_{j=1}^n \sum_{i=1}^m \left(A^*\right)_{ji}A_{ij} = \sum_{i=1}^m \sum_{j=1}^n \left|A_{ij}\right|^2 . }[/math]

Therefore, all [math]\displaystyle{ A_{ij} }[/math] equal 0 i.e. [math]\displaystyle{ A = 0 }[/math].

2.2. Lemma 2: A^*AB = 0 ⇒ AB = 0

[math]\displaystyle{ \begin{align} 0 &= A^*AB & \\ \Rightarrow 0 &= B^*A^*AB & \\ \Rightarrow 0 &= (AB)^*(AB) & \\ \Rightarrow 0 &= AB &(\text{by Lemma 1}) \end{align} }[/math]

2.3. Lemma 3: ABB^* = 0 ⇒ AB = 0

This is proved in a manner similar to the argument of Lemma 2 (or by simply taking the Hermitian conjugate).

3. Existence and Uniqueness

3.1. Proof of Uniqueness

Let [math]\displaystyle{ A }[/math] be a matrix over [math]\displaystyle{ \mathbb{R} }[/math] or [math]\displaystyle{ \mathbb{C} }[/math]. Suppose that [math]\displaystyle{ {A^+_1} }[/math] and [math]\displaystyle{ {A^+_2} }[/math] are Moore–Penrose inverses of [math]\displaystyle{ A }[/math]. Observe then that

[math]\displaystyle{ A{A^+_1} (A{A^+_2}A){A^+_1} = (A{A^+_2})(A{A^+_1}) (A{A^+_2})^*(A{A^+_1})^* = {A^+_2}^*(A{A^+_1}A)^* {A^+_2}^*A^* = (A{A^+_2})^* A{A^+_2}. }[/math]

Analogously we conclude that [math]\displaystyle{ {A^+_1}A = {A^+_2}A }[/math]. The proof is completed by observing that then

[math]\displaystyle{ {A^+_1} {A^+_1}A{A^+_1} = {A^+_1}A{A^+_2} = A^+_2A{A^+_2} {A^+_2}. }[/math]

3.2. Proof of Existence

The proof proceeds in stages.

1-by-1 matrices

For any [math]\displaystyle{ x \in \mathbb{K} }[/math], we define:

[math]\displaystyle{ x^+ := \begin{cases} x^{-1}, & \mbox{if }x \neq 0 \\ 0, & \mbox{if }x = 0 \end{cases} }[/math]

It is easy to see that [math]\displaystyle{ x^+ }[/math] is a pseudoinverse of [math]\displaystyle{ x }[/math] (interpreted as a 1-by-1 matrix).

Square diagonal matrices

Let [math]\displaystyle{ D }[/math] be an n-by-n matrix over [math]\displaystyle{ \mathbb{K} }[/math] with zeros off the diagonal. We define [math]\displaystyle{ D^+ }[/math] as an n-by-n matrix over [math]\displaystyle{ \mathbb{K} }[/math] with [math]\displaystyle{ \left(D^+\right)_{ij} := \left(D_{ij}\right)^+ }[/math] as defined above. We write simply [math]\displaystyle{ D^+_{ij} }[/math] for [math]\displaystyle{ \left(D^+\right)_{ij} = \left(D_{ij}\right)^+ }[/math].

Notice that [math]\displaystyle{ D^+ }[/math] is also a matrix with zeros off the diagonal.

We now show that [math]\displaystyle{ D^+ }[/math] is a pseudoinverse of [math]\displaystyle{ D }[/math]:

[math]\displaystyle{ \left(DD^+D\right)_{ij} = D_{ij}D^+_{ij}D_{ij} = D_{ij} \Rightarrow DD^+D = D }[/math]
[math]\displaystyle{ \left(D^+DD^+\right)_{ij} = D^+_{ij}D_{ij}D^+_{ij} = D^+_{ij} \Rightarrow D^+DD^+ = D^+ }[/math]
[math]\displaystyle{ \left(DD^+\right)^*_{ij} = \overline{\left(DD^+\right)_{ji}} = \overline{D_{ji}D^+_{ji}} = \left(D_{ji}D^+_{ji}\right)^* = D_{ji}D^+_{ji} = D_{ij}D^+_{ij} \Rightarrow \left(DD^+\right)^* = DD^+ }[/math]
[math]\displaystyle{ \left(D^+D\right)^*_{ij} = \overline{\left(D^+D\right)_{ji}} = \overline{D^+_{ji}D_{ji}} = \left(D^+_{ji}D_{ji}\right)^* = D^+_{ji}D_{ji} = D^+_{ij}D_{ij} \Rightarrow \left(D^+D\right)^* = D^+D }[/math]

General non-square diagonal matrices

Let [math]\displaystyle{ D }[/math] be an m-by-n matrix over [math]\displaystyle{ \mathbb{K} }[/math] with zeros off the main diagonal, where m and n are unequal. That is, [math]\displaystyle{ D_{ij} = d_{i} }[/math] for some [math]\displaystyle{ d_{i}\in\mathbb{K} }[/math] when [math]\displaystyle{ i = j }[/math] and [math]\displaystyle{ D_{ij} = 0 }[/math] otherwise.

Consider the case where [math]\displaystyle{ n\gt m }[/math]. Then we can rewrite [math]\displaystyle{ D = \left[D_0\,\,\mathbf{0}_{m\times (n-m)}\right] }[/math] by stacking where [math]\displaystyle{ D_0 }[/math] is a square diagonal m-by-m matrix, and [math]\displaystyle{ \mathbf{0}_{m\times(n-m)} }[/math] is the m-by-(n−m) zero matrix. We define [math]\displaystyle{ D^+ \equiv \begin{bmatrix} D_0^+ \\ \mathbf{0}_{(n-m)\times m} \end{bmatrix} }[/math] as an n-by-m matrix over [math]\displaystyle{ \mathbb{K} }[/math], with [math]\displaystyle{ D_0^+ }[/math] the pseudoinverse of [math]\displaystyle{ D_0 }[/math] defined above, and [math]\displaystyle{ \mathbf{0}_{(n-m)\times m} }[/math] the (n−m)-by-m zero matrix. We now show that [math]\displaystyle{ D^+ }[/math] is a pseudoinverse of [math]\displaystyle{ D }[/math]:

By multiplication of block matrices, [math]\displaystyle{ DD^+ = D_0D_0^+ + \mathbf{0}_{m\times(n-m)}\mathbf{0}_{(n-m)\times m} = D_0D_0^+, }[/math] so by property 1 for square diagonal matrices [math]\displaystyle{ D_0 }[/math] proven in the previous section,[math]\displaystyle{ DD^+D = D_0D_0^+\left[D_0\,\,\mathbf{0}_{m\times (n-m)}\right] = \left[D_0D_0^+D_0\,\,\mathbf{0}_{m\times (n-m)}\right] = \left[D_0\,\,\mathbf{0}_{m\times (n-m)}\right] = D }[/math].
Similarly, [math]\displaystyle{ D^+D = \begin{bmatrix} D_0^+D_0 & \mathbf{0}_{m\times(n-m)} \\ \mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)} \end{bmatrix} }[/math], so [math]\displaystyle{ D^+DD^+ = \begin{bmatrix} D_0^+D_0 & \mathbf{0}_{m\times(n-m)} \\ \mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)} \end{bmatrix}\begin{bmatrix} D_0^+ \\ \mathbf{0}_{(n-m)\times m} \end{bmatrix} = \begin{bmatrix} D_0^+D_0D_0^+ \\ \mathbf{0}_{(n-m)\times m} \end{bmatrix} = D^+. }[/math]
By 1 and property 3 for square diagonal matrices, [math]\displaystyle{ \left(DD^+\right)^* = \left(D_0D_0^+\right)^* = D_0D_0^+ = DD^+ }[/math].
By 2 and property 4 for square diagonal matrices, [math]\displaystyle{ \left(D^+D\right)^* = \begin{bmatrix} \left(D_0^+D_0\right)^* & \mathbf{0}_{m\times(n-m)} \\ \mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)} \end{bmatrix} = \begin{bmatrix} D_0^+D_0 & \mathbf{0}_{m\times(n-m)} \\ \mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)} \end{bmatrix} = D^+D. }[/math]

Existence for [math]\displaystyle{ D }[/math] such that [math]\displaystyle{ m \gt n }[/math] follows by swapping the roles of [math]\displaystyle{ D }[/math] and [math]\displaystyle{ D^+ }[/math] in the [math]\displaystyle{ n \gt m }[/math] case and using the fact that [math]\displaystyle{ \left(D^+\right)^+ = D }[/math].

Arbitrary matrices

The singular value decomposition theorem states that there exists a factorization of the form

[math]\displaystyle{ A = U\Sigma V^* }[/math]

where:

[math]\displaystyle{ U }[/math] is an m-by-m unitary matrix over [math]\displaystyle{ \mathbb{K} }[/math].

[math]\displaystyle{ \Sigma }[/math] is an m-by-n matrix over [math]\displaystyle{ \mathbb{K} }[/math] with nonnegative real numbers on the diagonal and zeros off the diagonal.

[math]\displaystyle{ V }[/math] is an n-by-n unitary matrix over [math]\displaystyle{ \mathbb{K} }[/math].^[5]

Define [math]\displaystyle{ A^+ }[/math] as [math]\displaystyle{ V\Sigma^+U^* }[/math].

We now show that [math]\displaystyle{ A^+ }[/math] is a pseudoinverse of [math]\displaystyle{ A }[/math]:

[math]\displaystyle{ AA^+A = U\Sigma V^*V\Sigma^+U^*U\Sigma V^* = U\Sigma\Sigma^+\Sigma V^* = U\Sigma V^* = A }[/math]
[math]\displaystyle{ A^+AA^+ = V\Sigma^+U^*U\Sigma V^*V\Sigma^+U^* = V\Sigma^+\Sigma\Sigma^+U^* = V\Sigma^+U^* = A^+ }[/math]
[math]\displaystyle{ \left(AA^+\right)^* = \left(U\Sigma V^*V\Sigma^+U^*\right)^* = \left(U\Sigma\Sigma^+U^*\right)^* = U\left(\Sigma\Sigma^+\right)^*U^* = U\left(\Sigma\Sigma^+\right)U^* = U\Sigma V^*V\Sigma^+U^* = AA^+ }[/math]
[math]\displaystyle{ \left(A^+A\right)^* = \left(V\Sigma^+U^*U\Sigma V^*\right)^* = \left(V\Sigma^+\Sigma V^*\right)^* = V\left(\Sigma^+\Sigma\right)^*V^* = V\left(\Sigma^+\Sigma\right)V^* = V\Sigma^+U^*U\Sigma V^* = A^+A }[/math]

4. Basic Properties

[math]\displaystyle{ {A^*}^+ = {A^+}^* }[/math]

The proof works by showing that [math]\displaystyle{ {A^+}^* }[/math] satisfies the four criteria for the pseudoinverse of [math]\displaystyle{ A^* }[/math]. Since this amounts to just substitution, it is not shown here.

The proof of this relation is given as Exercise 1.18c in.^[6]

4.1. Identities

A⁺ = A⁺ A^+* A^*

[math]\displaystyle{ A^+ = A^+AA^+ }[/math] and [math]\displaystyle{ AA^+ = \left(AA^+\right)^* }[/math] imply that [math]\displaystyle{ A^+ = A^+\left(A A^+\right)^* = A^+A^{+^*}A^* }[/math].

A⁺ = A^* A^+* A⁺

[math]\displaystyle{ A^+ = A^+AA^+ }[/math] and [math]\displaystyle{ A^+A = \left(A^+A\right)^* }[/math] imply that [math]\displaystyle{ A^+ = \left(A^+A\right)^*A^+ = A^*A^{+*}A^+ }[/math].

A = A^+* A^* A

[math]\displaystyle{ A = A A^+ A }[/math] and [math]\displaystyle{ A A^+ = \left(A A^+\right)^* }[/math] imply that [math]\displaystyle{ A = \left(A A^+\right)^* A = A^{+^*} A^* A }[/math].

A = A A^* A^+*

[math]\displaystyle{ A = A A^+ A }[/math] and [math]\displaystyle{ A^+A = \left(A^+A\right)^* }[/math] imply that [math]\displaystyle{ A = A \left(A^+ A\right)^* = A A^* A^{+^*} }[/math].

A^* = A^* A A⁺

This is the conjugate transpose of [math]\displaystyle{ A = A^{+^*} A^* A }[/math] above.

A^* = A⁺ A A^*

This is the conjugate transpose of [math]\displaystyle{ A = A A^* A^{+^*} }[/math] above.

5. Reduction to the Hermitian Case

The results of this section show that the computation of the pseudoinverse is reducible to its construction in the Hermitian case. It suffices to show that the putative constructions satisfy the defining criteria.

5.1. A⁺ = A^* (A A^*)⁺

This relation is given as exercise 18(d) in,^[6] for the reader to prove, "for every matrix $A$ ". Write [math]\displaystyle{ D = A^*\left(AA^*\right)^+ }[/math]. Observe that

[math]\displaystyle{ \begin{align} & & AA^* &= AA^*\left(AA^*\right)^+ AA^* & \\ &\Leftrightarrow & AA^* &= ADAA^* & \\ &\Leftrightarrow & 0 &= (AD - I)AA^* & \\ &\Leftrightarrow & 0 &= ADA - A & (\text{by Lemma 3}) \\ &\Leftrightarrow & A &= ADA & \end{align} }[/math]

Similarly, [math]\displaystyle{ \left(AA^*\right)^+AA^*\left(AA^*\right)^+ = \left(AA^*\right)^+ }[/math] implies that [math]\displaystyle{ A^*\left(AA^*\right)^+AA^*\left(AA^*\right)^+ = A^*\left(AA^*\right)^+ }[/math] i.e. [math]\displaystyle{ DAD = D }[/math].

Additionally, [math]\displaystyle{ AD = AA^*\left(AA^*\right)^+ }[/math] so [math]\displaystyle{ AD = (AD)^* }[/math].

Finally, [math]\displaystyle{ DA = A^*\left(AA^*\right)^+A }[/math] implies that [math]\displaystyle{ (DA)^* = A^* \left(\left(AA^*\right)^+\right)^*A = A^* \left(\left(AA^*\right)^+\right)A = DA }[/math].

Therefore, [math]\displaystyle{ D = A^+ }[/math].

5.2. A⁺ = (A^* A)⁺A^*

This is proved in an analogous manner to the case above, using Lemma 2 instead of Lemma 3.

6. Products

For the first three proofs, we consider products C = AB.

6.1. A Has Orthonormal Columns

If [math]\displaystyle{ A }[/math] has orthonormal columns i.e. [math]\displaystyle{ A^*A = I }[/math] then [math]\displaystyle{ A^+ = A^* }[/math]. Write [math]\displaystyle{ D = B^+A^+ = B^+A^* }[/math]. We show that [math]\displaystyle{ D }[/math] satisfies the Moore–Penrose criteria.

[math]\displaystyle{ \begin{align} CDC &= ABB^+A^*AB = ABB^+B = AB = C, \\[4pt] DCD &= B^+A^*ABB^+A^* = B^+BB^+A^* = B^+A^* = D, \\[4pt] (CD)^* &= D^*B^*A^* = A\left(B^+\right)^*B^*A^* = A\left(BB^+\right)^*A^* = ABB^+A^* = CD, \\[4pt] (DC)^* &= B^*A^*D^* = B^*A^*A\left(B^+\right)^* = \left(B^+B\right)^* = B^+B = B^+A^*AB = DC . \end{align} }[/math]

Therefore, [math]\displaystyle{ D = C^+ }[/math].

6.2. B Has Orthonormal Rows

If B has orthonormal rows i.e. [math]\displaystyle{ BB^* = I }[/math] then [math]\displaystyle{ B^+ = B^* }[/math]. Write [math]\displaystyle{ D = B^+A^+ = B^*A^+ }[/math]. We show that [math]\displaystyle{ D }[/math] satisfies the Moore–Penrose criteria.

[math]\displaystyle{ \begin{align} CDC &= ABB^*A^+AB = AA^+AB = AB = C, \\[4pt] DCD &= B^*A^+ABB^*A^+ = B^*A^+AA^+ = B^*A^+ = D, \\[4pt] (CD)^* &= D^*B^*A^* = \left(A^+\right)^*BB^*A^* = \left(A^+\right)^*A^* = \left(AA^+\right)^* = AA^+ = ABB^*A^+ = CD, \\[4pt] (DC)^* &= B^*A^*D^* = B^*A^*\left(A^+\right)^*B = B^*\left(A^+A\right)^*B = B^*A^+AB = DC . \end{align} }[/math]

Therefore, [math]\displaystyle{ D = C^+. }[/math]

6.3. A Has Full Column Rank and B Has Full Row Rank

Since [math]\displaystyle{ A }[/math] has full column rank, [math]\displaystyle{ A^*A }[/math] is invertible so [math]\displaystyle{ \left(A^*A\right)^+ = \left(A^*A\right)^{-1} }[/math]. Similarly, since [math]\displaystyle{ B }[/math] has full row rank, [math]\displaystyle{ BB^* }[/math] is invertible so [math]\displaystyle{ \left(BB^*\right)^+ = \left(BB^*\right)^{-1} }[/math].

Write [math]\displaystyle{ D = B^+A^+ = B^*\left(BB^*\right)^{-1}\left(A^*A\right)^{-1}A^* }[/math](using reduction to the Hermitian case). We show that [math]\displaystyle{ D }[/math] satisfies the Moore–Penrose criteria.

[math]\displaystyle{ \begin{align} CDC &= ABB^*\left(BB^*\right)^{-1}\left(A^*A\right)^{-1}A^*AB = AB = C, \\[4pt] DCD &= B^*\left(BB^*\right)^{-1}\left(A^*A\right)^{-1}A^*ABB^*\left(BB^*\right)^{-1}\left(A^*A\right)^{-1}A^* = B^*\left(BB^*\right)^{-1}\left(A^*A\right)^{-1}A^* = D, \\[4pt] CD &= ABB^*\left(BB^*\right)^{-1}\left(A^*A\right)^{-1}A^* = A\left(A^*A\right)^{-1}A^* = \left(A\left(A^*A\right)^{-1}A^*\right)^*, \\ \Rightarrow (CD)^* &= CD, \\[4pt] DC &= B^*\left(BB^*\right)^{-1}\left(A^*A\right)^{-1}A^*AB = B^*\left(BB^*\right)^{-1}B = \left(B^*\left(BB^*\right)^{-1}B\right)^*, \\ \Rightarrow (DC)^* &= DC. \end{align} }[/math]

Therefore, [math]\displaystyle{ D = C^+ }[/math].

6.4. Conjugate Transpose

Here, [math]\displaystyle{ B = A^* }[/math], and thus [math]\displaystyle{ C = AA^* }[/math] and [math]\displaystyle{ D = A^{+*} A^+ }[/math]. We show that indeed [math]\displaystyle{ D }[/math] satisfies the four Moore–Penrose criteria.

[math]\displaystyle{ \begin{align} CDC &= A A^* A^{+*} A^+ A A^* = A \left(A^+A\right)^*A^+A A^* = A A^+A A^+A A^* = A A^+A A^* = A A^* = C \\[4pt] DCD &= A^{+*} A^+ A A^* A^{+*} A^+ = A^{+*} A^+ A \left(A^+A\right)^*A^+ = A^{+*} A^+ A A^+A A^+ = A^{+*} A^+ A A^+ = A^{+*} A^+ = D \\[4pt] (CD)^* &= \left(A A^* A^{+*} A^+\right)^* = A^{+*} A^+ A A^* = A^{+*} \left(A^+ A\right)^* A^* = A^{+*} A^* A^{+*} A^* \\ &= \left(A A^+\right)^* \left(A A^+\right)^* = A A^+ A A^+ = A\left(A^+A\right)^*A^+ = A A^*A^{+*} A^+ = CD \\[4pt] (DC)^* &= \left(A^{+*} A^+ A A^*\right)^* = A A^* A^{+*} A^+ = A \left(A^+ A\right)^* A^+ = A A^+ A A^+ \\ &= \left(A A^+\right)^* \left(A A^+\right)^* = A^{+*} A^* A^{+*} A^* = A^{+*} \left(A^+ A\right)^* A^* = A^{+*} A^+ A A^* = DC \end{align} }[/math]

Therefore, [math]\displaystyle{ D = C^+ }[/math]. In other words:

[math]\displaystyle{ \left(A A^*\right)^+ = A^{+*} A^+ }[/math]

and, since [math]\displaystyle{ \left(A^*\right)^* = A }[/math]

[math]\displaystyle{ \left(A^*A\right)^+ = A^+A^{+*} }[/math]

7. Projectors and Subspaces

Define [math]\displaystyle{ P = AA^+ }[/math] and [math]\displaystyle{ Q = A^+ A }[/math]. Observe that [math]\displaystyle{ P^2 = AA^+ AA^+ = AA^+ = P }[/math]. Similarly [math]\displaystyle{ Q^2 = Q }[/math], and finally, [math]\displaystyle{ P = P^* }[/math] and [math]\displaystyle{ Q = Q^* }[/math]. Thus [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math] are orthogonal projection operators. Orthogonality follows from the relations [math]\displaystyle{ P = P^* }[/math] and [math]\displaystyle{ Q = Q^* }[/math]. Indeed, consider the operator [math]\displaystyle{ P }[/math]: any vector decomposes as

[math]\displaystyle{ x = Px + (I - P)x }[/math]

and for all vectors [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] satisfying [math]\displaystyle{ Px = x }[/math] and [math]\displaystyle{ (I - P)y = y }[/math], we have

[math]\displaystyle{ x^*y = (Px)^*(I - P)y = x^* P^*(I - P)y = x^*P(I - P)y = 0 }[/math].

It follows that [math]\displaystyle{ PA = AA^+ A = A }[/math] and [math]\displaystyle{ A^+ P = A^+ AA^+ = A^+ }[/math]. Similarly, [math]\displaystyle{ QA^+ = A^+ }[/math] and [math]\displaystyle{ AQ = A }[/math]. The orthogonal components are now readily identified.

If [math]\displaystyle{ y }[/math] belongs to the range of [math]\displaystyle{ A }[/math] then for some [math]\displaystyle{ x }[/math], [math]\displaystyle{ y = Ax }[/math] and [math]\displaystyle{ Py = PAx = Ax = y }[/math]. Conversely, if [math]\displaystyle{ Py = y }[/math] then [math]\displaystyle{ y = AA^+ y }[/math] so that [math]\displaystyle{ y }[/math] belongs to the range of [math]\displaystyle{ A }[/math]. It follows that [math]\displaystyle{ P }[/math] is the orthogonal projector onto the range of [math]\displaystyle{ A }[/math]. [math]\displaystyle{ I - P }[/math] is then the orthogonal projector onto the orthogonal complement of the range of [math]\displaystyle{ A }[/math], which equals the kernel of [math]\displaystyle{ A^* }[/math].

A similar argument using the relation [math]\displaystyle{ QA^* = A^* }[/math] establishes that [math]\displaystyle{ Q }[/math] is the orthogonal projector onto the range of [math]\displaystyle{ A^* }[/math] and [math]\displaystyle{ (I - Q) }[/math] is the orthogonal projector onto the kernel of [math]\displaystyle{ A }[/math].

Using the relations [math]\displaystyle{ P\left(A^+\right)^* = P^*\left(A^+\right)^* = \left(A^+ P\right)^* = \left(A^+\right)^* }[/math] and [math]\displaystyle{ P = P^* = \left(A^+\right)^* A^* }[/math] it follows that the range of P equals the range of [math]\displaystyle{ \left(A^+\right)^* }[/math], which in turn implies that the range of [math]\displaystyle{ I - P }[/math] equals the kernel of [math]\displaystyle{ A^+ }[/math]. Similarly [math]\displaystyle{ QA^+ = A^+ }[/math] implies that the range of [math]\displaystyle{ Q }[/math] equals the range of [math]\displaystyle{ A^+ }[/math]. Therefore, we find,

[math]\displaystyle{ \begin{align} \operatorname{Ker}\left(A^+\right) &= \operatorname{Ker}\left(A^*\right). \\ \operatorname{Im}\left(A^+\right) &= \operatorname{Im}\left(A^*\right). \\ \end{align} }[/math]

8. Additional Properties

8.1. Least-Squares Minimization

In the general case, it is shown here for any [math]\displaystyle{ m \times n }[/math] matrix [math]\displaystyle{ A }[/math] that [math]\displaystyle{ \|Ax -b\|_2 \ge \|Az -b\|_2 }[/math] where [math]\displaystyle{ z = A^+ b }[/math]. This lower bound need not be zero as the system [math]\displaystyle{ Ax=b }[/math] may not have a solution (e.g. when the matrix A does not have full rank or the system is overdetermined).

To prove this, we first note that (stating the complex case), using the fact that [math]\displaystyle{ P = A A^+ }[/math] satisfies [math]\displaystyle{ PA = A }[/math] and [math]\displaystyle{ P = P^* }[/math], we have

[math]\displaystyle{ \begin{alignat}{2} A^*(Az - b) & = A^*(A A^+ b - b)\\ & = A^*(P b - b) \\ & = A^*P^* b - A^*b \\ & = (PA)^* b - A^*b \\ & = 0 \end{alignat} }[/math]

so that ([math]\displaystyle{ \text{c.c.} }[/math] stands for the complex conjugate of the previous term in the following)

[math]\displaystyle{ \begin{alignat}{2} \|Ax -b\|_2^2 &= \|Az -b\|_2^2 + (A(x-z))^*(Az-b) + \text{c.c.} + \|A(x - z)\|_2^2 \\ &= \|Az -b\|_2^2 + (x-z)^*A^*(Az-b) + \text{c.c.} + \|A(x - z)\|_2^2 \\ &= \|Az -b\|_2^2 + \|A(x - z)\|_2^2\\ & \ge \|Az -b\|_2^2 \end{alignat} }[/math]

as claimed.

If [math]\displaystyle{ A }[/math] is injective i.e. one-to-one (which implies [math]\displaystyle{ m \ge n }[/math]), then the bound is attained uniquely at [math]\displaystyle{ z }[/math].

8.2. Minimum-Norm Solution to a Linear System

The proof above also shows that if the system [math]\displaystyle{ Ax = b }[/math] is satisfiable i.e. has a solution, then necessarily [math]\displaystyle{ z = A^+b }[/math] is a solution (not necessarily unique). We show here that [math]\displaystyle{ z }[/math] is the smallest such solution (its Euclidean norm is uniquely minimum).

To see this, note first, with [math]\displaystyle{ Q = A^+ A }[/math], that [math]\displaystyle{ Qz = A^+ AA^+ b = A^+ b = z }[/math] and that [math]\displaystyle{ Q^* = Q }[/math]. Therefore, assuming that [math]\displaystyle{ Ax = b }[/math], we have

[math]\displaystyle{ \begin{align} z^*(x - z) &= (Qz)^*(x - z) \\ &= z^*Q(x - z) \\ &= z^*\left(A^+ Ax - z\right) \\ &= z^*\left(A^+ b - z\right) \\ &= 0. \end{align} }[/math]

Thus

[math]\displaystyle{ \begin{alignat}{2} \|x\|_2^2 &= \|z\|_2^2 + z^*(x - z) + \text{c.c.} + \|x - z\|_2^2 \\ &= \|z\|_2^2 + \|x - z\|_2^2 \\ &\ge \|z\|_2^2 \end{alignat} }[/math]

with equality if and only if [math]\displaystyle{ x = z }[/math], as was to be shown.

An immediate consequence of this result is that [math]\displaystyle{ z }[/math] is also the uniquely smallest solution to the least-squares minimization problem for all [math]\displaystyle{ Ax = b }[/math], including when [math]\displaystyle{ A }[/math] is neither injective nor surjective. It can be shown that the least-squares approximation [math]\displaystyle{ Az = y \approx b }[/math] is unique. Thus it is necessary and sufficient for all [math]\displaystyle{ x }[/math] that solve the least-squares minimization to satisfy [math]\displaystyle{ Ax = y = Az = AA^+b }[/math]. This system always has a solution (not necessarily unique) as [math]\displaystyle{ Az }[/math] lies in the column space of [math]\displaystyle{ A }[/math]. From the above result the smallest [math]\displaystyle{ x }[/math] which solves this system is [math]\displaystyle{ A^+(AA^+b) = A^+b = z }[/math].

References

(Ben-Israel Greville)
(Campbell Meyer)
(Nakamura 1991)
(Rao Mitra)
Some authors use slightly different dimensions for the factors. The two definitions are equivalent.
Adi Ben-Israel; Thomas N.E. Greville (2003). Generalized Inverses. Springer-Verlag. ISBN 978-0-387-00293-4.

©Text is available under the terms and conditions of the Creative Commons-Attribution ShareAlike (CC BY-SA) license; additional terms may apply. By using this site, you agree to the Terms and Conditions and Privacy Policy.

Upload a video for this entry

Information

Subjects: Others

Contributor MDPI registered users' name will be linked to their SciProfiles pages. To register with us, please refer to https://encyclopedia.pub/register :

HandWiki

View Times: 7.3K

Entry Collection: HandWiki

Update Date: 01 Dec 2022

Table of Contents

Notice

You are not a member of the advisory board for this topic. If you want to update advisory board member profile, please contact office@encyclopedia.pub.

Confirm

Only members of the Encyclopedia advisory board for this topic are allowed to note entries. Would you like to become an advisory board member of the Encyclopedia?

Yes

${ textCharacter }/${ maxCharacter }

Submit

Cancel

There is no comment~

${ textCharacter }/${ maxCharacter }

Submit

Cancel

${ selectedItem.replyTextCharacter }/${ selectedItem.replyMaxCharacter }

Submit

Cancel

Confirm

Are you sure to Delete?

Yes No

Video Upload Options

1. Definition

2. Useful Lemmas

2.1. Lemma 1: A*A = 0 ⇒ A = 0

2.2. Lemma 2: A*AB = 0 ⇒ AB = 0

2.3. Lemma 3: ABB* = 0 ⇒ AB = 0

3. Existence and Uniqueness

3.1. Proof of Uniqueness

3.2. Proof of Existence

1-by-1 matrices

Square diagonal matrices

General non-square diagonal matrices

Arbitrary matrices

4. Basic Properties

4.1. Identities

A+ = A+ A+* A*

A+ = A* A+* A+

A = A+* A* A

A = A A* A+*

A* = A* A A+

A* = A+ A A*

5. Reduction to the Hermitian Case

5.1. A+ = A* (A A*)+

5.2. A+ = (A* A)+A*

6. Products

6.1. A Has Orthonormal Columns

6.2. B Has Orthonormal Rows

6.3. A Has Full Column Rank and B Has Full Row Rank

6.4. Conjugate Transpose

7. Projectors and Subspaces

8. Additional Properties

8.1. Least-Squares Minimization

8.2. Minimum-Norm Solution to a Linear System

References

2.1. Lemma 1: A^*A = 0 ⇒ A = 0

2.2. Lemma 2: A^*AB = 0 ⇒ AB = 0

2.3. Lemma 3: ABB^* = 0 ⇒ AB = 0

A⁺ = A⁺ A^+* A^*

A⁺ = A^* A^+* A⁺

A = A^+* A^* A

A = A A^* A^+*

A^* = A^* A A⁺

A^* = A⁺ A A^*

5.1. A⁺ = A^* (A A^*)⁺

5.2. A⁺ = (A^* A)⁺A^*