Vectors and Matrices
Vector and matrix notation is a convenient and succinct way to represent dense data structures and complex operations. Its application is called matrix algebra, which should be viewed simply as an extension of traditional algebra rather than an alternative to it. Matrix notation suits a range of scientific and mathematical branches such as linear algebra, multivariate analysis, economics and machine learning. Its forte is its ability to abstract individual variables and constants into larger array structures, which lets us avoid the mess of attaching a host of indices to individual variables. The following is a (very) brief introduction to some of the more useful parts of matrix notation.
Data Structure
A vector \(\vec{v}\in \mathbb{R}^{n}\) represents an ordered list of individual variables \(v_i\in \mathbb{R}\). The index \(i\in \{1, \cdots, n\}\) denotes where a particular variable fits within a vector. By default the variables are considered to be listed vertically.
\begin{equation}
\label{eq:vector}
\vec{v} =
\begin{bmatrix}
v_{1} \\
\vdots \\
v_{n} \end{bmatrix}
\end{equation}
A vector with this configuration is called a column vector. The orientation of a vector matters when it comes to performing operations on it. The following illustrates the transpose operation \(\mathrm{T}\), which flattens \(\vec{v}\) into a row.
\begin{equation}
\label{eq:transposedVector}
\vec{v}^\mathrm{T} =
\left[ \begin{array}{ccc}
v_{1} & \cdots & v_{n}
\end{array} \right]
\end{equation}
A neat way to represent a column vector is thus as follows.
\begin{equation}
\vec{v}=[v_1 \dots v_n]^\mathrm{T}
\end{equation}
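As a concrete aside, a numerical library such as NumPy (my choice for illustration; it is not part of the notation above) mirrors these definitions directly. A minimal sketch of a column vector and its transpose:
\begin{verbatim}
import numpy as np

# A column vector in R^3, stored as a 3-by-1 array.
v = np.array([[1.0],
              [2.0],
              [3.0]])

# Transposing flattens it into a 1-by-3 row vector.
print(v.shape)    # (3, 1)
print(v.T.shape)  # (1, 3)
\end{verbatim}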
An extension of vector notation is the more expressive matrix notation. A matrix \(A\in \mathbb{R}^{m\times n}\) is a rectangular array, or table, of individual variables \(a_{ij}\in \mathbb{R}\). The first index \(i\) denotes a row in the matrix, whereas the second index \(j\) denotes a column. One interpretation is that a matrix is composed of multiple vectors representing either its rows or its columns.
\begin{equation}
\label{eq:matrix}
\begin{array}{ccl}
A
&=&
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
\vec{a}_{1}^\mathrm{T} \\
\vdots \\
\vec{a}_{m}^\mathrm{T}
\end{bmatrix}
,\quad\\
&&
\vec{a}_{i}^\mathrm{T}
=
\begin{bmatrix}
a_{i1} & \cdots & a_{in}
\end{bmatrix}
,\quad i \in \{1, \cdots, m\}
\end{array}
\end{equation}
\begin{equation}
\label{eq:alternativeMatrix}
D
=
\begin{bmatrix}
d_{11} & \cdots & d_{1n} \\
\vdots & \ddots & \vdots \\
d_{m1} & \cdots & d_{mn} \end{bmatrix}
=
\begin{bmatrix}
\vec{d}_{1} & \cdots & \vec{d}_{n}
\end{bmatrix}
,\quad
\vec{d}_{j} =
\begin{bmatrix}
d_{1j} \\
\vdots \\
d_{mj} \end{bmatrix}
,\quad j \in \{1, \cdots, n\}
\end{equation}
Conversely, a vector can be interpreted as a matrix with only one column or row, hence the terms column and row vectors.
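To make the row and column interpretation concrete, here is a small NumPy sketch extracting a row vector \(\vec{a}_{i}^\mathrm{T}\) and a column vector from a matrix (note that NumPy indexes from zero, while the notation above indexes from one):
\begin{verbatim}
import numpy as np

# A 2-by-3 matrix with entries a_ij.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(A[0, :])  # first row vector:    [1. 2. 3.]
print(A[:, 0])  # first column vector: [1. 4.]
\end{verbatim}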
Arithmetic Operations
There are multiple ways to perform multiplication of vectors and matrices. As with simple algebraic notation, two adjacent vectors imply a multiplication operation, here called the dot product.
\begin{equation}
\label{eq:dotProduct}
\vec{v} \cdot \vec{u} = \vec{v}^\mathrm{T}\vec{u} =
\begin{bmatrix}
v_{1} & \cdots & v_{n}
\end{bmatrix}
\begin{bmatrix}
u_{1} \\
\vdots \\
u_{n} \end{bmatrix}
= v_1u_1 + \cdots + v_nu_n
\end{equation}
The vectors must be of equal dimension; each ordered pair of entries is multiplied and the products are summed.
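A quick numeric check of the dot product in NumPy, where both np.dot and the @ operator compute this sum of products:
\begin{verbatim}
import numpy as np

v = np.array([1.0, 2.0, 3.0])
u = np.array([4.0, 5.0, 6.0])

# v . u = 1*4 + 2*5 + 3*6 = 32
print(np.dot(v, u))  # 32.0
print(v @ u)         # 32.0, equivalent infix notation
\end{verbatim}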
General matrix multiplication is an extension, or generalization, of the dot product. Using the dot product, each row vector of the first matrix \(A\in \mathbb{R}^{m\times n}\) is multiplied with every column vector of the second matrix \(B\in \mathbb{R}^{n\times k}\).
\begin{equation}
\label{eq:matrixMult}
\begin{array}{ccl}
AB & = &
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
b_{11} & \cdots & b_{1k} \\
\vdots & \ddots & \vdots \\
b_{n1} & \cdots & b_{nk}
\end{bmatrix}\\
& = &
\begin{bmatrix}
\vec{a}_{1}^\mathrm{T}\vec{b}_{1} & \cdots & \vec{a}_{1}^\mathrm{T}\vec{b}_{k}\\
\vdots & \ddots & \vdots \\
\vec{a}_{m}^\mathrm{T}\vec{b}_{1} & \cdots & \vec{a}_{m}^\mathrm{T}\vec{b}_{k}
\end{bmatrix}
\end{array}
\end{equation}
Notice how the inner dimensions \(n\) of the matrices must agree. That is, each row vector of matrix \(A\) must have the same dimension as each column vector of matrix \(B\). The resulting matrix \(AB\) is thus of \(m\)-by-\(k\) dimensions, where each entry is defined as follows.
\begin{equation}
\label{eq:matrixEntryMult}
\begin{array}{c}
\vec{a}_{i}^\mathrm{T}\vec{b}_{j}
=
\begin{bmatrix}
a_{i1} & \cdots & a_{in}
\end{bmatrix}
\begin{bmatrix}
b_{1j} \\
\vdots \\
b_{nj}
\end{bmatrix}
=
a_{i1}b_{1j} + \cdots + a_{in}b_{nj},\\
i\in \{1, \cdots, m\},\quad j\in \{1, \cdots, k\}
\end{array}
\end{equation}
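The same dimension rule is enforced in code: multiplying an \(m\)-by-\(n\) array with an \(n\)-by-\(k\) array yields an \(m\)-by-\(k\) result, while mismatched inner dimensions raise an error. A minimal NumPy sketch:
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])       # 3-by-2

B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])  # 2-by-3

C = A @ B       # inner dimensions (2 and 2) agree
print(C.shape)  # (3, 3), i.e. m-by-k
# A @ A would raise a ValueError: inner dimensions 2 and 3 disagree.
\end{verbatim}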
A special case of matrix multiplication arises when \(k=1\), that is, the multiplication of a matrix \(A\in \mathbb{R}^{m\times n}\) with a vector \(\vec{v}\in \mathbb{R}^{n}\). Here the dot product of the vector \(\vec{v}\) with each row vector of \(A\) produces the vector \(A\vec{v}\in\mathbb{R}^{m}\).
\begin{equation}
\label{eq:matrixMultVsVector}
A\vec{v} =
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
v_{1} \\
\vdots \\
v_{n}
\end{bmatrix}
=
\begin{bmatrix}
\vec{a}_{1}^\mathrm{T}\vec{v}\\
\vdots \\
\vec{a}_{m}^\mathrm{T}\vec{v}
\end{bmatrix}
=
\begin{bmatrix}
a_{11}v_1 + \cdots + a_{1n}v_n\\
\vdots \\
a_{m1}v_1 + \cdots + a_{mn}v_n
\end{bmatrix}
\end{equation}
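The matrix-vector case in NumPy, where a one-dimensional array plays the role of \(\vec{v}\):
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, 1.0])

# Each entry of A @ v is the dot product of a row of A with v.
print(A @ v)  # [3. 7.]
\end{verbatim}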
In the field of computer science a second, more intuitive, multiplication of matrices and vectors is often used. The entrywise product, also known as the Hadamard product, applies normal algebraic multiplication to every corresponding pair of entries of two matrices or vectors of identical dimensions.
\begin{equation}
\label{eq:entrywiseProduct}
\begin{array}{ccl}
A\circ D &=&
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\circ
\begin{bmatrix}
d_{11} & \cdots & d_{1n} \\
\vdots & \ddots & \vdots \\
d_{m1} & \cdots & d_{mn}
\end{bmatrix}\\
&=&
\begin{bmatrix}
a_{11}d_{11} & \cdots & a_{1n}d_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1}d_{m1} & \cdots & a_{mn}d_{mn}
\end{bmatrix}
\end{array}
\end{equation}
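In NumPy the * operator on equal-shaped arrays is exactly this entrywise product; a sketch with made-up numbers:
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
D = np.array([[10.0, 20.0],
              [30.0, 40.0]])

# Entrywise (Hadamard) product of two equal-shaped arrays.
print(A * D)
# [[ 10.  40.]
#  [ 90. 160.]]
\end{verbatim}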
An exception to the dimensionality requirements discussed above is the multiplication of a vector or matrix by a constant, or weight. In this case the weight is simply multiplied with every entry.
\begin{equation}
\label{eq:constantVectorProduct}
c\vec{v} =
c
\begin{bmatrix}
v_{1} \\
\vdots \\
v_{m}
\end{bmatrix}
=
\begin{bmatrix}
cv_{1} \\
\vdots \\
cv_{m}
\end{bmatrix}
\end{equation}
Unsurprisingly, addition and subtraction also behave entrywise as in the vector case below.
\begin{equation}
\label{eq:additionAndSubtractionVector}
\vec{v} \pm \vec{u} =
\begin{bmatrix}
v_{1} \\
\vdots \\
v_{m}
\end{bmatrix}
\pm
\begin{bmatrix}
u_{1} \\
\vdots \\
u_{m}
\end{bmatrix}
=
\begin{bmatrix}
v_{1}\pm u_{1} \\
\vdots \\
v_{m}\pm u_{m}
\end{bmatrix}
\end{equation}
Note that for addition and subtraction the dimensions must likewise agree.
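Scalar multiplication and entrywise addition and subtraction, sketched in NumPy:
\begin{verbatim}
import numpy as np

v = np.array([1.0, 2.0, 3.0])
u = np.array([4.0, 5.0, 6.0])

print(2.0 * v)  # [2. 4. 6.], every entry scaled by the constant
print(v + u)    # [5. 7. 9.], entrywise sum
print(v - u)    # [-3. -3. -3.], entrywise difference
\end{verbatim}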
Vector Norms
An important class of vector and matrix operations is the norm, denoted by some variation of \(||\cdot||\). Both vectors and matrices have corresponding norm concepts; however, only vector norms are needed here. There exist many types of norms, all sharing the common trait of being a measure of the length or size of a vector. The most notable is the Euclidean norm.
\begin{equation}
\label{eq:euclNorm}
||\vec{v}||_2 = \sqrt{\vec{v}^\mathrm{T}\vec{v}} = \sqrt{v_1^2 + \cdots + v_n^2}
\end{equation}
This is often used to calculate the Euclidean distance between two points.
\begin{equation}
\label{eq:euclDist}
d(\vec{v},\vec{u})= ||\vec{v}-\vec{u}||_2 = \sqrt{(\vec{v}-\vec{u})^\mathrm{T}(\vec{v}-\vec{u})} = \sqrt{\sum_{i=1}^n{(v_i-u_i)^2}}
\end{equation}
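Both the norm and the distance are available through NumPy's np.linalg.norm:
\begin{verbatim}
import numpy as np

v = np.array([3.0, 4.0])
u = np.array([0.0, 0.0])

print(np.linalg.norm(v))      # 5.0, since sqrt(3^2 + 4^2) = 5
print(np.linalg.norm(v - u))  # 5.0, the Euclidean distance to u
\end{verbatim}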
A generalization of the Euclidean norm is the p-norm.
\begin{equation}
\label{eq:pNorm}
||\vec{v}||_p = \left(\sum_{i=1}^n{|v_i|^p}\right)^\frac{1}{p}
\end{equation}
Besides the Euclidean norm, the most important p-norms are the Manhattan norm with \(p=1\) and the infinity norm with \(p=\infty\).
\begin{equation}
\label{eq:1Norm}
||\vec{v}||_1 = \left(\sum_{i=1}^n{|v_i|^1}\right)^\frac{1}{1} = \sum_{i=1}^n{|v_i|}
\end{equation}
\begin{equation}
\label{eq:inftyNorm}
||\vec{v}||_\infty = \max\{|v_1|, \cdots, |v_n|\}
\end{equation}
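The same NumPy function computes both of these via its ord parameter:
\begin{verbatim}
import numpy as np

v = np.array([1.0, -2.0, 3.0])

print(np.linalg.norm(v, ord=1))       # 6.0, the sum of absolute values
print(np.linalg.norm(v, ord=np.inf))  # 3.0, the largest absolute value
\end{verbatim}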
Linear and Quadratic Equations
Matrix notation allows for succinct expression of linear and quadratic systems of equations. The set of linear equations \(y_i = a_{i1}x_{1} + \cdots + a_{in}x_{n} + b_i = 0,\; i\in \{1, \cdots, m\}\) can be written as follows.
\begin{equation}
\label{eq:linearEquations}
\vec{y} = A\vec{x} + \vec{b} = \vec{0}
\end{equation}
The quadratic equation \(y = \sum_{i,j=1}^n{a_{ij}x_ix_j}+b=0\) can be written as follows.
\begin{equation}
\label{eq:quadraticEquation}
y = \vec{x}^\mathrm{T}A\vec{x} + b = 0
\end{equation}
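Both forms translate directly into code; a sketch evaluating \(A\vec{x} + \vec{b}\) and the quadratic form \(\vec{x}^\mathrm{T}A\vec{x} + b\) with made-up values:
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, 1.0])
b = np.array([0.5, 0.5])

print(A @ x + b)        # [3.5 7.5], one value per linear equation
print(x @ A @ x + 0.5)  # 10.5, the quadratic form plus a scalar b
\end{verbatim}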
Miscellaneous Structures and Operations
Sometimes it is useful to have a way to represent a vector of identical entries. The null vector \(\vec{0}\) and the one vector \(\vec{1}\) are vectors with entries of only \(0\) and \(1\) respectively. Usually the dimension of such a vector is implied by the context. An application of the one vector is that it can be used to sum the entries of a vector by \(\vec{1}^\mathrm{T}\vec{x} = x_1 + \cdots + x_m\).
\begin{equation}
\label{eq:nullOneVector}
\vec{0} =
\begin{bmatrix}
0 \\
\vdots \\
0
\end{bmatrix},
\quad\vec{1} =
\begin{bmatrix}
1 \\
\vdots \\
1
\end{bmatrix}
\end{equation}
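In NumPy these are np.zeros and np.ones, and the summation trick carries over:
\begin{verbatim}
import numpy as np

x = np.array([1.0, 2.0, 3.0])
ones = np.ones(3)   # the one vector in R^3

print(ones @ x)     # 6.0, i.e. 1^T x sums the entries of x
print(np.zeros(3))  # [0. 0. 0.], the null vector in R^3
\end{verbatim}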
The \(\mathrm{diag}(\cdot)\) operator transforms a vector \(\vec{v}\in\mathbb{R}^{n}\) into a diagonal matrix \(V\in\mathbb{R}^{n\times n}\) by putting each entry of the vector onto the diagonal of the matrix while keeping all other entries zero.
\begin{equation}
\label{eq:diagVectorToMatrix}
\mathrm{diag}(\vec{v}) = \mathrm{diag}
\begin{bmatrix}
v_1 \\
\vdots \\
v_n
\end{bmatrix}
=
\begin{bmatrix}
v_{1} & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & v_{n}
\end{bmatrix}
=
V
\end{equation}
An often used matrix concept is the identity matrix, defined as \(I_n = \mathrm{diag}(\vec{1})\) with \(\vec{1}\in\mathbb{R}^{n}\). It has the property that multiplication with it does not have any effect, e.g., \(AI_n=A\), and it can be viewed as a cousin of \(1\) in normal arithmetic.
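NumPy provides both constructions as np.diag and np.identity; a quick sketch verifying that \(AI_n=A\):
\begin{verbatim}
import numpy as np

v = np.array([1.0, 2.0, 3.0])
V = np.diag(v)      # 3-by-3 matrix with v on its diagonal
I = np.identity(3)  # I_3 = diag(1)

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(np.array_equal(A @ I, A))  # True: multiplying by I has no effect
\end{verbatim}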
Finally, augmentation of vectors and matrices can be needed when applying machine learning and optimization software. Augmentation simply means that we concatenate two objects along a dimension in which they agree. For instance, a vector \(\vec{v} = [v_1, \cdots, v_n]^\mathrm{T}\) may be augmented by a new variable \(v_{n+1}\), becoming \(\vec{v}' = [v_1, \cdots, v_{n+1}]^\mathrm{T}\). Similarly a matrix \(A\in\mathbb{R}^{m\times n}\) may be augmented by another matrix, say \(E\in\mathbb{R}^{m\times k}\), as follows.
\begin{equation}
\label{eq:matrixAugmentation}
\begin{bmatrix}
A & E
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & \cdots & a_{1n} & e_{11} & \cdots & e_{1k} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn} & e_{m1} & \cdots & e_{mk}
\end{bmatrix}
\end{equation}
Another example where we augment the matrix with a vector could be as follows.
\begin{equation}
\label{eq:matrixVectorAugmentation}
\begin{bmatrix}
A \\
\vec{v}^\mathrm{T}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots\\
a_{m1} & \cdots & a_{mn}\\
v_{1} & \cdots & v_{n}
\end{bmatrix}
\end{equation}
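Horizontal and vertical augmentation correspond to NumPy's np.hstack and np.vstack; a minimal sketch with shapes chosen to match the two examples above:
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
E = np.array([[5.0],
              [6.0]])       # same number of rows as A
v = np.array([[7.0, 8.0]])  # v^T, same number of columns as A

print(np.hstack([A, E]))  # [A E], a 2-by-3 matrix
print(np.vstack([A, v]))  # A with v^T appended as a final row, 3-by-2
\end{verbatim}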