# Gaussian Distribution

The Gaussian distribution, or normal distribution, is important because it can be used as a model to simplify naturally complex phenomena. In the continuous case, the Gaussian distribution for a single random variable $x$ is defined by

$$
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\},
$$

where $\mu$ and $\sigma^2$ are the mean and the variance of the distribution, respectively. The inverse of the variance, $\lambda = 1/\sigma^2$, is called the *precision* of the distribution.
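The univariate density can be evaluated directly from this definition. The following is a minimal NumPy sketch (the function name and the example values are illustrative, not from the original text):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density N(x | mu, sigma2)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

# The precision is the inverse of the variance.
sigma2 = 2.0
precision = 1.0 / sigma2

# The density peaks at the mean, where it equals 1 / sqrt(2*pi*sigma2).
peak = gaussian_pdf(0.0, 0.0, sigma2)
```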

The Gaussian distribution for a $D$-dimensional random variable $\mathbf{x}$ is given as

$$
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right\}.
$$

The quadratic form

$$
\Delta^2 = (\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})
$$

is called the Mahalanobis distance between $\boldsymbol{\mu}$ and $\mathbf{x}$. If $\boldsymbol{\Sigma}$ is an identity matrix, the Mahalanobis distance, $\Delta$, is identical to the Euclidean distance between $\boldsymbol{\mu}$ and $\mathbf{x}$. $\boldsymbol{\Sigma}$ is a symmetric and positive semidefinite matrix. Let $\mathbf{u}_i$ and $\lambda_i$ $(i = 1, 2, \ldots, D)$ denote the eigenvectors and their corresponding eigenvalues, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_D > 0$. Then, the following equation is satisfied:

$$
\boldsymbol{\Sigma} \mathbf{u}_i = \lambda_i \mathbf{u}_i.
$$
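The Mahalanobis distance is straightforward to compute numerically. A small NumPy sketch (the covariance and point values are arbitrary illustrative choices):

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    # Solving the linear system is preferred over forming the explicit inverse.
    return float(d @ np.linalg.solve(Sigma, d))

x = np.array([3.0, 4.0])
mu = np.array([0.0, 0.0])

# With an identity covariance, the Mahalanobis distance reduces to
# the ordinary Euclidean distance.
d_euclid = np.sqrt(mahalanobis_sq(x, mu, np.eye(2)))
```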

The set of eigenvectors, $\{\mathbf{u}_i \mid i = 1, 2, \ldots, D\}$, is an orthonormal basis of the space of $\mathbf{x}$ because $\boldsymbol{\Sigma}$ is symmetric and positive definite:

$$
\mathbf{u}_i^{\mathsf{T}} \mathbf{u}_j = \delta_{ij},
$$

where $\delta_{ij}$ is the Kronecker delta.

The covariance matrix can be constructed using these eigenvectors:

$$
\boldsymbol{\Sigma} = \sum_{i=1}^{D} \lambda_i \mathbf{u}_i \mathbf{u}_i^{\mathsf{T}},
$$

and its inverse matrix is obtained thus:

$$
\boldsymbol{\Sigma}^{-1} = \sum_{i=1}^{D} \frac{1}{\lambda_i} \mathbf{u}_i \mathbf{u}_i^{\mathsf{T}}.
$$

Substituting Eq. (2.102) into Eq. (2.98) results in

$$
\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i},
$$

where

$$
y_i = \mathbf{u}_i^{\mathsf{T}} (\mathbf{x} - \boldsymbol{\mu}).
$$
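These eigendecomposition identities can be verified numerically. A NumPy sketch (the covariance matrix and test point are arbitrary illustrative values; note that `np.linalg.eigh` returns eigenvectors as columns, whereas the text stacks them as rows):

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
lam, U = np.linalg.eigh(Sigma)  # eigenvalues ascending; eigenvectors as columns

# Reconstruct the covariance and its inverse from the eigenpairs:
# Sigma = sum_i lambda_i u_i u_i^T,  Sigma^{-1} = sum_i (1/lambda_i) u_i u_i^T.
Sigma_rec = sum(l * np.outer(u, u) for l, u in zip(lam, U.T))
Sigma_inv = sum((1.0 / l) * np.outer(u, u) for l, u in zip(lam, U.T))

# The squared Mahalanobis distance equals sum_i y_i^2 / lambda_i,
# where y_i = u_i^T (x - mu).
x = np.array([1.0, -1.0])
mu = np.zeros(2)
y = U.T @ (x - mu)
delta2 = np.sum(y ** 2 / lam)
```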

Fig. 2.3 **A level set of a Gaussian distribution**

Let $\mathbf{y} = [y_1, y_2, \ldots, y_D]^{\mathsf{T}}$ and $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_D]^{\mathsf{T}}$; then,

$$
\mathbf{y} = \mathbf{U} (\mathbf{x} - \boldsymbol{\mu}).
$$

If all eigenvalues are positive, in other words, if the covariance matrix is positive definite, the set of points that lie at the same Mahalanobis distance from $\boldsymbol{\mu}$ forms an elliptic quadratic surface centered at $\boldsymbol{\mu}$ in the space of $\mathbf{x}$. This is a level-set surface of the Gaussian distribution. As shown in Fig. 2.3, the directions of the axes of the surface are parallel to the eigenvectors, and their lengths are proportional to the square roots of the corresponding eigenvalues.
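The level-set geometry can be checked numerically: points on the ellipse built from the eigenvectors all share the same Mahalanobis distance. A NumPy sketch (the covariance values and the radius are arbitrary illustrative choices):

```python
import numpy as np

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
lam, U = np.linalg.eigh(Sigma)  # eigenvectors as columns

# Points at Mahalanobis distance r from the mean lie on an ellipse whose
# axes point along the eigenvectors with half-lengths r * sqrt(lambda_i).
r = 1.0
theta = np.linspace(0.0, 2.0 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # unit circle, shape (2, 200)
ellipse = (U * (r * np.sqrt(lam))) @ circle          # scale axes, then rotate

# Every point on the ellipse has squared Mahalanobis distance r^2.
d2 = np.einsum("ij,ji->i", ellipse.T, np.linalg.solve(Sigma, ellipse))
```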

Although the Gaussian distribution is widely used for density models, it has some limitations. Generally, a $D$-dimensional Gaussian has $D(D+3)/2$ independent parameters: $D$ for the mean and $D(D+1)/2$ for the symmetric covariance matrix. When $D$ is large, it is not easy to accurately estimate a covariance matrix from a limited number of training samples, or to compute the precision matrix, i.e., the inverse of the covariance matrix. To avoid over-fitting to the training data, a graphical lasso, for example, can be employed [32]; it uses a regularization technique for accurately estimating precision matrices. The estimation accuracy can also be improved by approximating the covariance matrix with a smaller number of parameters. For instance, a diagonal matrix, $\boldsymbol{\Sigma}_{\mathrm{diag}} = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_D^2)$, has only $D$ parameters and can be used to approximate a covariance matrix that in general has nonzero off-diagonal components. Representing a covariance matrix with a diagonal matrix can avoid over-fitting, and its inverse can be computed more easily. This approximation, though, ignores the mutual correlations between different variables. Figure 2.4 shows examples of isocontours of Gaussian distributions with covariance matrices approximated by a diagonal matrix (a) and by an isotropic matrix (b), $\boldsymbol{\Sigma}_{\mathrm{iso}} = \sigma^2 \mathbf{I}$, where $\mathbf{I}$ denotes the identity matrix.
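The diagonal and isotropic approximations described above can be sketched as follows (a minimal NumPy illustration; the true covariance, sample size, and seed are arbitrary assumptions, not from the original text):

```python
import numpy as np

# Draw samples from a correlated 2-D Gaussian.
rng = np.random.default_rng(0)
true_cov = np.array([[2.0, 0.8],
                     [0.8, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], true_cov, size=10_000)

Sigma_full = np.cov(X, rowvar=False)              # D(D+1)/2 covariance parameters
Sigma_diag = np.diag(np.diag(Sigma_full))         # D parameters; drops correlations
Sigma_iso = np.mean(np.diag(Sigma_full)) * np.eye(2)  # a single variance parameter

# The diagonal approximation is trivially invertible: invert each entry.
Sigma_diag_inv = np.diag(1.0 / np.diag(Sigma_full))
```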

Fig. 2.4 **Examples of isocontours of Gaussian distributions. (a) The covariance matrix is approximated by a diagonal matrix. (b) The covariance matrix is approximated by an isotropic matrix [33]**