# Probability and Statistics: Foundations of CA

Probability and statistics are essential to CA, in which targets are represented by statistical models and, in given images, are described by those models. A framework of probability theory is employed both for constructing the statistical models and for describing the targets in given images. In this subsection, some basics of probability theory are described.

## Sum Rule and Product Rule of Probability

First, discrete probability is described. In this case, the random variable *X* takes discrete values $x_1, x_2, \ldots, x_n$. If the frequency of the case that *X* takes $x_i$ is $c_i$, the probability that *X* takes $x_i$ is represented as

$$p(X = x_i) = \frac{c_i}{N}, \tag{2.76}$$

where $N = \sum_{i=1}^{n} c_i$. When there is another random variable *Y* that represents another aspect of the event mentioned above, *Y* takes values of $y_1, y_2, \ldots, y_m$. The frequency of the case that *Y* takes $y_j$ is $d_j$, and the frequency of the case that *X* takes $x_i$ and *Y* takes $y_j$ simultaneously is $r_{ij}$. The joint probability is described as

$$p(X = x_i, Y = y_j) = \frac{r_{ij}}{N}, \tag{2.77}$$

and

$$p(Y = y_j) = \frac{d_j}{N}. \tag{2.78}$$

Because $c_i = \sum_{j=1}^{m} r_{ij}$, the following equation is derived from Eqs. (2.76), (2.77), and (2.78):

$$p(X = x_i) = \sum_{j=1}^{m} p(X = x_i, Y = y_j). \tag{2.79}$$

This equation is called the *sum rule of probability.* Because the left side of Eq. (2.79) results from marginalization in terms of the random variable *Y*, it is called a *marginal probability.* Assuming that *X* is fixed on $x_i$, the ratio of the frequency of the case of $Y = y_j$ to that of all cases of *Y* is described as $p(Y = y_j \mid X = x_i)$. This is called a *conditional probability,* because it is the probability of $Y = y_j$ under the condition of $X = x_i$. This conditional probability is calculated as

$$p(Y = y_j \mid X = x_i) = \frac{r_{ij}}{c_i}. \tag{2.80}$$

By substituting Eqs. (2.76) and (2.80) into Eq. (2.77), the equation is transformed as

$$p(X = x_i, Y = y_j) = \frac{r_{ij}}{N} = \frac{r_{ij}}{c_i} \cdot \frac{c_i}{N} = p(Y = y_j \mid X = x_i)\, p(X = x_i). \tag{2.81}$$

This equation is called the *product rule of probability.* The probability distribution of the random variable *X* is denoted as $p(X)$, and the probability in the case that *X* takes a specific value is denoted as $p(x_i)$. By using these notations, the sum rule and the product rule of probability are written thus: The sum rule of probability is represented as

$$p(X) = \sum_{Y} p(X, Y), \tag{2.82}$$

and the product rule of probability is represented as

$$p(X, Y) = p(Y \mid X)\, p(X). \tag{2.83}$$
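As a quick numerical sketch (not part of the original derivation), the frequency-table definitions and the two rules above can be checked on a small table; the counts $r_{ij}$ below are illustrative values, not from the text:

```python
# Numerical check of the discrete sum rule and product rule.
# The frequency table r[i][j] (counts of X = x_i and Y = y_j
# occurring together) holds illustrative values only.
r = [[3, 1, 2],
     [4, 2, 8]]

N = sum(sum(row) for row in r)           # total number of trials
c = [sum(row) for row in r]              # c_i: frequency of X = x_i
d = [sum(col) for col in zip(*r)]        # d_j: frequency of Y = y_j

def p_x(i): return c[i] / N                        # Eq. (2.76)
def p_xy(i, j): return r[i][j] / N                 # Eq. (2.77)
def p_y(j): return d[j] / N                        # Eq. (2.78)
def p_y_given_x(i, j): return r[i][j] / c[i]       # Eq. (2.80)

# Sum rule (2.79): marginalizing the joint over Y recovers p(X = x_i).
for i in range(len(r)):
    assert abs(p_x(i) - sum(p_xy(i, j) for j in range(3))) < 1e-12

# Product rule (2.81): p(X, Y) = p(Y | X) p(X) for every cell.
for i in range(len(r)):
    for j in range(3):
        assert abs(p_xy(i, j) - p_y_given_x(i, j) * p_x(i)) < 1e-12
```

Both rules hold exactly here because they are identities of the frequency ratios, not approximations.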

Because $p(X, Y)$ is symmetrical with respect to the random variables *X* and *Y*, $p(X, Y) = p(Y, X)$, or $p(Y \mid X)\, p(X) = p(X \mid Y)\, p(Y)$. By transforming this equation, the relationship between two conditional probabilities is derived:

$$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}. \tag{2.84}$$

This relationship is referred to as *Bayes’ theorem.* By substituting Eq. (2.83) into Eq. (2.82) after swapping *X* with *Y* on the right side of Eq. (2.83), the denominator of the right side of Eq. (2.84) is written as

$$p(X) = \sum_{Y} p(X \mid Y)\, p(Y). \tag{2.85}$$
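Bayes’ theorem with its denominator expanded by the sum rule can also be checked numerically; the prior and likelihood values below are illustrative assumptions, not from the text:

```python
# Numerical check of Bayes' theorem: the prior p(Y) and the
# likelihood p(X | Y) below are illustrative values only.
prior_y = [0.3, 0.7]                 # p(Y = y_i)
lik = [[0.9, 0.1],                   # lik[i][j] = p(X = x_j | Y = y_i)
       [0.2, 0.8]]

def posterior(i, j):
    """p(Y = y_i | X = x_j) by Bayes' theorem, Eq. (2.84)."""
    # Denominator p(X = x_j), expanded by the sum rule as in Eq. (2.85).
    marginal_x = sum(lik[k][j] * prior_y[k] for k in range(2))
    return lik[i][j] * prior_y[i] / marginal_x

# The posterior over Y must sum to one for each observed value of X.
for j in range(2):
    assert abs(sum(posterior(i, j) for i in range(2)) - 1.0) < 1e-12
```

The normalization check works because the denominator of Bayes’ theorem is exactly the sum of the numerators over all states of *Y*.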

Assuming that the probabilities of the random variable *Y* in Eq. (2.84) are to be found, $p(Y)$ and $p(Y \mid X)$ are referred to as a *prior probability distribution* (or simply, a “prior”) and a *posterior probability distribution,* respectively. These probabilities are so named because the former is known before the actual value of the random variable *X* is known, and the latter becomes known after the actual value of the random variable *X* is known. When the joint probability $p(X, Y)$ is equal to the product of $p(X)$ and $p(Y)$, i.e., $p(X, Y) = p(X)\, p(Y)$ holds, the random variables *X* and *Y* are independent. In this case Eq. (2.83) (the product rule of probability) is transformed thus:

$$p(Y \mid X) = \frac{p(X, Y)}{p(X)} = \frac{p(X)\, p(Y)}{p(X)}, \tag{2.86}$$

and hence $p(Y) = p(Y \mid X)$. This means that the probability of *Y* is unaffected by *X*, and vice versa. When the random variables take continuous values, Bayes’ theorem is described thus:

$$p(\theta \mid x) = \frac{f(x \mid \theta)\, q(\theta)}{\int f(x \mid \theta)\, q(\theta)\, d\theta}, \tag{2.87}$$

where $x$ and $\theta$ are random variables, $p$ is a posterior probability density function, $q$ is a prior probability density function of $\theta$, and $f$ is a likelihood function. The denominator of the right-hand side of Eq. (2.87) is a marginal probability density function.
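A minimal sketch of the continuous case: the posterior density can be approximated on a grid by evaluating the likelihood times the prior and normalizing by the marginal. The Bernoulli likelihood, the uniform prior, and the data ($k$ successes in $n$ trials) are illustrative assumptions, not from the text:

```python
# Grid approximation of the continuous Bayes' theorem, Eq. (2.87),
# with a Bernoulli likelihood and a uniform prior q(theta) on [0, 1].
# The data (k successes in n trials) are illustrative only.
n, k = 10, 7
M = 50_000                                    # number of grid cells
thetas = [(m + 0.5) / M for m in range(M)]    # cell midpoints in [0, 1]

def likelihood(theta):
    """f(x | theta) for k successes in n Bernoulli trials (the binomial
    coefficient is omitted; it cancels between numerator and denominator)."""
    return theta**k * (1.0 - theta)**(n - k)

prior = 1.0   # q(theta): uniform density on [0, 1]

# Denominator of Eq. (2.87): the marginal density, by the midpoint rule.
marginal = sum(likelihood(t) * prior for t in thetas) / M

# Mean of the posterior density p(theta | x) over the grid.
post_mean = sum(t * likelihood(t) * prior / marginal for t in thetas) / M

# For this conjugate pair the posterior is Beta(k + 1, n - k + 1),
# whose exact mean is (k + 1) / (n + 2).
assert abs(post_mean - (k + 1) / (n + 2)) < 1e-6
```

The integral in the denominator is what makes the posterior a proper density; here it is approximated by a Riemann sum over the grid.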