# Boosting

Boosting algorithms are used for improving the performance of a given learning algorithm: Assuming that the given learning algorithm can construct a weak classifier that classifies data slightly more accurately than a classifier that randomly determines the class of input data independently of the data, boosting algorithms construct a strong classifier that classifies much more accurately than the weak classifiers by combining them [34].

Fig. 2.10 **The typical architecture of a convolutional neural network (CNN), which appeared in [49]**

Assuming that a set of *n* training data is given and that the data can be classified into two classes, a simple boosting algorithm constructs a strong classifier from three weak ones as follows: Let a subset of the training data be denoted by *D*₁ and a classifier constructed by the given algorithm from *D*₁ be denoted by *C*₁. Then, applying *C*₁ to all of the training data, a new subset of the training data, *D*₂, is constructed such that half of *D*₂ are misclassified by *C*₁, with the other half classified correctly. Let a classifier constructed from the new dataset *D*₂ be denoted by *C*₂. Next, applying *C*₁ and *C*₂ to all of the training data, the third dataset, *D*₃, is constructed by collecting the data on which the two classifiers, *C*₁ and *C*₂, output different classes. Let a classifier constructed from *D*₃ be denoted by *C*₃. A boosting algorithm combines the three weak classifiers, *C*₁, *C*₂, and *C*₃, to construct a strong classifier.
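The three-step construction can be sketched in code. The following is a minimal illustration, not from the text: the weak learner is a hypothetical one-dimensional decision stump, and taking the first half of the data as *D*₁ is an arbitrary assumption made here for concreteness.

```python
def train_stump(data):
    """Hypothetical weak learner: a 1-D threshold classifier (decision stump)
    minimizing training error. data is a list of (x, y) pairs, y in {-1, +1}."""
    xs = sorted(x for x, _ in data)
    # Candidate thresholds between consecutive points, plus the extremes.
    thresholds = ([xs[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
                  + [xs[-1] + 1.0])
    best = None
    for t in thresholds:
        for sign in (+1, -1):
            err = sum(1 for x, y in data if (sign if x > t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: sign if x > t else -sign

def boost_three(data):
    """Combine three weak classifiers by majority vote, following the
    D1 / D2 / D3 construction described in the text."""
    # C1: trained on a subset D1 (here, arbitrarily, the first half).
    c1 = train_stump(data[: len(data) // 2])
    # D2: half misclassified by C1, half classified correctly (as available).
    wrong = [p for p in data if c1(p[0]) != p[1]]
    right = [p for p in data if c1(p[0]) == p[1]]
    n = min(len(wrong), len(right))
    d2 = wrong[:n] + right[:n] if n > 0 else data
    c2 = train_stump(d2)
    # D3: the data on which C1 and C2 output different classes.
    d3 = [p for p in data if c1(p[0]) != c2(p[0])]
    c3 = train_stump(d3) if d3 else c1
    # Strong classifier: majority vote of the three weak classifiers.
    return lambda x: 1 if c1(x) + c2(x) + c3(x) > 0 else -1
```

Even when *C*₁ is badly biased by its subset, the disagreement-driven choice of *D*₂ and *D*₃ forces *C*₂ and *C*₃ to focus on the cases *C*₁ gets wrong, so the vote corrects it.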

There are many boosting algorithms. The AdaBoost algorithm is one of the most popular ones. In the AdaBoost algorithm, weak classifiers can continue to be added until the performance becomes sufficiently strong. The resulting strong classifier is represented by a weighted sum of weak classifiers such that

*H*(*x*) = sign(∑ₖ *a*ₖ *h*ₖ(*x*)),

where *a*ₖ is a scalar denoting the weight and *h*ₖ(*x*) denotes a weak classification function constructed by the given algorithm. The *k*-th weak classifier is constructed from a set of data obtained by sampling from the training data according to the weight of each data point, and the weights are computed according to the classification results of the (*k*−1)-th classifier: If the (*k*−1)-th classifier classifies the *j*-th training data, *x*ⱼ, correctly, then its weight, *w*ⱼ, is decreased; if incorrectly, it is increased. It is known that the total training error of the resultant strong classifier can be made arbitrarily small by combining a large number of weak classifiers.
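The AdaBoost loop above can be sketched as follows. This is an illustrative implementation of discrete AdaBoost, with a hypothetical one-dimensional decision stump standing in for the given weak learning algorithm; the function names and data are assumptions, not from the text.

```python
import math

def train_weighted_stump(data, w):
    """Hypothetical weak learner: a 1-D threshold classifier minimizing the
    weighted error. data is a list of (x, y) pairs, y in {-1, +1};
    w holds one weight per pair."""
    xs = sorted(x for x, _ in data)
    thresholds = ([xs[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
                  + [xs[-1] + 1.0])
    best = None
    for t in thresholds:
        for sign in (+1, -1):
            err = sum(wj for wj, (x, y) in zip(w, data)
                      if (sign if x > t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: sign if x > t else -sign

def adaboost(data, rounds):
    """Build the strong classifier H(x) = sign(sum_k a_k h_k(x))."""
    n = len(data)
    w = [1.0 / n] * n   # one weight w_j per training point
    ensemble = []       # pairs (a_k, h_k)
    for _ in range(rounds):
        h = train_weighted_stump(data, w)
        err = sum(wj for wj, (x, y) in zip(w, data) if h(x) != y)
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # guard against log(0)
        a = 0.5 * math.log((1.0 - err) / err)     # weight a_k of this classifier
        ensemble.append((a, h))
        # Decrease the weights of correctly classified points and increase
        # those of misclassified ones, then renormalize.
        w = [wj * math.exp(-a * y * h(x)) for wj, (x, y) in zip(w, data)]
        total = sum(w)
        w = [wj / total for wj in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1
```

On data that no single stump can separate, such as points labeled positive only inside an interval, a few rounds of reweighting already drive the training error to zero, which illustrates the closing claim of the paragraph above.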