Fig. 2.10 The typical architecture of a convolutional neural network (CNN)

Boosting algorithms are used to improve the performance of any given learning algorithm: assuming that the given algorithm can construct a weak classifier that classifies data slightly more accurately than a classifier that assigns classes at random, independently of the input data, a boosting algorithm combines such weak classifiers into a strong classifier that classifies much more accurately than any of them.
Assuming that a training set of n examples is given and that each example belongs to one of two classes, a simple boosting algorithm constructs a strong classifier from three weak ones as follows. Let a subset of the training data be denoted by D1, and let the classifier that the given algorithm constructs from D1 be denoted by C1. Then, applying C1 to all of the training data, a new subset D2 is constructed such that half of D2 is misclassified by C1 and the other half is classified correctly. Let the classifier constructed from D2 be denoted by C2. Next, applying C1 and C2 to all of the training data, the third subset, D3, is constructed by collecting the examples to which the two classifiers, C1 and C2, assign different classes. Let the classifier constructed from D3 be denoted by C3. The boosting algorithm combines the three weak classifiers, C1, C2, and C3, to construct a strong classifier.
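The steps above can be sketched as follows. Here `learn` is a placeholder for the given weak-learning algorithm, D1 is taken to be the first half of the training data for simplicity, and the three classifiers are combined by a simple majority vote; all of these concrete choices are illustrative assumptions, not prescribed by the text.

```python
def boost_three(train, learn):
    """Sketch of the simple three-classifier boosting scheme.
    `train` is a list of (x, y) pairs; `learn` is a hypothetical
    weak-learning routine mapping such a list to a classifier x -> y."""
    # D1: a subset of the training data (here simply the first half).
    d1 = train[:len(train) // 2]
    c1 = learn(d1)
    # D2: half misclassified by C1, half classified correctly.
    wrong = [(x, y) for x, y in train if c1(x) != y]
    right = [(x, y) for x, y in train if c1(x) == y]
    m = min(len(wrong), len(right))
    d2 = wrong[:m] + right[:m]
    c2 = learn(d2) if d2 else c1   # fall back if C1 made no mistakes
    # D3: the examples on which C1 and C2 output different classes.
    d3 = [(x, y) for x, y in train if c1(x) != c2(x)]
    c3 = learn(d3) if d3 else c1
    # Combine C1, C2, C3 (here by majority vote) into a strong classifier.
    def strong(x):
        votes = [c1(x), c2(x), c3(x)]
        return max(set(votes), key=votes.count)
    return strong
```

Note that even when C1 alone is weak (e.g. it predicts a single class everywhere), C2 and C3 are trained on exactly the data that expose its mistakes, so the majority vote can classify correctly where C1 fails.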
There are many boosting algorithms; the AdaBoost algorithm is one of the most popular. In AdaBoost, weak classifiers can continue to be added until the combined classifier becomes sufficiently accurate. The resulting strong classifier is represented by the sign of a weighted sum of weak classifiers such that

H(x) = sign( a1 h1(x) + a2 h2(x) + ... + aK hK(x) ),
where ak is a scalar weight and hk(x) denotes the k-th weak classification function constructed by the given algorithm. The k-th weak classifier is constructed from a set of data obtained by sampling the training data according to the weight of each example, and the weights are updated according to the classification results of the (k-1)-th classifier: if the (k-1)-th classifier classifies the j-th training example, xj, correctly, its weight, wj, is decreased; if it classifies xj incorrectly, wj is increased. It is known that the total training error of the resulting strong classifier can be made arbitrarily small by combining a sufficiently large number of weak classifiers.
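This procedure can be sketched as follows for one-dimensional data with labels in {+1, -1}. The threshold-based ("stump") weak learner, the exponential form of the weight update, and all variable names are illustrative assumptions; the text does not prescribe a particular weak learner.

```python
import math

def adaboost(points, labels, rounds):
    """Minimal AdaBoost sketch for 1-D data with labels in {+1, -1},
    using threshold "stumps" as weak classifiers (an illustrative choice)."""
    n = len(points)
    w = [1.0 / n] * n              # one weight per training example
    ensemble = []                  # pairs (ak, hk)
    for _ in range(rounds):
        # Weak learner: pick the stump with the lowest weighted error.
        best = None
        for t in set(points):
            for s in (1, -1):
                def h(x, t=t, s=s):
                    return s if x >= t else -s
                err = sum(wi for wi, x, y in zip(w, points, labels)
                          if h(x) != y)
                if best is None or err < best[0]:
                    best = (err, h)
        err, h = best
        err = max(err, 1e-12)      # avoid division by zero / log(0)
        a = 0.5 * math.log((1.0 - err) / err)   # weight ak of this classifier
        ensemble.append((a, h))
        # Increase wj on misclassified examples, decrease it on correct ones.
        w = [wi * math.exp(-a * y * h(x))
             for wi, x, y in zip(w, points, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    # Strong classifier: sign of the weighted sum of weak classifiers.
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

The multiplicative update exp(-a * y * h(x)) realizes the rule described above: it shrinks the weight of an example the (k-1)-th classifier got right (y * h(x) = +1) and grows the weight of one it got wrong (y * h(x) = -1), so later weak classifiers concentrate on the hard examples.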