10.4 STANDARD NEURAL NETWORK AND CLASSIFICATION CRITERION
Vision problems generally require many complex phases before reaching the final solution of any particular research task. Table 10.1 presents the basic classification strategies that are referred to as core machine learning. The structure of the basic neural network topology is represented in Figure 10.3. Kumar [27,28] presented the detailed architecture of the important deep networks and the issues of large-scale data analytics, which are described in the following subsections.
10.4.1 DEEP LEARNING SPECIFIC COMPONENTS
In this section, the specific components of a deep network structure are described. These components distinguish the conventional neural network from the deep neural network in their ability to exploit the complex feature space of visual objects and to resolve the severe computational issues that such a feature space raises.
The main component of a deep neural network is the CNN, which performs convolution over a large number of matrices as its major operation. An intelligent system built on the CNN can efficiently process complex features. Apart from the CNN, the vital technologies in this context include the RNN, long short-term memory, autoencoders, and the restricted Boltzmann machine. In this chapter, we primarily focus on the architecture of the CNN and describe its performance on video processing tasks. In general, every CNN model contains four basic building blocks: convolution, sampling, a nonlinearity unit, and fully connected layers used for feature classification.
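The four building blocks named above can be sketched in a few lines of pure Python. This is a minimal illustrative sketch on a tiny single-channel "image", not any framework's implementation; all function names are hypothetical.

```python
# Hedged sketch of the four CNN building blocks: convolution, pooling,
# a nonlinearity, and a fully connected layer (pure Python, no frameworks).

def conv2d(image, kernel):
    # Valid (no padding) 2D convolution with stride 1
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

def relu_map(fmap):
    # Elementwise nonlinearity applied to the whole feature map
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    # Non-overlapping max pooling (the "sampling" block)
    return [[max(fmap[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def fully_connected(features, weights, bias):
    # Flatten the feature map and produce a single class score
    flat = [v for row in features for v in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias
```

A real CNN stacks many such layers and learns the kernel and weight values; here they are fixed only to show the data flow from image to score.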
TABLE 10.1 Machine Learning Algorithms and Their Feature Processing
10.4.2 NONLINEARITY IN DEEP NETWORK
This refers to the introduction of a class of thresholding in the layers of a traditional neural network. The layers of nonlinearity are introduced by utilizing an activation function such as the sigmoid. In real-life scenarios, almost all problems are nonlinear, and this nature is incorporated by choosing an activation function that provides the corresponding threshold for the CNN. The most basic thresholding functions are the rectified linear unit (ReLU) [8,9], the logistic function, and the exponential linear unit. In general, all the activation functions operate pixelwise. The deep learning literature evidences that several advanced versions of ReLU are more powerful; examples include the modified ReLU, leaky ReLU, parametric ReLU, and randomized leaky ReLU, represented in (10.2) and (10.3), which outperformed the state of the art.
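The activation functions listed above can be written directly as pixelwise formulas. The following is a minimal pure-Python sketch of the common variants; default slope values are illustrative assumptions, not values prescribed by the text.

```python
# Hedged sketch of common pixelwise activation functions.
import math

def relu(x):
    # Rectified linear unit: max(0, x)
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # Leaky ReLU: a small fixed slope for negative inputs instead of zero
    return x if x > 0 else slope * x

def parametric_relu(x, a):
    # Parametric ReLU: the negative slope `a` is learned during training
    return x if x > 0 else a * x

def elu(x, alpha=1.0):
    # Exponential linear unit: smooth saturation for negative inputs
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```

In a network these are applied elementwise to entire feature maps; the leaky and parametric variants keep a small gradient for negative inputs, which is why the literature reports them outperforming plain ReLU.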
10.4.3 CONVOLUTION FILTERS
Activation from the convolution layers mitigates fine-tuning issues and maps a higher-dimensional data space to a lower-dimensional one. The convolution of an image with a 3×3 kernel is basically characterized by three parameters. The number of convolution filters used determines the depth of the CNN. The other basic components are stride and zero padding. Stride refers to the number of pixels jumped during one convolution step, while zero padding is the mechanism that allows boundary pixels to participate in the convolution process. The basic convolution filters include the Gaussian, Laplacian, Sobel, and box filters. It is customary that selecting a larger number of filters gives better training to the deep network, provided the computational tools are sufficient to handle the processing overhead. Apart from convolution, several further conceptual layers may be introduced when developing a deep network, such as pooling layers, batch normalization layers, dropout, and fully connected layers. Each layer corresponds to a different operation in the deep network, and introducing these layers depends on the particular objectives.
10.5 STANDARD DEEP NETWORKS
In this section, we discuss some standard deep networks, which ensure the success of deep learning methods applicable to many engineering and science disciplines.
10.5.1 LENET (1990)
In 1990, when deep learning had no standing in the research community, Yann LeCun developed the very first convolutional neural network. This edge of deep learning got its breakthrough in the specific domain of optical character recognition. The basic LeNet architecture is given in Figure 10.4, which presents the discrimination of visual objects at the prediction layer. This network provides the base for all modern deep neural networks, which cascade convolution, pooling, and nonlinearity layers. ReLU [15,16] is generally applied before the pooling and fully connected layers of the network.
10.5.2 ALEXNET (2012)
The architecture of AlexNet exploits in detail the convolutional neural network blocks developed in LeNet. Noticeably, the only difference is the number of filters used for reducing dimensionality between the various pooling layers. The details of the network are represented in . The evidence from the ImageNet challenge reports that the network was trained on two NVidia "GTX 580" graphics cards, with over 1.2 million sample images of the large dataset. For pure classification, training on such a data sample takes five to six days. The network uses five convolution layers and trains 60 million parameters and 650,000 neurons.
10.5.3 ZFNET (2013)
Zeiler and Fergus developed a deep network named ZFNet in 2013 that exploits the intermediate functionality of the classification methodology inside the deep network layers. They tweaked the complete architecture proposed in the AlexNet convolutional neural network. ZFNet demonstrated state-of-the-art results on the Caltech-101 and Caltech-256 benchmarks. By training on ImageNet for classification on a GTX 580 for 12 days, they developed features as pixel maps as opposed to convolution layers. In this network, error computation and activation operations are performed by the cross-entropy loss and ReLU, respectively. During classification on the ImageNet benchmark, the drop-out approach is utilized to achieve regularization.
10.5.4 GOOGLENET (2014)
The massive data of real-world scenarios involve huge numbers of parameters, which limited the success of earlier deep network models. The number of parameters used in GoogleNet was only 4 million, whereas AlexNet used 60 million. The reduction of such a large number of parameters was made possible by introducing an inception module. The architecture of the inception module is represented in Figure 10.4.
FIGURE 10.4 Inception module of the GoogleNet architecture.
The inception module of GoogleNet set a benchmark in the detection and visual recognition literature. From the evidence of several experiments on ImageNet14, it is concluded that the inception module and its successive versions, that is, Inception-v3 and Inception-v4, are fast and suitable models for resolving the issues of complex visual analytics. In this model, the Hebbian principle is adopted to achieve better optimization control, resulting in a model with only 22 layers. The mechanism of the inception module is motivated by the selection of several convolution filter sizes and the concatenation of their outputs. The convolution mechanism used in the inception module keeps the CNN free from overfitting: the fully connected layers prone to overfitting are replaced by global average pooling layers.
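The parameter reduction described above (4 million versus 60 million) comes largely from placing cheap 1×1 convolutions before the larger filters in each inception branch. A small sketch of the weight-count arithmetic makes the saving concrete; the channel counts below are illustrative assumptions, not figures from the text.

```python
def conv_params(k, c_in, c_out):
    # Weight count of a k x k convolution mapping c_in to c_out channels
    # (biases ignored for simplicity)
    return k * k * c_in * c_out

# Hypothetical branch: 5x5 convolution applied directly to 256 channels
direct = conv_params(5, 256, 64)                       # 409,600 weights

# Same branch with a 1x1 "bottleneck" to 32 channels first
bottleneck = conv_params(1, 256, 32) + conv_params(5, 32, 64)  # 59,392

print(direct, bottleneck)  # the bottleneck uses ~7x fewer weights
```

Concatenating several such inexpensive branches (1×1, 3×3, 5×5, pooling) along the channel axis is what lets the 22-layer network stay small in parameters.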
10.5.5 VGGNET (2014)
The issue of the large number of layers required for handling the huge number of parameters in large-scale data models was considered a severe problem until 2014. VGGNet is well suited to dealing with large-scale data statistics without concern for the number of layers used in the network.
The experimental setup for training VGGNet used four "Titan Black" GPUs and took three to four weeks to accomplish the training phase. The model used the Caffe toolbox as its backend, and training is optimized by utilizing the stochastic gradient descent (SGD) scheme. The experimental observations [11,13] presented in Table 10.2 show only a 7.5% error on the validation set for the model with 19 layers and a 7.3% top-five error on the test set.
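The SGD scheme used to optimize the training data can be summarized in one update rule. The following is a minimal sketch of SGD with momentum (the function name and default hyperparameters are illustrative, not those of the Caffe setup described above):

```python
def sgd_step(params, grads, lr=0.01, momentum=0.9, velocity=None):
    # One SGD-with-momentum update:
    #   v <- momentum * v - lr * grad
    #   p <- p + v
    if velocity is None:
        velocity = [0.0] * len(params)
    new_v = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_p = [p + v for p, v in zip(params, new_v)]
    return new_p, new_v
```

In practice the gradients come from backpropagation over a mini-batch; frameworks such as Caffe apply exactly this kind of update to all 138 million-odd VGG weights at once.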
10.5.6 RESNET (2015)
After the classical performance of VGGNet on large-scale data, it was assumed that developing a bigger network results in higher performance. Such deeper neural networks do perform better with extensive data, but training such a network is a highly tedious job. The main credit for resolving the processing issue with deeper networks goes to Kaiming He, in the ILSVRC 2015 challenge.
The layers used in the network were modified by learning residual functions. This function optimizes the computations and achieves higher accuracy. From the experiments, it is observed that even batch normalization fails to reduce the validation and training errors when extra layers are introduced into the network. In the ResNet module presented in Figure 10.5, this problem is solved by introducing a bypass that sums the input with the output of the stacked layers. The model took 21 days to process 152 layers on the ImageNet benchmark utilizing eight GPUs.
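The bypass described above implements the residual mapping y = F(x) + x. A minimal sketch (names are illustrative) shows why this eases training: if the learned residual F is zero, the block reduces to the identity, so adding layers cannot make the network worse in principle.

```python
def residual_block(x, f):
    # y = F(x) + x : the shortcut adds the input back after the
    # residual function F, so extra layers can fall back to identity
    return [fx + xi for fx, xi in zip(f(x), x)]

# If F learns nothing (all zeros), the block passes x through unchanged.
identity_case = residual_block([1.0, 2.0], lambda xs: [0.0] * len(xs))
```

In the real network, F is a small stack of convolution, batch normalization, and ReLU layers, and the addition is followed by another ReLU; the sketch keeps only the summation that defines the shortcut.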
TABLE 10.2 The Comparative Classification Performance of the ResNet Model
FIGURE 10.5 Inception module of the ResNet model.
10.6 CONCLUSION
In this chapter, the prime concern is the processing issues raised by heterogeneous content in real-life visual media. To resolve the highlighted issues, deep data analytics and graphical processing algorithms need to be developed for extracting and evaluating information from real-life visual media. Training an optimized deep network with graph-based pixel-level processing is a better choice than handcrafted feature engineering for local and global feature extraction. A visual domain generally consists of real-time multimedia activities, and the semantic understanding of poorly retrieved information from a complex environment is still an open research problem. Another issue with the processing of complex information is the big data evolution arising from live streaming of video sequences. In such a case, without introducing big data analytics, the experimental issues can degrade the performance of conventional machine learning algorithms. Furthermore, the five components of big data can cause every conventional machine learning technique to fail to retrieve completely exact information. Therefore, to exploit deep network architectures, it is customary to develop optimized deep data analytics systems and high-performance algorithms. In brief, this work raises the future scope of developing deep neural networks with optimized algorithms that can perform better at computing unstructured visual data with large-scale analytics.
10.6.1 FUTURE RECOMMENDATIONS
The CNN has performed successfully in solving image classification problems. However, in the case of video-based experiments on big data, developing a big network does not guarantee higher performance. This suggests that processing large-scale visual media requires building up the mechanism of every layer in the network. Such high-level processing cannot be expected without a rich supply of processing devices. Therefore, to balance costly computational resources against time economy, it is necessary to develop an optimized deep network for extracting information from unconstrained visual media. The conclusive remark on future issues is the absence of an adaptively fast and small deep network. Although stacking more layers and effective subsampling can increase the size of the receptive field, the central receptive field of each neuron does not participate equally. The thorough study of this work concludes that many directions, such as SGD-, graph-, and Riemannian-manifold-based deep learning, still need to be practiced for better solutions to open challenging problems in social media communication, neuron sensor networks for analysis of the human brain, stock markets for financial evaluation, and geographical study based on networks of satellites. In the review work presented in , the main problem for machine learning with graphs is highlighted as the information association between nodes. Encoding and decoding schemes with graph embeddings are helpful for informatics. The recently observed state of the art presents scalability and interpretability in temporal graphs as open issues.