Data Mining and Statistical Analysis in Industrial Processes

Data mining and statistical analysis techniques are helpful in finding interesting patterns in plant operation [3]. Although technology now makes it possible to collect millions of data points over a very short time, methods for analysing such data are still under development [2]. Computer control no longer relies only on classical digital control techniques, but also on the vast amounts of data produced by large sensor networks. This calls for data mining techniques that can extract useful information from such huge datasets.

Pattern Recognition

In industrial processes, data mining has been applied in a number of ways to formally define process patterns. Wang et al. [10] exploited data mining to predict, optimize and diagnose parameters in an oil field. Namikka and Gibbon [2] applied data mining to exploratory analysis and process modelling, using multivariate statistical methods for dimensionality reduction. Mere et al. [11] reviewed the typical environments in which data mining is applied and presented a series of tools for dealing with outliers in galvanized steel production. Sadoyan et al. [12] sought patterns in manufacturing process control, using Rough Sets (RS) to identify if-then rules.

Regarding process control, artificial intelligence techniques such as Fuzzy Systems and Neural Networks are used in the works of Yang et al. [9] and Mere et al. [11]; however, these techniques are complex to implement and require long computation times [12]. In addition, neural networks are black boxes, i.e. one does not know how their results are produced [8]. By contrast, techniques such as Association Rules [13] and Decision Trees [14] can find relevant patterns in the process and express them in a clearly understandable text form.

Control Variables Histogram

If the process is viewed as an N-dimensional space, each rule defines a subspace in which every variable has a new statistical distribution, represented as a histogram. By classifying the variables into control (U) and observation (Y) variables, and assuming that a change in the controls is reflected in the observations after some delay, we can use the joint probability distribution (JPD) to determine statistically the relation between control and observation:

$$P(Y) = \sum_{U} P(Y \mid U)\,P(U)$$

where P(Y) is the probability that the observation vector Y takes a given set of values, P(Y|U) is the conditional probability that Y takes those values given that the control vector U takes a given set of control values, and P(U) is the probability of that set of control values. The conditional probability expresses how the value of Y depends on, or is influenced by, the value of U. In a complex system, these relations may be modelled as a Bayesian Network, a causal diagram designed, in principle, by an expert in the field of application. Deventer et al. [15] show an example of a Bayesian Network applied to the control of a dynamic process (Fig. 3).

Fig. 3 Schema of a multivariate Bayesian Network with 2 controls and 4 observations

Fig. 4 Scatter plot of two variables in which the points in the desired range are separated by slicing the dataset

Fig. 5 A decision tree that separates the desired from the undesired range in the previous dataset, considering the variables Pressure Drop (PD) and Temperature at 75 m (T75)

The probability distributions (PDs) are determined from the process history in the form of histograms. Each rule applied (a filter, or slice, of the N-dimensional data hyperspace) changes the PDs, and applying rules successively drills down until only a single record remains, containing a single value for each variable.
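As a minimal sketch of this histogram-based reasoning (not the authors' implementation), the Python code below estimates P(U) and P(Y|U) from a discretized process history. It checks that summing P(Y|U)P(U) over U (the law of total probability) recovers P(Y), and it shows how successive rules slice the data and change a variable's histogram until one record remains. The dataset, the variable names `u` and `y`, the bin edges, the one-step delay `lag`, and helper names such as `apply_rule` and `control_observation_pd` are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, float]


def to_bin(x: float, edges: Sequence[float]) -> int:
    """Index of the histogram bin that contains x (ascending edges, last bin closed)."""
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            return i
    if x == edges[-1]:
        return len(edges) - 2
    raise ValueError(f"value {x} lies outside the histogram range")


def histogram(records: List[Record], var: str, edges: Sequence[float]) -> List[float]:
    """Empirical probability distribution (normalized histogram) of one variable."""
    if not records:
        raise ValueError("no records to build a histogram from")
    counts = Counter(to_bin(r[var], edges) for r in records)
    return [counts[i] / len(records) for i in range(len(edges) - 1)]


def apply_rule(records: List[Record], rule: Callable[[Record], bool]) -> List[Record]:
    """Slice the data hyperspace: keep only the records that satisfy the rule."""
    return [r for r in records if rule(r)]


def control_observation_pd(
    records: List[Record], u_var: str, y_var: str,
    u_edges: Sequence[float], y_edges: Sequence[float], lag: int = 1,
) -> Tuple[Dict[int, float], Dict[int, Dict[int, float]]]:
    """Estimate P(U) and P(Y|U), pairing the control at time t with the observation at t + lag."""
    pairs = [
        (to_bin(records[t][u_var], u_edges), to_bin(records[t + lag][y_var], y_edges))
        for t in range(len(records) - lag)
    ]
    u_counts = Counter(u for u, _ in pairs)
    joint = Counter(pairs)
    p_u = {u: c / len(pairs) for u, c in u_counts.items()}
    p_y_given_u = {
        u: {y: joint[(u, y)] / u_counts[u] for y in range(len(y_edges) - 1)}
        for u in u_counts
    }
    return p_u, p_y_given_u


def total_probability(p_u: Dict[int, float], p_y_given_u: Dict[int, Dict[int, float]],
                      n_y_bins: int) -> List[float]:
    """Marginal P(Y) = sum over U of P(Y|U) P(U)."""
    return [sum(p_u[u] * p_y_given_u[u][y] for u in p_u) for y in range(n_y_bins)]


# Hypothetical, time-ordered process history: one control "u", one observation "y".
history = [
    {"u": 0.2, "y": 10.0}, {"u": 0.8, "y": 11.0}, {"u": 0.9, "y": 19.0},
    {"u": 0.1, "y": 18.0}, {"u": 0.3, "y": 12.0}, {"u": 0.7, "y": 11.0},
    {"u": 0.6, "y": 13.0},
]
U_EDGES = [0.0, 0.5, 1.0]     # bins: low / high control setting
Y_EDGES = [10.0, 15.0, 20.0]  # bins: low / high observation

p_u, p_y_given_u = control_observation_pd(history, "u", "y", U_EDGES, Y_EDGES, lag=1)
print("P(U):", p_u)
print("P(Y|U):", p_y_given_u)
print("P(Y) by total probability:", total_probability(p_u, p_y_given_u, 2))

# Drill down: each rule slices the data and changes the histogram of y.
print("PD of y, full history:", histogram(history, "y", Y_EDGES))
high_u = apply_rule(history, lambda r: r["u"] >= 0.5)
print("PD of y where u >= 0.5:", histogram(high_u, "y", Y_EDGES))
single = apply_rule(high_u, lambda r: r["y"] >= 15.0)
print("records left:", single)
```

Discretizing into a few bins keeps the conditional tables small enough to read directly. A real process would need bin edges and delays chosen per variable, for example by an expert or from the Bayesian Network structure.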