APPLIED DATA ANALYTICS
Decision-making algorithms play a significant role in implementing digitalization strategies in many fields, particularly business analytics. Big Data analytics is an ecosystem of technologies for collecting, storing, and exploiting large volumes of data that are generated at different speeds and carry varied information, both structured and unstructured (blogs, social networks, videos, images, etc.).
This setup provides a very flexible platform that operates as a unified repository of information, reduces costs, and serves as the basis for business solutions to a wide range of requirements (analysis, event correlation, exploitation, transformation, business intelligence [BI], and client 360). It makes it possible to exploit terabytes of information at thousands of operations per second and in real time.
Data scientists are the advanced analytics experts who give value to data. Through their work, companies can face new challenges, predict future situations, bet on the best alternatives, provide better services to customers, and maximize profits.
More and more Big Data projects are being developed around the three “Vs” (Volume, Variety, and Velocity), but do we know how the new General Data Protection Regulation (GDPR) impacts these systems?
Data analytics techniques extract relevant information from data and try to obtain any feature that may serve different purposes, for example, modelling the data by extracting its statistical characteristics. Once the behavior of a parameter is modelled, we can predict its value under a given situation.
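The idea of modelling a parameter and then predicting its value can be illustrated with a minimal sketch. It assumes the simplest possible model, a straight line fitted by ordinary least squares; the text does not prescribe any particular model, and the function names (fit_line, predict) and the sample observations are hypothetical, chosen only for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Assumption: a linear model is adequate for the parameter being
    studied. Real data may call for a different model.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at a new value of x."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical observations: a load level and a measured response.
    load = [1.0, 2.0, 3.0, 4.0, 5.0]
    response = [2.1, 3.9, 6.2, 7.8, 10.1]

    model = fit_line(load, response)
    # Predicted response for a load level that was not observed.
    print(predict(model, 6.0))
```

Here the statistical characteristics extracted from the data are the means and the (co)variation of the two variables, and the prediction for an unseen situation is obtained by evaluating the fitted line at the new input.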
The exploitation of large amounts of data, including personal data, using a set of technologies, systems, and algorithms is booming because it makes it possible to extract valuable information from an enormous volume of data for retrospective and prospective studies, commercial projections, and the establishment of profiles and usage patterns (for statistical, scientific, and commercial purposes). The results of these analyses can have a direct impact on people, which is why they are increasingly a matter of concern, and regulation is necessary to safeguard the privacy of people in Big Data models.
Big Data is an emerging trend that is attracting the attention of scientists, industry, organizations, governments, and individuals. Their motivations differ, but the basis is the same: the capability to acquire and store heterogeneous data and to process it in different ways to extract meaningful information.
Big Data refers to datasets that are not only big but also high in variety and velocity, which makes them difficult to handle using traditional tools and techniques (Elgendy and Elragal, 2014).
Since the beginning of the digital society, or information society, the availability of data, either captured in real time or stored in large databases, has been increasing. Researchers in both the social and the natural sciences, and decision makers in general, are eager to manage large amounts of data in order to refine their results or make them more accurate (e.g., medical diagnosis, human behavior, artificial intelligence [AI], system modelling, etc.).
The term Big Data refers to more than just large amounts of data: their heterogeneous nature and the availability of data coming from different sources, captured at different rates or speeds, make them difficult to manage (Angierski and Kuehn, 2013). Consequently, innovative techniques must be provided to achieve better performance. Current research trends point to the use of novel methods and techniques to manage such huge amounts of data, which grow continuously and quickly because of endless data generation. The data must be extracted from raw sources, preprocessed, managed and stored in databases, and finally processed to make decisions in the framework of their final usage (Srinivasa and Mehta, 2014).
Those novel techniques should provide decision makers with valuable tools to extract the relevant information that seems to be hidden from traditional approaches, especially where the data is highly volatile and the algorithms, however sophisticated, must be fast and agile. Big Data analytics can be the appropriate toolkit for adding this value to Big Data (Pyne et al., 2016).
As the Big Data problem is understood in greater depth, the definition of Big Data becomes more precise, and the need for more appropriate data analysis tools becomes greater. However, the nuances of the definition are conditioned by the final application, the characteristics and properties of the data, and their nature and origin, so no standard definition applies in all cases; it varies with the problem to be solved. One example is the definition proposed by the authors of a work on quality in Big Data (Emmanuel and Stanier, 2016).
The three “Vs” of Big Data reflect the challenge that big companies face when it comes to giving data a value in order to make better decisions, improve operations, and reduce risks. Therefore, it is necessary to be able to navigate easily to obtain information both from the company’s own systems and from the data that arrives from outside.
Big Data projects generally go through the following phases:
There are several final applications where Big Data-based techniques are being applied with different purposes. Among them, we can highlight the following:
In our case, we are going to apply data analysis techniques to determine the cyber risk that a given company faces from possible external attacks on its computer systems and databases (Bartolini et al., 2017a). Until now, this evaluation was carried out through a questionnaire and consumed human resources and time. The objective is to systematize the evaluation by reducing the influence of the subjective judgment of the person in charge on some critical factors, which can lead to inaccuracies in the evaluation result.