VARIOUS APPROACHES AND USE CASES TO BIG DATA OPERATION MANAGEMENT
Splunk Approach to Big Data Operations Management
Architecture of Splunk is based on the three basic principles (Harzog, 2015) (see Figure 7):
• Collection of voluminous data from various distributed sources. Data arrives from different sources in different formats, so it is important to transform each incoming format into the target one.
Figure 7. Splunk approach (Harzog, 2015)
- • Use of the MapReduce (MR) programming technique to process large data sets with a parallel, distributed algorithm. It consists of two phases:
- o Map phase: the workload is divided into small parts, and each block of data is assigned to a Mapper that processes it.
- o Reduce phase: it combines the output of the map phase to give the final output. MapReduce is a programming paradigm that performs parallel processing on the nodes of a cluster, taking input and producing output as key-value pairs. It achieves this by dividing each problem into two broad phases, the map phase and the reduce phase. Each Mapper processes its records sequentially and independently, in parallel with the others, and generates intermediate key-value pairs.
- ▪ Map(k1, v1) → list(k2, v2). The Reduce phase takes the output of the map phase; it processes and merges all the intermediate values to give the final output, again in the form of key-value pairs.
- ▪ Reduce(k2, list(v2)) → list(k3, v3). The output is sorted after each phase, giving the user an aggregated, ordered output from all nodes (see Figure 8).
- • Firing appropriate queries, according to our needs, to gain insights. The knowledge gained can be used to make effective decisions and to identify actionable patterns and trends.
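The map and reduce phases described above can be sketched as a small in-process word-count job. This is an illustrative single-machine sketch only; a real MapReduce framework distributes these same steps across cluster nodes and performs the shuffle over the network.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(record):
    # Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in the line
    _, line = record
    return [(word, 1) for word in line.split()]

def reduce_phase(key, values):
    # Reduce(k2, list(v2)) -> (k3, v3): sum the counts for one word
    return (key, sum(values))

def run_job(records):
    # Map each record independently, then sort by key (the shuffle step)
    intermediate = [pair for record in records for pair in map_phase(record)]
    intermediate.sort(key=itemgetter(0))
    # Group the intermediate values by key and reduce each group
    return [reduce_phase(key, [v for _, v in group])
            for key, group in groupby(intermediate, key=itemgetter(0))]

records = [(0, "big data big insights"), (1, "big data")]
print(run_job(records))  # [('big', 3), ('data', 2), ('insights', 1)]
```

Because the sort happens after each phase, the aggregated output arrives in key order, matching the orderly aggregation described above.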
Splunk can collect large amounts of real-time data in a scale-out manner for searching, monitoring, and analysis. It helps to understand the meaning of the data, facilitating further investigation. Splunk also visualizes the data through charts and reports for better understanding, making the insights clearer and more refined. The results are then interpreted and evaluated to discern new patterns and trends.
Figure 8. Map reduce programming framework
Key features of Splunk are:
- • Indexing: the administrator collects data from many different sources at a centralized location and indexes it so that it can be searched centrally.
- • Logs: with the help of these indexes, Splunk can examine logs from different data centers and indicate when a problem is likely to occur.
- • Drill down: Splunk then uses a drill-down approach to determine the root cause of a problem and can generate alerts for the future.
This approach helps to predict customer purchasing behavior: what customers want to buy, where they want to go, what they want to eat, and so on, so that valuable insights can be converted into actions. The knowledge thus gained helps in understanding the needs of each customer individually, making it easier to do business with them. This is a revolutionary change toward building a customer-centric business. To build a customer-centric business, an organization must observe what customers are doing, keep a record of what they purchase, and discover the insights that maximize the profit from each customer relationship.
E-commerce generally requires:
- 1. Active participation from users
- 2. A simple way of representing users’ interests to the software system
- 3. An algorithm to match people with similar interests.
Users with similar rating patterns are identified and clubbed together into classes, so that when one of them needs a recommendation, it can be provided using the data of the other users in the class. The products matching the preferences of the user are then recommended, based on the data in the ratings matrix. This approach is used by algorithms that apply user-based collaborative filtering to build the recommendation system. A specific application of this is the user-based nearest-neighbor algorithm.
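A minimal user-based nearest-neighbor recommender over a tiny ratings matrix might look like the sketch below. The user names, items, and ratings are invented for illustration; production systems use far larger matrices and more robust similarity measures.

```python
import math

# Hypothetical ratings matrix: user -> {item: rating}
ratings = {
    "alice": {"book": 5, "film": 3, "game": 4},
    "bob":   {"book": 4, "film": 3, "game": 5},
    "carol": {"book": 1, "film": 5},
}

def cosine(u, v):
    # Cosine similarity over the items both users have rated
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (math.sqrt(sum(u[i] ** 2 for i in common)) *
                  math.sqrt(sum(v[i] ** 2 for i in common)))

def recommend(user, k=1):
    # Rank the other users by similarity, keep the k nearest,
    # and suggest items they rated that `user` has not seen.
    neighbours = sorted((cosine(ratings[user], ratings[other]), other)
                        for other in ratings if other != user)[-k:]
    seen = set(ratings[user])
    suggestions = {}
    for _, other in neighbours:
        for item, score in ratings[other].items():
            if item not in seen:
                suggestions[item] = max(suggestions.get(item, 0), score)
    return sorted(suggestions, key=suggestions.get, reverse=True)

print(recommend("carol"))  # ['game'] -- bob is carol's nearest neighbour
```

The key step is that carol never rated "game"; the recommendation is borrowed from the most similar user in her class, exactly as described above.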
Alternatively, there is item-based collaborative filtering, which identifies like-minded people from the list of customers who buy a certain product. It uses an item-centric approach to collaborative filtering with the following basic steps:
- 1. An item-item matrix is built to determine the relationships between products on the basis of user buying patterns.
- 2. When the user searches for a product, the previously built matrix is used to identify the preferences, and relevant recommendations are provided.
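The two steps above can be sketched as follows. The purchase baskets and product names are invented; a real system would build the co-occurrence matrix offline from the full purchase history.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories (one set of products per transaction)
baskets = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "keyboard"},
    {"keyboard", "monitor"},
]

# Step 1: build an item-item co-occurrence matrix from buying patterns
co_occurrence = defaultdict(int)
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_occurrence[(a, b)] += 1
        co_occurrence[(b, a)] += 1

# Step 2: when a user views a product, look up the items most often
# bought alongside it in the prebuilt matrix
def recommend(product, n=2):
    related = {b: count for (a, b), count in co_occurrence.items()
               if a == product}
    return sorted(related, key=related.get, reverse=True)[:n]

print(recommend("laptop"))  # ['mouse', 'keyboard']
```

Because the matrix is item-centric, the lookup at recommendation time is cheap: the expensive pass over user buying patterns happens once, in step 1.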
Relying on scoring or rating systems is not ideal in tasks where interests vary widely and many users' interests are idiosyncratic, because a rating averaged across all users ignores the demands specific to a particular user. This is particularly common in the music and movie industries. Developers may, however, use alternative methods to combat this information explosion, such as data clustering. There is therefore another form of collaborative filtering that does not rely on the imposition of artificial behavior by a rating or binary system, but instead filters on implicit observations of ordinary user behavior. Such systems log the user's previous purchases to identify a pattern, which can be used to group that user with a class of users with similar behavior. This grouping can then be used to provide recommendations, predict future purchases, and even predict a user's behavior in a hypothetical situation, which can be essential for decision support systems. These predictions are then filtered by analyzing each one against business logic, which determines how a prediction might affect the business system. For example, it is not useful to recommend a particular piece of art to a user who has already demonstrated that they own that artwork or a similar one.
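An implicit-feedback pipeline of this kind can be sketched in a few lines: no explicit ratings exist, only a log of past purchases; candidates come from users with overlapping histories and are then passed through a business-logic filter (here, never recommend an item the user already owns). User and product names are illustrative.

```python
# Hypothetical purchase log: user -> set of items already bought
purchase_log = {
    "u1": {"printA", "printB"},
    "u2": {"printA", "printB", "printC"},
    "u3": {"sculpture"},
}

def recommend(user):
    owned = purchase_log[user]
    # Group the user with others whose logged behaviour overlaps theirs
    peers = [u for u in purchase_log
             if u != user and purchase_log[u] & owned]
    candidates = (set().union(*(purchase_log[u] for u in peers))
                  if peers else set())
    # Business-logic filter: drop anything the user already owns
    return sorted(candidates - owned)

print(recommend("u1"))  # ['printC']
```

Note that u3's sculpture never appears: with no behavioral overlap, u3 is not grouped with u1, and the ownership filter removes the rest of the peers' catalogues.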