A Convergence of Mining and Machine Learning: The New Angle for Educational Data Mining
PRATIYUSH GULERIA1* and MANU SOOD2
1 National Institute of Electronics and Information Technology, Shimla, Himachal Pradesh 171001, India
2 Department of Computer Science, Himachal Pradesh University, Shimla, Himachal Pradesh 171005, India
In the digital era, large volumes of data are accumulating, and the biggest challenge humans face is deriving meaningful information from these data. In such a scenario, data mining techniques become important for unearthing hitherto unknown relationships in data. There is an urgent need for scientific research in education, as education aims at enhancing students' cognitive learning and social development. With the growing number of educational institutions, a large amount of data is accumulated, which is unstructured and useful neither for students nor for teachers. Therefore, the major stress is on improving the quality of learning in schools and institutions, and significant progress is needed on access to schooling and on the quality of students' learning. Computational intelligence is achieved through data mining techniques, which break data down so that predictions can be made from it. This chapter discusses the use of educational data mining, learning analytics, and predictive intelligence techniques in the educational field, along with the models of machine learning, that is, unsupervised and supervised learning. Built-in libraries such as TensorFlow, Keras, and other Python libraries are also discussed; these help in building and training models and applying them to educational data classification.
Educational data mining is one of the most promising fields for predicting educational trends as the latest technologies are incorporated into the system. It is the field that applies data mining techniques in educational environments. The inclusion of e-learning modes, smart learning analytics, and online learning resources in the traditional teacher-taught model results in the accumulation of large volumes of data. The data collected may be unstructured, semistructured, or structured, which poses a challenge to learners as well as to educational institutions in assuring the quality of education and improving the decision-making capabilities of management. An equally important factor, of grave concern to the student community at large, is choosing appropriate academic courses and industrial training programs that help further their career and job prospects.
To handle big data, there is an urgent need for educational data mining, which helps convert unstructured data into structured data and disseminate meaningful information and knowledge.
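As a minimal sketch of what converting unstructured data into structured data can look like in practice, the Python snippet below extracts fields from free-text activity log lines with a regular expression and skips lines that do not match. The log format, the field names, and the `parse_log_line` helper are hypothetical assumptions for illustration, not a format used by any particular learning platform.

```python
import re
from typing import Optional

# Hypothetical free-text log format, assumed for illustration only:
#   "<student_id> attempted <activity> score=<points>/<max> time=<seconds>s"
LOG_PATTERN = re.compile(
    r"(?P<student>S\d+)\s+attempted\s+(?P<activity>\w+)\s+"
    r"score=(?P<score>\d+)/(?P<max>\d+)\s+time=(?P<seconds>\d+)s"
)

def parse_log_line(line: str) -> Optional[dict]:
    """Turn one free-text log line into a structured record, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "student": match["student"],
        "activity": match["activity"],
        "score_pct": 100.0 * int(match["score"]) / int(match["max"]),
        "seconds": int(match["seconds"]),
    }

raw_logs = [
    "S01 attempted quiz1 score=8/10 time=45s",
    "note: server restarted",  # noise line that carries no student data
    "S02 attempted quiz1 score=5/10 time=120s",
]
# Keep only the lines that could be parsed into records.
records = [r for r in (parse_log_line(line) for line in raw_logs) if r is not None]
```

Once the records are dictionaries with consistent keys, they can be loaded into a table or fed to the mining techniques discussed in this chapter.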
Machine learning is a field of computer science in which statistical techniques give computer systems the ability to learn and to continuously improve their performance on a particular task without being explicitly programmed. Machine learning involves cognitive and computational approaches, and the field illustrates the benefits of collaboration between scientists from psychology and computer science. With the help of a machine learning approach, education and personnel training can be enhanced, and the following outcomes can be achieved: (a) preventing student dropout in distance learning, (b) predicting which students are likely to drop a course, (c) predicting students' learning behavior, (d) supporting innovative pedagogical practices, (e) clustering students according to their learning speed, (f) predicting industry-oriented courses, and (g) analyzing student feedback in real time. Many tools are frequently used for data mining and learning analytics in the educational field. These tools help extract meaningful information from student datasets and find new observations related to students' learning activities in the educational system. They help in (a) improving students' retention rate, (b) maximizing the educational improvement ratio, and (c) enhancing students' learning results.
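Clustering students according to their learning speed is an unsupervised task. A minimal sketch, assuming learning speed is measured as minutes taken to complete a module (a hypothetical measure and dataset), is one-dimensional k-means with Lloyd's alternating assignment and update steps, written in plain Python so no library API is assumed.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster one-dimensional values (e.g., minutes per module) with Lloyd's k-means."""
    ordered = sorted(values)
    # Initialize centroids with k evenly spaced values from the sorted data.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [float(ordered[round(i * step)]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: minutes each of six students needed to finish one module.
minutes_per_module = [12, 15, 14, 40, 42, 38]
labels, centroids = kmeans_1d(minutes_per_module, k=2)
```

For this toy data the first three students fall into a faster group and the last three into a slower group; with real data, the number of clusters and the speed measure would have to be chosen and validated.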
Data mining takes advantage of machine learning, statistical, and visualization techniques to find and extract knowledge. The collected data can be analyzed using techniques such as decision trees, neural networks, naive Bayes, support vector machines (SVMs), and k-means, which help predict outcomes such as students' learning styles, interest in a course, learning abilities, knowledge, and interests, as well as student retention and course adaptability.
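To make one of the listed techniques concrete, the sketch below implements a categorical naive Bayes classifier with add-one (Laplace) smoothing for predicting student retention. The three attributes (attendance, assignments submitted, forum activity), their values, and the six training records are hypothetical, and the helper names are illustrative rather than taken from any library.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> how often value appears for that label
    value_counts = defaultdict(lambda: defaultdict(Counter))
    distinct_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            distinct_values[i].add(value)
    return class_counts, value_counts, distinct_values

def predict_naive_bayes(model, row):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            numerator = value_counts[label][i][value] + 1
            denominator = count + len(distinct_values[i])
            score += math.log(numerator / denominator)  # log likelihood of each attribute
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical records: (attendance, assignments submitted, forum activity)
rows = [
    ("high", "most", "active"), ("high", "most", "quiet"), ("medium", "most", "active"),
    ("low", "few", "quiet"), ("low", "few", "active"), ("medium", "few", "quiet"),
]
labels = ["retained", "retained", "retained", "dropped", "dropped", "dropped"]
model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("low", "few", "quiet"))  # "dropped" for this toy data
```

Working in log space avoids numerical underflow when many attributes are multiplied, and smoothing prevents an unseen attribute value from forcing a probability of zero.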
Lykourentzou et al. have proposed a methodology that uses machine learning techniques to predict the dropout rate of students in e-learning courses. The machine learning techniques used are feedforward neural networks and SVMs. The authors have also discussed how the results of these techniques are tested.
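The networks used by Lykourentzou et al. are not reproduced here. As a hedged illustration of the basic building block of a feedforward neural network, the sketch trains a single sigmoid neuron (equivalent to logistic regression) with stochastic gradient descent to estimate a dropout probability. The two input features (fraction of assignments submitted and a normalized weekly login rate), the learning rate, the epoch count, and the data are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, targets, lr=0.5, epochs=2000):
    """Fit one sigmoid unit by stochastic gradient descent on the log-loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log-loss with respect to the pre-activation
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def dropout_probability(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical features scaled to [0, 1]: (assignments submitted, weekly logins).
samples = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
targets = [0, 0, 0, 1, 1, 1]  # 1 = the student dropped the course
model = train_neuron(samples, targets)
```

A full feedforward network stacks layers of such units with nonlinear activations and trains them by backpropagation; the single unit is shown only because it keeps the example short and dependency-free.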
A prototype web-based support tool has been proposed that can automatically recognize students with a high dropout probability. Zhu has discussed a machine learning approach for enhancing education and personnel training. According to Liebowitz, adaptive/personalized learning, educational data mining, data visualization, visual analytics, knowledge management, and blended/e-learning play growing roles in better informing higher education officials and teachers. Alier and Lobo have discussed data mining techniques and machine learning algorithms for recommending courses in an e-learning environment based on past data.
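Recommending courses from past data can take many forms. One simple, hypothetical scheme, which is not the method of Alier and Lobo, is co-enrollment counting: courses that past students took alongside the courses a learner has already completed are ranked higher. The course names, the history, and the `recommend_courses` helper below are illustrative assumptions.

```python
from collections import Counter

def recommend_courses(history, completed, top_n=2):
    """Rank courses the student has not taken by weighted co-occurrence in past records."""
    scores = Counter()
    for past_courses in history:
        overlap = len(completed & past_courses)
        if overlap == 0:
            continue  # this past student shares nothing with the current one
        for course in past_courses - completed:
            scores[course] += overlap  # more shared courses means a stronger vote
    # Sort by score (descending), then by name so ties are broken deterministically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [course for course, _ in ranked[:top_n]]

# Hypothetical enrollment history: each set is one past student's completed courses.
history = [
    {"Python", "Statistics", "Machine Learning"},
    {"Python", "Databases", "Web Development"},
    {"Statistics", "Machine Learning", "Deep Learning"},
    {"Python", "Machine Learning"},
]
suggestions = recommend_courses(history, completed={"Python", "Statistics"})
```

Weighting each vote by the size of the overlap lets past students with a more similar course history influence the ranking more strongly.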
Beck and Woolf have developed models using machine learning to predict student behavior and support decision making. According to Dede, distance education has developed considerably with emerging technologies and distributed learning; the author has also focused on pedagogical strategies and designs in the educational system. Demšar et al. have discussed the Orange framework for machine learning and data mining, which supports (a) data preprocessing, (b) modeling, (c) evaluation, and (d) data mining classification and clustering algorithms. Mitchell has discussed machine learning and data mining. According to him, data mining uses historical data to improve future decisions and to discover irregularities. The author has discussed scientific issues and the basic technologies helpful in learning analytics, which are as follows: (a) learning from structured and unstructured data, (b) experimentation, (c) exploration, (d) optimizing decisions, and (e) inventing new features to improve accuracy. These learning approaches are helpful in applications such as healthcare, marketing, manufacturing, finance, and intelligent data analysis. Baker and Inventado have discussed the relationship between educational data mining and learning analytics, both of which are emerging areas. Burgos et al. have used data mining techniques to model students' performance, proposing a predictive model that uses knowledge discovery techniques to prevent dropout in e-learning courses. In educational data mining, new approaches need to be derived using statistical techniques, machine learning, psychometrics, and scientific computing to transform the existing system. Naren has discussed data mining applications for predicting students' behavioral patterns. Another study has used an educational data mining approach to analyze and predict students' learning behavior and experiences.
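The preprocessing, modeling, and evaluation stages that frameworks such as Orange support can be sketched without any framework. The plain-Python example below is not Orange's API: it min-max scales two hypothetical features (quiz average and attendance percentage), uses a 1-nearest-neighbor classifier as a stand-in model, and evaluates it with leave-one-out accuracy. The data and helper names are assumptions for illustration.

```python
def min_max_scale(rows):
    """Preprocessing: rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def nearest_neighbor(train_x, train_y, x):
    """Modeling: predict the label of the closest training point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_x]
    return train_y[distances.index(min(distances))]

def leave_one_out_accuracy(xs, ys):
    """Evaluation: hold out each record in turn and predict it from the rest."""
    correct = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        if nearest_neighbor(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical records: (quiz average, attendance %) with the final outcome.
raw = [(35, 60), (40, 55), (45, 70), (70, 85), (80, 90), (75, 95)]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]
# Simplification: scaling uses all records, so held-out points slightly influence it.
scaled = min_max_scale(raw)
accuracy = leave_one_out_accuracy(scaled, labels)
```

Scaling matters here because the nearest-neighbor distance would otherwise be dominated by whichever feature has the larger numeric range.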
Educational data mining helps in designing smarter, more intelligent learning that can better inform learners and educators.
At present, the failure to disseminate quality-driven education is the biggest challenge faced by our educational institutions. The skill gap in quality parameters involves (a) practical skilling of students, (b) placement of students, (c) adoption of the latest syllabi into curricula, (d) efficient decision-making in career selection, (e) a competitive examination environment, (f) cognitive and computational approaches, and (g) research and development methodologies in training. Data mining techniques exist, but their use in the educational sector is still largely unexplored. These techniques need to be implemented in the educational sector to fill the existing gaps, and learning analytics needs to be adopted along with innovation in teaching pedagogies.
Many educated students, including those who are undergoing or have completed training, are not finding employment, which may be due to (a) fewer placements, (b) lack of knowledge, (c) skill gaps, (d) lack of an industry-relevant curriculum and exposure, (e) engineering concepts applied in industry not being discussed during training, and (f) lack of industry-oriented talks, workshops, and conferences. Counseling is needed to help students understand (a) the kinds of jobs that are available, (b) how to determine which job profiles match their interests and skills, and (c) the skill gaps that may disqualify them and how to address those gaps.
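The second and third counseling questions, matching job profiles to a student's skills and identifying the skill gaps, can be illustrated with simple set operations. The sketch below scores each profile by the fraction of its required skills the student already has and lists the missing ones. The job profiles, skill names, and the `match_job_profiles` helper are hypothetical assumptions, not data from the chapter.

```python
def match_job_profiles(student_skills, job_profiles):
    """Return (profile, match_ratio, missing_skills) tuples, best match first."""
    results = []
    for profile, required in job_profiles.items():
        covered = student_skills & required
        ratio = len(covered) / len(required) if required else 0.0
        missing = sorted(required - student_skills)  # the skill gap for this profile
        results.append((profile, ratio, missing))
    # Highest coverage first; ties broken alphabetically by profile name.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results

# Hypothetical student skills and job requirements.
student = {"Python", "SQL", "Statistics"}
jobs = {
    "Data Analyst": {"SQL", "Statistics", "Excel"},
    "ML Engineer": {"Python", "Statistics", "Deep Learning", "Docker"},
    "Web Developer": {"JavaScript", "HTML", "CSS"},
}
ranking = match_job_profiles(student, jobs)
```

The coverage ratio answers which profiles fit the student best, while the missing-skill list points to the training that would close the gap for each profile.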
At present, machine learning and data mining techniques need to be explored for answers to these questions, as machine learning is among the least utilized approaches in the educational sector for finding meaningful information in huge volumes of unstructured educational data.
With the advancement of technology, its increased use in the educational field has resulted in the collection of voluminous data. These data need innovative approaches to make them meaningful for students, academia, administrators, and management. Intelligent data analytics in the educational field needs to be explored alongside new pedagogical teaching practices and learning analytics. There is an urgent need for machine learning and data mining techniques in education, because data mining is still not being used effectively in the educational field.
With the confluence of machine learning and educational data mining, effective decision-making can be achieved, and new academic models can be predicted. In addition, certain characteristics of students and academia can be predicted, such as (a) students' learning behavior, (b) students' response time in answering questions, (c) areas of interest, (d) students' level of understanding, (e) teaching aids, (f) the instructor's teaching strategy, (g) whether multimedia techniques should be included in learning, (h) students' participation in online activities, and (i) syllabus design as per industry demand.
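Before characteristics such as response time or participation in online activities can be predicted, they usually have to be computed as per-student features from raw activity data. A minimal sketch, assuming a hypothetical event log of `(student_id, event_type, response_seconds)` tuples and hypothetical event-type names, aggregates each student's mean answer time and counts their online activities.

```python
from collections import defaultdict
from statistics import mean

def student_features(events):
    """Aggregate (student_id, event_type, response_seconds or None) events per student."""
    responses = defaultdict(list)   # answer times in seconds
    participation = defaultdict(int)  # count of online activities
    for student, event_type, seconds in events:
        if event_type == "answer" and seconds is not None:
            responses[student].append(seconds)
        elif event_type in {"forum_post", "live_session"}:
            participation[student] += 1
    students = set(responses) | set(participation)
    return {
        s: {
            "mean_response_s": mean(responses[s]) if responses[s] else None,
            "online_activities": participation[s],
        }
        for s in students
    }

# Hypothetical event log.
events = [
    ("S01", "answer", 30), ("S01", "answer", 50), ("S01", "forum_post", None),
    ("S02", "answer", 90), ("S02", "live_session", None), ("S02", "forum_post", None),
]
features = student_features(events)
```

The resulting per-student dictionaries can then serve as input features for the classifiers and clustering methods described earlier in this chapter.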