Data-Mining Research in Education
As an interdisciplinary field, data mining (DM) is widely used in education, especially for examining students’ learning performance. It focuses on analyzing education-related data to develop models that improve learners’ experiences and enhance institutional effectiveness. DM can therefore help educational institutions provide high-quality education for their learners. Applying data mining in education, also known as educational data mining (EDM), makes it possible to better understand how students learn and to identify how to improve educational outcomes. This paper aims to demonstrate the capabilities of data mining approaches in the field of education. It introduces the latest trends in EDM research and discusses several specific algorithms, methods, and applications, along with gaps in the current literature and future insights.
Keywords: Educational Data Mining (EDM); Data Mining (DM); Algorithm; Clustering; Classification; Regression
One of the biggest challenges that educational institutions face today is the exponential growth of educational data and the question of how to use this data to improve the quality of managerial decisions. Educational institutions would like to know, for instance, which students will enroll in particular course programs and which students will need assistance in order to graduate. Through the analysis and presentation of the data they collect, that is, through data mining, these challenges can be addressed effectively. Data mining enables organizations to uncover and understand hidden patterns in vast databases by using their current reporting capabilities. These patterns are then built into data mining models and applied to predict individual behavior and performance with high accuracy. In this way, institutions can allocate resources and staff more effectively. Data mining may also, for example, support efficient resource allocation through accurate estimates of student numbers and allow an institution to take action before a student drops out.
Educational data mining (EDM) is an emerging discipline that includes, but is not limited to, information retrieval, recommender systems, visual data analytics, social network analysis (SNA), cognitive psychology, and psychometrics. Its methods often differ from those in the broader data mining literature. Moreover, EDM draws on several reference disciplines, including data mining, learning theory, data visualization, machine learning, and psychometrics. This emerging field examines the unique ways of applying data mining techniques to solve education-related problems. This paper synthesizes and shares various examples of data mining in education in order to support reflection on teaching and learning. The background of EDM is described first, and then the most frequently used algorithms are briefly presented. Some specific EDM methods are then described. Subsequently, several example applications demonstrate how data mining is used to save resources and help teachers and learners. Finally, we conclude the paper.
BACKGROUND OF DATA MINING
Data mining is an interdisciplinary subfield of computer science [3-5]. It is the analysis step of the ‘knowledge discovery in databases’ (KDD) process. Data mining techniques have their roots in machine learning, artificial intelligence, computer science, and statistics. Data mining is primarily an exploratory process, although it can also be used for confirmatory investigations. This distinguishes it from other search and analysis techniques, which are typically problem-driven and confirmatory. Combining an explicit knowledge base, sophisticated analytical skills, and domain knowledge makes it possible to uncover hidden trends and patterns. These trends and patterns form the predictive models that help organizations uncover useful information and guide decision-making. The Cross Industry Standard Process for Data Mining (CRISP-DM) is a cyclical process for developing and analyzing data mining models. As the demand for data mining increases and more algorithms are created, CRISP-DM provides common practices that everyone can follow, and it gives specific tips and techniques on how to understand business data and deploy a data mining model. CRISP-DM has six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
BACKGROUND OF EDUCATIONAL DATA MINING
Educational data mining is a field for solving education-related problems. At a high level, it seeks to improve methods for exploring educational data, which usually has a meaningful hierarchy at multiple levels, in order to discover new insights into how people learn in these settings. For instance, a student’s college transcript may contain a temporally ordered list of the courses the student has taken, the grade earned in each course, and information about when the student selected or changed an academic major or minor. We might also learn how different individuals engage with, or potentially ‘game’, the system. Taken together, these learning analytics provide much useful information for the design of learning environments. EDM applies many techniques, such as decision trees, neural networks, Naïve Bayes, and k-nearest neighbor. Qualitative techniques such as interviews and document analysis are frequently used to support case studies in EDM. EDM affects students directly in the areas of course content, course selection, recommender systems, and admissions. In addition, specific data mining methods such as web mining, classification, and multivariate statistics are key techniques applied to education-related data. These approaches can be used to model students’ individual differences and to respond to those differences in ways that help improve students’ performance.
ALGORITHMS OF DATA MINING
Data mining relies on functions such as classification, categorization, estimation, and visualization. Classification identifies associations and clusters and separates the subjects under study. For example, educational institutions can use classification to analyze students’ characteristics comprehensively. Categorization applies rule induction algorithms to handle categorical outcomes. Estimation covers predictive functions or likelihoods and deals with continuous outcome variables. Estimation and classification use either unsupervised or supervised modeling techniques. Visualization uses interactive graphs to present mathematically induced data and scores and is much more sophisticated than traditional bar charts or pie charts. An algorithm is a specific, mathematically driven data mining function, such as a neural network, a classification and regression tree (C&RT), or K-means. Data mining algorithms such as clustering, classification, regression, neural networks, association rules, and decision trees have all been applied successfully in education. For example, methods for hierarchical data mining and longitudinal data modeling have been applied in EDM.
The goal of clustering is to find data points that naturally group together, splitting the full data set into a set of clusters. Clustering techniques let us identify dense and sparse regions in object space and discover overall distribution patterns and correlations among data attributes. Clustering techniques are studied in [15-17, 18, 19].
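To make the idea of naturally grouping data points concrete, the following is a minimal sketch of K-means in plain Python. The student records, the choice of two clusters, and the seeding of centroids with the first k points are all assumptions made for illustration; they are not taken from any study cited here.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Plain K-means; the first k points seed the centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical data: (weekly study hours, quiz average) for six students.
students = [(2, 55), (3, 60), (2.5, 58), (9, 88), (10, 92), (8.5, 85)]
labels, centroids = k_means(students, k=2)
print(labels)
```

On this toy data the procedure separates the three students with low study hours from the three with high study hours. Real educational data would normally need feature scaling and a more careful initialization.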
Classification is used to predict the value of a categorical variable, such as a class label. It frequently employs decision tree or neural network-based algorithms. In classification, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data. Popular classification methods include decision trees, logistic regression (for binary predictions), and support vector machines.
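The train-then-test workflow described above can be sketched with the simplest possible decision tree, a single-split "stump". The midterm scores, pass/fail labels, and function names below are hypothetical and serve only to show how held-out test data yields an accuracy estimate.

```python
def fit_stump(rows):
    """Learn a one-split rule 'score >= threshold means pass' from (score, label) rows."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({score for score, _ in rows}):
        correct = sum((score >= threshold) == label for score, label in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def accuracy(threshold, rows):
    """Fraction of rows whose label matches the rule's prediction."""
    return sum((score >= threshold) == label for score, label in rows) / len(rows)

# Hypothetical records: (midterm score, passed the course?)
train = [(35, False), (42, False), (48, False), (55, True), (61, True), (70, True)]
test = [(40, False), (50, False), (58, True), (52, True)]

threshold = fit_stump(train)              # rule is learned on training data only
test_accuracy = accuracy(threshold, test)  # accuracy is estimated on unseen data
print(threshold, test_accuracy)
```

Whether the resulting accuracy is acceptable depends on the application; only then would the rule be applied to new students.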
Association rules are used to find relations between different items. Back in 1995, association rule analysis was the method most frequently used in studies on educational data mining, because it requires less extensive expertise than other methods [20, 21]. After 2005, however, the trend changed as researchers increasingly adopted clustering and classification methods.
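As an illustration of how relations between items are quantified, the sketch below computes the two standard measures of an association rule A -> B: support (the share of transactions containing both A and B) and confidence (the share of transactions containing A that also contain B). The enrollment sets and the thresholds are assumptions chosen for the example, and only rules between single items are considered.

```python
from itertools import permutations

def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for single-item rules."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        # Number of transactions that contain every item in itemset.
        return sum(itemset <= t for t in transactions)

    rules = []
    for a, b in permutations(items, 2):
        both = count({a, b})
        support = both / n
        confidence = both / count({a})
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

# Hypothetical data: the set of courses each of five students enrolled in.
enrollments = [
    {"calculus", "physics"},
    {"calculus", "physics", "programming"},
    {"calculus", "programming"},
    {"physics", "calculus"},
    {"programming"},
]
rules = association_rules(enrollments)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Practical association rule mining uses algorithms such as Apriori to avoid enumerating every candidate itemset; the exhaustive loop here is only workable for very small item sets.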
Regression analysis models the relationship between independent variables and dependent variables. Independent variables are attributes that are already known, and the dependent (response) variables are what we want to predict. However, many real-world problems are not simple prediction problems. More complex techniques (e.g., decision trees or neural networks) may therefore be necessary for forecasting. Popular regression methods within educational data mining include linear regression, neural networks, and support vector machine regression.
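The simplest case, linear regression with one independent variable, can be sketched with the ordinary least squares formulas. The study-hour and exam-score values are hypothetical, and the function name is introduced here for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: weekly study hours (independent) and exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 67, 72]

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6  # predicted score for a student studying 6 hours
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Here the known attribute (study hours) is used to estimate the response (exam score); with several independent variables the same idea extends to multiple linear regression.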
Neural networks have a remarkable ability to derive meaning from complicated or imprecise data. They can be used to extract patterns and detect trends that are too complex to be noticed by humans or by other computer techniques, which makes them well suited to identifying patterns or trends for forecasting.
A decision tree is a tree-shaped structure that represents sets of decisions. Specific decision tree methods include classification and regression trees (C&RT) and Chi-Square Automatic Interaction Detection (CHAID).
METHODS OF EDUCATIONAL DATA MINING
EDM has a large number of powerful methods [2, 22], some of which are widely acknowledged as universal data mining methods (e.g., clustering, prediction, outlier detection, and relationship mining). However, discovery with models and distillation of data for human judgment have recently been considered more prominent in educational data mining.
Clustering approaches are applied to obtain a clear distinction between groups of data. Once a set of clusters has been determined, new instances can be classified by determining the closest cluster. Clustering can be applied to group similar course materials or to group students based on their learning behavior patterns. One e-learning study was designed to forecast students’ behavior patterns using data mining approaches.
Prediction infers a single aspect of the data from a combination of other aspects. Classification, regression, and density estimation are the main types of prediction methods. Prediction has already been applied to predicting students’ performance.
Relationship Mining
Relationship mining identifies relationships among variables and encodes them as rules for later application. Broadly, there are four types of relationship mining: association rule mining, sequential pattern mining, correlation mining, and causal data mining. Some of them have already been used to identify relationships in patterns of students’ behavior and in the difficulties or mistakes that learners commonly encounter.
Process mining is used to extract knowledge from the event logs of an information system in order to form a clear picture of the overall activities. There are three subfields of process mining: model discovery, model extension, and conformance checking. Process mining has been reported to reflect students’ behavior in terms of sequences of courses, grades, and so on.
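A common first step in model discovery is to count which activity directly follows which in each case of the event log. The sketch below does this for a hypothetical log of (student, course, term) events; the log format, the course names, and the function name are assumptions for illustration.

```python
from collections import Counter, defaultdict

def directly_follows(event_log):
    """Count how often one activity is immediately followed by another within a case."""
    traces = defaultdict(list)
    # Order events by case and timestamp, then rebuild each case's activity sequence.
    for case_id, activity, timestamp in sorted(event_log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts

# Hypothetical event log: (student id, course taken, term number).
log = [
    ("s1", "Math101", 1), ("s1", "Math201", 2), ("s1", "Stats210", 3),
    ("s2", "Math101", 1), ("s2", "Stats210", 2),
    ("s3", "Math101", 2), ("s3", "Math201", 3),
]
follows = directly_follows(log)
for (first, second), n in follows.most_common():
    print(f"{first} -> {second}: {n}")
```

The resulting counts can be read as a simple graph of course sequences, which discovery algorithms then turn into a process model.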
Text mining can be used to derive information with high accuracy from text data and resources. The main tasks of text mining include text categorization, text clustering, concept/entity extraction, sentiment analysis, and document summarization. Text mining is applied to analyze content from forums, chats, web pages, and other text resources.
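Most of the text mining tasks listed above start from the same preprocessing step: splitting text into tokens and counting terms. The following minimal sketch shows that step on hypothetical forum posts; the stop-word list is a small assumed sample, not a standard one.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "is", "i", "am", "on", "in", "with", "there"}

def term_frequencies(posts):
    """Count how often each non-stop-word term occurs across all posts."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

# Hypothetical forum posts from an introductory programming course.
posts = [
    "I am stuck on the recursion exercise",
    "Recursion is confusing, is there a simpler example?",
    "The example in week 3 helped with recursion",
]
frequencies = term_frequencies(posts)
print(frequencies.most_common(2))
```

Term counts of this kind can feed text categorization or clustering, for example to surface the topics that learners ask about most often.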
Social Network Analysis
Social network analysis (SNA) is used to measure the relationships among different entities in a network, and it can be applied to analyze relationships in a variety of tasks.
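One of the simplest SNA measures is degree centrality: the fraction of other entities to which an entity is directly connected. The sketch below computes it for a hypothetical network of forum replies between students; the names and edges are invented for illustration.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree centrality of each node in an undirected graph given as (a, b) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Normalize by the number of other nodes; guard against a one-node graph.
    others = max(len(neighbors) - 1, 1)
    return {node: len(adjacent) / others for node, adjacent in neighbors.items()}

# Hypothetical data: pairs of students who replied to each other in a forum.
replies = [("ana", "ben"), ("ana", "chen"), ("ana", "dee"), ("ben", "chen")]
centrality = degree_centrality(replies)
for student, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{student}: {score:.2f}")
```

In this toy network one student is connected to everyone else, which is the kind of structural pattern SNA is meant to reveal.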
Discovery with Models
In discovery with models, a model is developed and validated via prediction, clustering, or manual knowledge engineering, and is then used as a component in a further analysis. Key applications of this method include discovering relationships among student behaviors, characteristics, and contextual variables [23, 31].
Distillation of Data for Human Judgment
Data is distilled for human judgment in educational data mining for two main purposes: identification, where data is displayed so that humans can readily recognize well-known patterns, and classification of data features. The goal of this method is to summarize and present information in a useful, interactive, and visually appealing way, in order to make large amounts of educational data understandable and to support decision making.
APPLICATIONS IN EDUCATION
Three applications of educational data mining that have received particular attention are discussed here.
Predicting Student Performance
Lin applied classification and regression trees to predict which types of students would drop out of school and later return. The models provided short-term accuracy in predicting which types of students would benefit from student retention programs. Chacon, Spicer, et al. developed a data mining-based system that helps the institution identify and respond to at-risk students. Their work is highly representative of the discipline because it follows a strict, quantitative data mining process. Research by Yeats, Reddy, and Wheeler found that students who attend writing centers tend to perform well in their classes. Yu, DiGangi, et al. discovered that students from the east coast of the USA tend to stay enrolled longer than their west coast counterparts.
Course Management System
EDM is often used in course management systems such as Moodle, which record usage data covering different activities. García, Romero, Ventura, and de Castro developed a simplified data mining toolkit that operates within the course management system and allows students and other users to obtain data mining information for their courses. This research and its applications will allow non-technical faculty to engage in educational data mining activities. Instead of traditional static course patterns, data mining can be applied to customize learning activities and adapt the pace at which learners complete courses [38, 39], creating meaningful and optimal learning experiences for each student. In addition, Blikstein identified different types of programming behavior in an online course.
Planning and Scheduling
Recent research on mobile learning environments suggests that data mining can help provide personalized content to different mobile users, despite the differences between mobile devices and conventional PCs. EDM applications will allow non-technical users to engage with data mining tools and activities, making data processing more accessible for all EDM users. Examples include statistical and visualization tools and the analysis of social networks and their influence on learning outcomes.
As the related technology develops, there are costs and challenges associated with implementing EDM applications, such as storing logged data and managing data systems. Moreover, choosing which data to mine and analyze may also be a challenge. In addition, individual privacy is a continuing concern in the application of educational data mining tools. With free, accessible tools on the market, students and learners may be at risk when providing information to a learning system. Protecting individual privacy should be considered for the long-term development of EDM. Finally, it is unclear which data displays, visualizations, and visual analyses are most informative and best support effective decision making for different stakeholders.
CONCLUSION AND FUTURE INSIGHTS
Data mining is a powerful analytical tool for enhancing decision-making and for analyzing new patterns and relationships in organizations. EDM draws on techniques from data mining, statistics, and machine learning. EDM needs to analyze data coming from teaching and learning, test learning theories, and inform policy decision-making. There are a number of opportunities in EDM, from analysis at the organizational level to analysis at the individual level. Moreover, EDM is widely used by learners, researchers, teachers, and even institutions.
Recently, several studies have focused on applying EDM to admissions and enrollment, but we do not know exactly how institutions are using data mining to enhance student learning or improve related educational processes. Results from EDM research are typically obtained in the narrow context of specific educational settings, so studies that examine EDM in a broader context are needed. For EDM work as a whole to mature, there is an urgent need to examine how the adoption of educational data mining can become more widespread. Furthermore, research indicates that educational data mining is concentrated in Western cultures; consequently, other regions, such as Asian countries, may not be represented in the related research. Applications across multiple contexts should therefore be considered in the development of future models.
REFERENCES
- Koedinger, K., Cunningham, K., Skogsholm, A., & Leber, B. (2008). An open repository and analysis tools for fine-grained, longitudinal learner data. In First International Conference on Educational Data Mining (pp. 157-166). Montreal, Canada.
- Baker, R., & Yacef, K. (2009). The State of Educational Data Mining in 2009: A Review and Future Visions. Journal of Educational Data Mining, 1(1).
- Data Mining Curriculum. ACM SIGKDD. 2006-04-30. Retrieved 2014-01-27.
- Clifton, Christopher (2010). Encyclopedia Britannica: Definition of Data Mining. Retrieved 2010-12-09.
- Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Retrieved 2012-08-07.
- Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic (1996). From Data Mining to Knowledge Discovery in Databases. Retrieved 17 December 2008.
- Dunham, M. (2003). Data Mining: Introductory and Advanced Topics. Upper Saddle River, NJ: Pearson Education.
- Berson, A., Smith, S., & Thearling, K. (2011). An Overview of Data Mining Techniques. Retrieved November 28, 2011, from http://www.thearling.com/text/dmtechniques/dmtechniques.htm
- Kiron, D., Shockley, R., Kruschwitz, N., Finch, G., & Haydock, M. (2012). Analytics: The Widening Divide. MIT Sloan Management Review, 53(2), 1-22.
- Leventhal, B. (2010). An introduction to data mining and other techniques for advanced analytics. Journal of Direct, Data and Digital Marketing Practice, 12(2), 137-153. doi:10.1057/dddmp.2010.35
- EducationalDataMining.org. (2013). Retrieved 2013-07-15.
- Calders, T., & Pechenizkiy, M. (2012). Introduction to the special section on educational data mining. SIGKDD Explor. Newsl., 13(2), 3-6. doi:10.1145/2207243.2207245