University of Huddersfield Data Mining and Data Report
Q1.
Study the dataset: find its size and the number of variables, and describe the variable types. Check whether any data are missing (if so, apply an appropriate cleaning technique). Perform a descriptive statistical analysis of the dataset: choose a range of variables that interest you and explore their frequencies and dependencies through bar plots, grouped bar plots, pie charts, etc. Draw conclusions.
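As a rough illustration of these steps, here is a minimal sketch in base R. The data frame `students` is a small invented stand-in, not the real data; the column names `sex` and `studytime` are assumptions about the dataset (only `G3` is named in the brief), and median imputation is just one possible cleaning choice.

```r
# Toy stand-in for the real dataset (values are invented for illustration).
# With the real file you would load the data with read.csv() or similar.
students <- data.frame(
  sex       = factor(c("F", "M", "F", "F", "M", "M", "F", "M")),
  studytime = c(2, 1, 3, 2, NA, 1, 4, 2),
  G3        = c(12, 8, 15, 11, 10, NA, 18, 9)
)

# Size of the dataset and type of each variable
dim(students)   # number of rows and columns
str(students)   # variable types

# Count missing values per column
na_counts <- colSums(is.na(students))
print(na_counts)

# One simple cleaning choice (an assumption, not the only option):
# replace missing numeric values with the column median
for (col in names(students)) {
  if (is.numeric(students[[col]])) {
    missing <- is.na(students[[col]])
    students[[col]][missing] <- median(students[[col]], na.rm = TRUE)
  }
}

# Frequencies of one variable and a cross-tabulation of two variables
sex_freq     <- table(students$sex)
sex_by_study <- table(students$sex, students$studytime)

# Numbered, labelled plots
barplot(sex_freq, main = "Figure 1: Students by sex", ylab = "Count")
barplot(sex_by_study, beside = TRUE, legend.text = rownames(sex_by_study),
        main = "Figure 2: Study time by sex", xlab = "studytime", ylab = "Count")
pie(sex_freq, main = "Figure 3: Share of students by sex")
```

Dropping incomplete rows with `na.omit()` is an alternative to imputation; which is appropriate depends on how much data is missing.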
Advanced: Perform a factor analysis. Comment on your findings.
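One way to approach the factor analysis is `factanal()` from base R's `stats` package. The sketch below runs it on simulated numeric variables `v1` to `v6`, which are hypothetical placeholders generated from two made-up latent factors; with the real data you would pass the numeric columns of the dataset instead, and the choice of two factors is an assumption.

```r
set.seed(1)
n <- 300

# Two hypothetical latent factors driving six observed variables
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.6),
  v3 = 0.9 * f1 + rnorm(n, sd = 0.6),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.6),
  v6 = 0.9 * f2 + rnorm(n, sd = 0.6)
)

# Maximum-likelihood factor analysis with varimax rotation
fa <- factanal(X, factors = 2, rotation = "varimax")

# Loadings show which variables group together on each factor;
# small loadings are hidden to make the pattern easier to read
print(fa$loadings, cutoff = 0.3)

# Uniquenesses: share of each variable's variance not explained by the factors
print(fa$uniquenesses)
```

Variables that load strongly on the same factor can be interpreted as measuring a common underlying trait, which is the basis for commenting on the findings.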
Dataset (UCI Machine Learning Repository):
https://archive.ics.uci.edu/ml/datasets/student%2B…
Q2.
Split the dataset into training and test sets. Build a Random Forest regression model (using the randomForest R library) to predict the final-year grade (G3). Evaluate your model on the test set.
Plot a variable importance graph. Estimate the accuracy. Comment on your results.
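The workflow above could be sketched as follows with the `randomForest` package. The data frame is simulated stand-in data with invented values; the predictor names `studytime`, `failures` and `absences` are assumptions about the dataset, the 70/30 split is an arbitrary choice, and reading "accuracy" as test RMSE and R-squared is one interpretation for a regression model.

```r
library(randomForest)

set.seed(42)
n <- 200

# Simulated stand-in for the real data; G3 is the target variable
students <- data.frame(
  studytime = sample(1:4, n, replace = TRUE),
  failures  = sample(0:3, n, replace = TRUE),
  absences  = sample(0:30, n, replace = TRUE)
)
raw_grade   <- 10 + 2 * students$studytime - 3 * students$failures + rnorm(n, sd = 2)
students$G3 <- pmin(20, pmax(0, round(raw_grade)))

# 70/30 train/test split (the ratio is an assumption)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- students[train_idx, ]
test  <- students[-train_idx, ]

# Numeric response, so randomForest fits a regression forest
rf_reg <- randomForest(G3 ~ ., data = train, ntree = 500, importance = TRUE)

# Evaluate on the held-out test set
pred <- predict(rf_reg, newdata = test)
rmse <- sqrt(mean((test$G3 - pred)^2))
r2   <- 1 - sum((test$G3 - pred)^2) / sum((test$G3 - mean(test$G3))^2)
cat("Test RMSE:", rmse, " Test R-squared:", r2, "\n")

# Variable importance table and plot
print(importance(rf_reg))
varImpPlot(rf_reg, main = "Figure 4: Variable importance (regression)")
```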
Advanced: Divide the students into three categories based on the final grade: poor-achieving, average-achieving, and well-achieving. Build a Random Forest classification model. Evaluate your model on the test set. Print the confusion matrix. Draw conclusions.
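A possible sketch of the classification variant is shown below, again on simulated stand-in data with assumed predictor names. The grade cut-offs used here (up to 9 is "poor", 10 to 14 is "average", 15 and above is "well") are an assumption, since the brief does not specify them. `G3` is left out of the predictors because the class label is derived directly from it.

```r
library(randomForest)

set.seed(42)
n <- 300

# Simulated stand-in for the real data (invented values)
students <- data.frame(
  studytime = sample(1:4, n, replace = TRUE),
  failures  = sample(0:3, n, replace = TRUE),
  absences  = sample(0:30, n, replace = TRUE)
)
raw_grade   <- 10 + 2 * students$studytime - 3 * students$failures + rnorm(n, sd = 2)
students$G3 <- pmin(20, pmax(0, round(raw_grade)))

# Assumed cut-offs: G3 <= 9 poor, 10 to 14 average, >= 15 well
students$level <- cut(students$G3, breaks = c(-Inf, 9, 14, Inf),
                      labels = c("poor", "average", "well"))

# 70/30 train/test split (the ratio is an assumption)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- students[train_idx, ]
test  <- students[-train_idx, ]

# Exclude G3 itself, since the label is computed from it
predictors <- c("studytime", "failures", "absences")

# Factor response, so randomForest fits a classification forest
rf_cls <- randomForest(x = train[, predictors], y = train$level, ntree = 500)

# Confusion matrix and overall accuracy on the test set
pred     <- predict(rf_cls, newdata = test[, predictors])
conf_mat <- table(Predicted = pred, Actual = test$level)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

print(conf_mat)
cat("Test accuracy:", accuracy, "\n")
```

Off-diagonal cells of the confusion matrix show which categories the model confuses, which is usually more informative than the overall accuracy alone.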
Recommended reading: Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. doi: 10.1023/A:1010933404324
Structure of the evaluative report:
1. Cover page with your name, the name of the chosen dataset and the corresponding data mining method.
2. Introduction containing a short description of the chosen method.
3. Answers to the stated questions, with conclusions.
4. A literature review that includes a reference to the original method, its extensions and improvements (if applicable), and a few recent applications of the method. You must use APA 7th for referencing.
5. Appendix, which must include the R commands you used in your analysis.
I would also like the exported R file.
All plots, figures and graphs must be numbered and clearly labelled.