University of Huddersfield Data Mining and Data Report

Q1.

Study the dataset: find its size, and the number and types of its variables. Check whether any data are missing (if so, apply an appropriate cleaning technique). Perform a descriptive statistical analysis of the dataset: choose a range of variables of interest and examine their frequencies and dependencies using bar plots, grouped bar plots, pie charts, etc. Draw conclusions.
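
The descriptive steps above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data frame is a small synthetic stand-in (the column names `sex`, `studytime`, `absences` and `G3` are chosen for the example), the two missing values are injected deliberately, and median imputation is just one possible cleaning technique.

```r
# Synthetic stand-in for the real data; column names and values are
# illustrative assumptions, not the actual UCI file.
set.seed(1)
n <- 60
students <- data.frame(
  sex       = factor(sample(c("F", "M"), n, replace = TRUE)),
  studytime = factor(sample(1:4, n, replace = TRUE)),
  absences  = rpois(n, 5),
  G3        = pmin(20, pmax(0, round(rnorm(n, 11, 3))))
)
students$absences[c(3, 17)] <- NA  # inject two missing values for the demo

# Size and variable types
print(dim(students))               # number of rows, number of columns
str(students)                      # type of each variable

# Missing values per column
na_counts <- colSums(is.na(students))
print(na_counts)

# One simple cleaning choice: median imputation for a numeric column
students$absences[is.na(students$absences)] <-
  median(students$absences, na.rm = TRUE)

# Frequencies, and a dependency between two categorical variables
freq_sex <- table(students$sex)
tab <- table(students$sex, students$studytime)

barplot(freq_sex, main = "Figure 1: Students by sex", ylab = "Count")
barplot(tab, beside = TRUE, legend.text = rownames(tab),
        main = "Figure 2: Study time by sex",
        xlab = "Study time level", ylab = "Count")
pie(table(students$studytime), main = "Figure 3: Study time levels")
```

With the real dataset, the synthetic data frame would be replaced by a `read.csv()` call on the downloaded file, and the choice of variables to plot would follow your own interests.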

Advanced: Perform a factor analysis. Comment on your findings.
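
For the factor analysis, one possible starting point is `factanal()` from base R's `stats` package, which fits a maximum-likelihood factor model to numeric variables. The sketch below uses synthetic data with a hypothetical latent variable `ability` driving three grade columns; the column names and the single-factor choice are assumptions made for illustration only.

```r
# Synthetic numeric data: three grades share one latent factor,
# while absences is generated independently of it.
set.seed(3)
n <- 200
ability <- rnorm(n)  # hypothetical latent factor, not a dataset column
scores <- data.frame(
  G1       = 11 + 3 * ability + rnorm(n),
  G2       = 11 + 3 * ability + rnorm(n),
  G3       = 11 + 3 * ability + rnorm(n),
  absences = 5 + rnorm(n, sd = 2)
)

# Maximum-likelihood factor analysis with a single factor
fa <- factanal(scores, factors = 1)

print(fa$loadings)      # how strongly each variable loads on the factor
print(fa$uniquenesses)  # share of variance the factor does not explain
```

High loadings with low uniquenesses suggest that a variable is well described by the common factor. On the real data, the number of factors and the set of numeric variables to include would need to be justified in the report.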

(The dataset comes from the UCI Machine Learning Repository.)

https://archive.ics.uci.edu/ml/datasets/student%2B…

Q2.

Split the dataset into training and testing parts. Build a Random Forest regression model (using the randomForest R library) to predict the final year grade (G3). Evaluate your model using the test dataset.

Plot a variable importance graph. Estimate the accuracy. Comment on your results.
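
A minimal sketch of this workflow with the `randomForest` package might look like the following. The data are synthetic, the column names other than G3 are assumptions, and the 70/30 split, `ntree = 500` and the error measures (RMSE, MAE, R squared) are illustrative choices rather than requirements of the brief.

```r
library(randomForest)

# Synthetic stand-in for the real data; values are illustrative only.
set.seed(42)
n <- 200
G1 <- round(runif(n, 0, 20))
students <- data.frame(
  G1        = G1,
  G2        = pmin(20, pmax(0, G1 + round(rnorm(n, 0, 1.5)))),
  absences  = rpois(n, 5),
  studytime = factor(sample(1:4, n, replace = TRUE))
)
students$G3 <- pmin(20, pmax(0, students$G2 + round(rnorm(n, 0, 1.5))))

# 70/30 train/test split (the ratio is an assumption)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- students[train_idx, ]
test  <- students[-train_idx, ]

# A numeric response makes randomForest() fit a regression forest
rf <- randomForest(G3 ~ ., data = train, ntree = 500, importance = TRUE)

# Evaluate on the held-out test set
pred <- predict(rf, newdata = test)
rmse <- sqrt(mean((pred - test$G3)^2))
mae  <- mean(abs(pred - test$G3))
r2   <- 1 - sum((test$G3 - pred)^2) / sum((test$G3 - mean(test$G3))^2)
cat("RMSE:", rmse, " MAE:", mae, " R^2:", r2, "\n")

# Variable importance table and plot
print(importance(rf))
varImpPlot(rf, main = "Figure: Variable importance (regression forest)")
```

For a regression model, "accuracy" is usually reported through error measures like these rather than a percentage of correct predictions.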

Advanced: Divide the students into 3 categories based on the final grade: poor-achieving, average-achieving and well-achieving students. Build a Random Forest classification model. Evaluate your model using the test dataset. Print the confusion matrix. Draw conclusions.
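
The classification variant could be sketched as below. The brief does not fix the category boundaries, so the cut points used here (0 to 9 poor, 10 to 14 average, 15 to 20 good, assuming a 0 to 20 grading scale) are an assumption, as are the synthetic data and the predictor names.

```r
library(randomForest)

# Synthetic stand-in for the real data; values are illustrative only.
set.seed(7)
n <- 300
G2 <- round(runif(n, 0, 20))
students <- data.frame(G2 = G2, absences = rpois(n, 5))
students$G3 <- pmin(20, pmax(0, G2 + round(rnorm(n, 0, 1.5))))

# Bin the final grade into three levels (thresholds are an assumption)
students$level <- cut(students$G3,
                      breaks = c(-Inf, 9, 14, Inf),
                      labels = c("poor", "average", "good"))

# 70/30 train/test split
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- students[train_idx, ]
test  <- students[-train_idx, ]

# A factor response makes randomForest() fit a classification forest.
# G3 is left out of the predictors because the label is derived from it.
rf_cls <- randomForest(level ~ G2 + absences, data = train, ntree = 500)

# Confusion matrix and overall accuracy on the test set
pred <- predict(rf_cls, newdata = test)
conf_mat <- table(Predicted = pred, Actual = test$level)
print(conf_mat)

accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
cat("Test accuracy:", accuracy, "\n")
```

The off-diagonal cells of the confusion matrix show which categories the model confuses, which is often more informative for the conclusions than the overall accuracy alone.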

Recommended reading: Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324


Structure of the evaluative report:

1. Cover page with your name, the name of the chosen dataset and the corresponding data mining method.

2. Introduction, which contains a short description of the chosen method.

3. Answers to the stated questions, and conclusions.

4. A literature review, which should include a reference to the original method, its extensions and improvements (if applicable) and a few recent applications of the method. You must use APA 7th for referencing.

5. Appendix, which must include the R commands you used in your analysis. An exported R file is also required.

All plots, figures and graphs must be numbered and clearly labelled.