Guidelines
· Share a screenshot with your response
· Share the code and the plots
· Put your name and ID number
· Clearly mark each question number
· Upload a Word document
· Insert a cover page listing the questions attempted
HW04 Cover Sheet
Identify all questions that you attempted in this template
Q1 Chapter 04 Classification Examples
Part 1 Review logistic regression in Chapter 4 – Classification
https://github.com/JWarmenhoven/ISLR-python
Use the examples to review Section 4.3 (Logistic Regression) of the ISLR text
a. Plot Figure 4.1
b. Plot Figure 4.2
c. Reproduce Tables 4.1, 4.2, and 4.3
d. Plot Figure 4.3
Hint – use: https://nbviewer.jupyter.org/github/JWarmenhoven/ISL-python/blob/master/Notebooks/Chapter%204.ipynb#4.3-Logistic-Regression
Part 2 Application to Caravan Insurance Data
Use Caravan.csv to apply KNN and Logistic Regression to the Caravan data
Hint – use https://nbviewer.jupyter.org/github/JWarmenhoven/ISL-python/blob/master/Notebooks/Chapter%204.ipynb#4.6.5-K-Nearest-Neighbors
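A minimal sketch of the Part 2 workflow follows. It is not the notebook's code. Caravan.csv is not bundled here, so the sketch uses a synthetic imbalanced dataset as a stand-in (an assumption), with the first 1000 rows held out as a test set. With the real file, X would hold the predictor columns and y would be the purchase indicator. KNN is distance-based, so the predictors are standardized first. Scaling with training-set statistics only is a design choice made here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Caravan.csv (assumption): about 6% positives.
X, y = make_classification(n_samples=3000, n_features=20, n_informative=5,
                           weights=[0.94, 0.06], random_state=0)

# Hold out the first 1000 rows as the test set.
X_test, y_test = X[:1000], y[:1000]
X_train, y_train = X[1000:], y[1000:]

# Standardize with statistics estimated on the training rows only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# KNN for several values of k; keep one confusion matrix per k.
knn_results = {}
for k in (1, 3, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train_s, y_train)
    knn_results[k] = confusion_matrix(y_test, knn.predict(X_test_s), labels=[0, 1])

# Logistic regression, evaluated at two probability cut-offs.
logit = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
proba = logit.predict_proba(X_test_s)[:, 1]
logit_cm_050 = confusion_matrix(y_test, (proba > 0.5).astype(int), labels=[0, 1])
logit_cm_025 = confusion_matrix(y_test, (proba > 0.25).astype(int), labels=[0, 1])
```

With so few positives, overall accuracy says little. The fraction of predicted purchasers who actually purchase (the second column of each confusion matrix) is usually the more informative number to report.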
Q2. Classification Textbook Examples
Using the Boston data set, fit classification models to predict whether a given suburb has a crime rate above or below the median. Explore logistic regression and KNN models using various subsets of the predictors. Describe your findings.
Hint – use: https://botlnec.github.io/islp/sols/chapter4/exercise13/
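The core of Q2 is to turn the crime rate into a binary target and then compare models over predictor subsets. The sketch below shows one way to organize that. The data frame is synthetic (an assumption), and the column names crim, nox, rm, and dis are chosen to resemble the Boston data. With the real data, load the file into df and pick your own subsets.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Boston data (assumption, not real values).
rng = np.random.default_rng(0)
n = 400
nox = rng.uniform(0.4, 0.9, n)
df = pd.DataFrame({
    "nox": nox,
    "rm": rng.normal(6.3, 0.7, n),
    "dis": rng.uniform(1, 12, n),
})
df["crim"] = np.exp(8 * nox - 5 + rng.normal(0, 1, n))

# Binary target: 1 if the crime rate is above the median, else 0.
df["crim01"] = (df["crim"] > df["crim"].median()).astype(int)

train, test = train_test_split(df, test_size=0.3, random_state=0,
                               stratify=df["crim01"])

# Hypothetical predictor subsets to compare.
subsets = [["nox"], ["rm", "dis"], ["nox", "rm", "dis"]]

results = {}
for cols in subsets:
    for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                        ("knn", KNeighborsClassifier(n_neighbors=5))]:
        pipe = make_pipeline(StandardScaler(), model)
        pipe.fit(train[cols], train["crim01"])
        # Test-set accuracy for this model and subset.
        results[(name, tuple(cols))] = pipe.score(test[cols], test["crim01"])
```

The results dictionary can then be printed or converted to a table to support the written findings.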
Q3 Iris Data Set and Classification (iris.csv)
The Iris dataset was used in R.A. Fisher’s classic 1936 paper. It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other. The columns in this dataset are:
· Id
· Sepal Length Cm
· Sepal Width Cm
· Petal Length Cm
· Petal Width Cm
· Species
a. Plot the iris dataset – i) “Sepal Length vs Sepal Width” ii) “Petal Length vs Petal Width”
Split the data into training and test sets, then:
b. Apply a Naïve Bayes classifier to classify species and plot the decision boundaries
c. Apply logistic regression to classify species and plot the decision boundaries
d. Apply the KNN algorithm to classify species and plot the decision boundaries
e. Compare the “Truth matrix” (confusion matrix) and accuracy of the three algorithms
Algorithm             TP    TN    FP    FN    Accuracy
Naïve Bayes
Logistic Regression
KNN
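One possible way to fill in the table is sketched below. It uses scikit-learn's bundled copy of the iris data instead of iris.csv (an assumption, so the Id column is absent). Because there are three species, TP, TN, FP, and FN are not single numbers. The helper per_class_counts (a hypothetical name) computes them one-vs-rest for each class, which is only one reasonable reading of the table.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

def per_class_counts(cm):
    """One-vs-rest TP, TN, FP, FN per class (rows = true, columns = predicted)."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as the class but actually another
    fn = cm.sum(axis=1) - tp   # actually the class but predicted as another
    tn = cm.sum() - tp - fp - fn
    return tp, tn, fp, fn

summary = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    tp, tn, fp, fn = per_class_counts(cm)
    summary[name] = {"cm": cm, "TP": tp, "TN": tn, "FP": fp, "FN": fn,
                     "Accuracy": accuracy_score(y_test, y_pred)}

for name, row in summary.items():
    print(name, row["TP"], row["TN"], row["FP"], row["FN"],
          round(row["Accuracy"], 3))
```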
Hint
Naïve Bayes – https://xavierbourretsicotte.github.io/Naive_Bayes_Classifier.html
Logistic Regression –
https://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html
https://www.datacamp.com/community/tutorials/understanding-logistic-regression-python
KNN Algorithm –
https://www.ritchieng.com/machine-learning-k-nearest-neighbors-knn/
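For the decision boundaries in parts b to d, a common approach is to fit the classifier on two features, predict over a dense grid, and shade the grid by predicted class. The sketch below computes such a grid for KNN on petal length and petal width. The helper decision_grid is a hypothetical name, and the model is fitted on all rows for brevity. In the assignment it would be fitted on the training split, and the same helper works for the other two classifiers.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, 2:4]   # petal length and petal width only
y = iris.target

clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)

def decision_grid(model, X, step=0.02, pad=0.5):
    """Predict the class at every point of a grid covering the data."""
    x_min, x_max = X[:, 0].min() - pad, X[:, 0].max() + pad
    y_min, y_max = X[:, 1].min() - pad, X[:, 1].max() + pad
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, Z

xx, yy, Z = decision_grid(clf, X)
```

Passing xx, yy, and Z to matplotlib's contourf and overlaying a scatter plot of X colored by y then shows the regions and the boundaries between them.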
Q4 Titanic Data Set and Classification (titanic.zip – already split into train and test sets)
a. Perform Exploratory Data Analysis
b. Do Feature Engineering
c. Apply logistic regression
d. Apply KNN algorithm
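The sketch below illustrates parts b to d on a handful of made-up rows. The column names (Survived, Pclass, Sex, Age, SibSp, Parch, Fare, Embarked) are assumed to follow the widely used Kaggle Titanic layout, so check them against the files in titanic.zip. The engineered features (IsFemale, FamilySize, port indicators) are examples only, and engineer is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy rows for illustration only (not real passenger records).
train = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 3, 1, 3, 2, 3],
    "Sex":      ["male", "female", "female", "female", "male",
                 "male", "male", "female", "female", "male"],
    "Age":      [22, 38, 26, 35, 35, np.nan, 54, 27, 14, 20],
    "SibSp":    [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 0, 0, 0, 2, 0, 0],
    "Fare":     [7.25, 71.28, 7.93, 53.10, 8.05, 8.46, 51.86, 11.13, 30.07, 8.05],
    "Embarked": ["S", "C", "S", "S", "S", "Q", "S", "S", "C", None],
})

def engineer(df, age_median, embarked_mode):
    """Build a numeric feature table; fill values come from the training set."""
    out = pd.DataFrame(index=df.index)
    out["Pclass"] = df["Pclass"]
    out["IsFemale"] = (df["Sex"] == "female").astype(int)
    out["Age"] = df["Age"].fillna(age_median)
    out["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    out["Fare"] = df["Fare"]
    embarked = df["Embarked"].fillna(embarked_mode)
    for port in ("C", "Q", "S"):   # fixed list keeps train/test columns aligned
        out[f"Embarked_{port}"] = (embarked == port).astype(int)
    return out

age_median = train["Age"].median()
embarked_mode = train["Embarked"].mode()[0]
X_train = engineer(train, age_median, embarked_mode)
y_train = train["Survived"]

logit = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
logit_pred = logit.fit(X_train, y_train).predict(X_train)
knn_pred = knn.fit(X_train, y_train).predict(X_train)
```

The test file would be passed through engineer with the same age_median and embarked_mode, so that no information from the test set leaks into the imputation.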
Hint
Q5. How do k-fold cross validation and grid search work on the Social Ads Network data?
Use the references to explain how the two work together to evaluate a model
https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html
https://sebastianraschka.com/faq/docs/evaluate-a-model.html
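The sketch below shows how the two pieces fit together. Grid search proposes each hyperparameter combination, k-fold cross validation scores each one on the training data, and a separate test set gives the final estimate for the chosen model. The dataset is a synthetic two-feature stand-in for the Social Ads Network data (an assumption), and the KNN grid is only an example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): two numeric features, binary target.
X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Scaling lives inside the pipeline so it is re-fitted within every fold.
pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsClassifier())])

# 5 x 2 = 10 candidate settings, each scored by 5-fold cross validation.
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9],
              "knn__weights": ["uniform", "distance"]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_            # setting with the best mean CV score
cv_score = search.best_score_                # mean accuracy across the 5 folds
test_score = search.score(X_test, y_test)    # accuracy on untouched test data
```

The cross-validated score is used only to choose among settings. The held-out test score is the number to report as the performance estimate, since the test rows played no part in the selection.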