Description
Week 10: Lab – Using SVM on an Air Quality Dataset
[Name]
[Date]
Instructions
Conduct predictive analytics on the Air Quality dataset to predict changes in ozone values. You will split the Air Quality dataset into a training set and a test set. Apply several techniques, such as Kernel-Based Support Vector Machines (KSVM), Support Vector Machines (SVM), Linear Modelling (LM), and Naive Bayes (NB), and determine which technique works best for this dataset.
Add all of the libraries you use for this homework here.
# Add your library below.
# library(tidyverse)
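As a sketch, the packages below cover the functions named later in this lab: `ksvm()` from kernlab, `svm()` (and `naiveBayes()`) from e1071, ggplot2 for the scatter plots, and `grid.arrange()` from gridExtra. Treat this list as an assumption; your own solution may need more or fewer packages.

```r
# Libraries assumed for this lab (install once with install.packages() if missing)
library(kernlab)   # ksvm(): kernel-based support vector machines
library(e1071)     # svm() and naiveBayes()
library(ggplot2)   # scatter plots for the error charts
library(gridExtra) # grid.arrange() to show several plots in one window
```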
Step 1: Load the data (0.5 point)
Let's go back and analyze the air quality dataset (we used it previously in the visualization lab). Remember to think about how to deal with the NAs in the data: replace each NA with the mean value of its column.
# Write your code below.
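A minimal sketch of mean imputation, assuming the lab's dataset is R's built-in `airquality` data frame (from the datasets package); the variable name `air` is hypothetical. Each column's mean is computed with `na.rm = TRUE` and written into that column's NA positions.

```r
# Assumption: the lab's dataset is the built-in airquality data frame
air <- airquality

# Replace each NA with the mean of its column (mean computed without the NAs)
for (col in names(air)) {
  col_mean <- mean(air[[col]], na.rm = TRUE)
  air[[col]][is.na(air[[col]])] <- col_mean
}

# Confirm that no missing values remain
colSums(is.na(air))
```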
Step 2: Create train and test data sets (0.5 point)
Using techniques discussed in class (or in the video), create two datasets: one for training and one for testing.
# Write your code below.
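One common approach is a random row split, sketched below. The seed (123) and the 2/3 training ratio are assumptions, not requirements, and `train_data`/`test_data` are hypothetical names. The cleaning step is repeated so the block runs on its own.

```r
# Recreate the cleaned data (NAs replaced by column means)
air <- airquality
for (col in names(air)) {
  air[[col]][is.na(air[[col]])] <- mean(air[[col]], na.rm = TRUE)
}

set.seed(123)  # assumption: any fixed seed, so the split is reproducible
n <- nrow(air)

# Assumption: roughly 2/3 of the rows for training, the rest for testing
train_idx  <- sample(seq_len(n), size = round(2 / 3 * n))
train_data <- air[train_idx, ]
test_data  <- air[-train_idx, ]
```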
Step 3: Build a model using KSVM and visualize the results (2 points)
Step 3.1 – Build a model
Using ksvm(), create a model to predict changes in the ozone values. You can use all the available attributes or select the ones you think would be most helpful. Of course, use the training dataset.
# Write your code below.
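A sketch of a KSVM regression model that uses all attributes to predict `Ozone`. Because `Ozone` is numeric, `ksvm()` fits a regression (eps-SVR) by default. The Gaussian kernel (`"rbfdot"`), `C = 5`, and 3-fold cross-validation are assumed starting values to tune, not prescribed settings.

```r
library(kernlab)  # ksvm()

# Recreate the cleaned data and the train/test split (assumed seed and ratio)
air <- airquality
for (col in names(air)) air[[col]][is.na(air[[col]])] <- mean(air[[col]], na.rm = TRUE)
set.seed(123)
train_idx  <- sample(seq_len(nrow(air)), size = round(2 / 3 * nrow(air)))
train_data <- air[train_idx, ]
test_data  <- air[-train_idx, ]

# Ozone ~ . uses every other column as a predictor; parameters are assumptions
ksvm_model <- ksvm(Ozone ~ ., data = train_data,
                   kernel = "rbfdot", C = 5, cross = 3)
ksvm_model
```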
Step 3.2 – Test the model and find the RMSE
Test the model using the test dataset and find the Root Mean Squared Error (RMSE). The RMSE formula is described here:
* http://statweb.stanford.edu/~susan/courses/s60/split/node60.html
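For reference, the standard RMSE definition over the $n$ test rows, where $y_i$ is the actual ozone value and $\hat{y}_i$ the model's prediction, is:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```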
# Write your code below.
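A sketch that applies the RMSE formula to the KSVM predictions. The helper `rmse()` is hypothetical, and the model settings repeat the assumed values used to build the KSVM model so the block is self-contained. `as.numeric()` flattens the one-column matrix that `predict()` returns for a KSVM regression.

```r
library(kernlab)

air <- airquality
for (col in names(air)) air[[col]][is.na(air[[col]])] <- mean(air[[col]], na.rm = TRUE)
set.seed(123)
train_idx  <- sample(seq_len(nrow(air)), size = round(2 / 3 * nrow(air)))
train_data <- air[train_idx, ]
test_data  <- air[-train_idx, ]
ksvm_model <- ksvm(Ozone ~ ., data = train_data, kernel = "rbfdot", C = 5, cross = 3)

# Hypothetical helper: square root of the mean squared (actual - predicted) difference
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

ksvm_pred <- as.numeric(predict(ksvm_model, test_data))
ksvm_rmse <- rmse(test_data$Ozone, ksvm_pred)
ksvm_rmse
```

Lower values mean predictions closer to the actual ozone levels, so this number is the basis for comparing techniques.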
Step 3.3 – Plot the results
Use a scatter plot. Have the x-axis represent Temp, the y-axis represent Wind, and the point size and color represent the error (defined as the actual ozone level minus the predicted ozone level). It should look similar to this:
[Figure: Step 3.3 Graph – Air Quality]
# Write your code below.
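A minimal ggplot2 sketch of this chart for the KSVM model. One design choice is an assumption: point size maps to the absolute error (so large under- and over-predictions both appear large), while color keeps the signed error. Model settings repeat the assumed values from the KSVM step.

```r
library(kernlab)
library(ggplot2)

air <- airquality
for (col in names(air)) air[[col]][is.na(air[[col]])] <- mean(air[[col]], na.rm = TRUE)
set.seed(123)
train_idx  <- sample(seq_len(nrow(air)), size = round(2 / 3 * nrow(air)))
train_data <- air[train_idx, ]
test_data  <- air[-train_idx, ]
ksvm_model <- ksvm(Ozone ~ ., data = train_data, kernel = "rbfdot", C = 5, cross = 3)
ksvm_pred  <- as.numeric(predict(ksvm_model, test_data))

# Error = actual ozone minus predicted ozone, one row per test observation
plot_data <- data.frame(Temp  = test_data$Temp,
                        Wind  = test_data$Wind,
                        error = test_data$Ozone - ksvm_pred)

ksvm_plot <- ggplot(plot_data, aes(x = Temp, y = Wind)) +
  geom_point(aes(size = abs(error), color = error)) +
  labs(title = "KSVM", size = "|error|", color = "error")
ksvm_plot
```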
Step 3.4 – Compute models and plot the results for svm() and lm()
Use svm() from the e1071 package and lm() from base R to compute two new predictive models. Generate similar charts for each model.
Step 3.4.1 – Compute model for svm()
# Write your code below.
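A sketch of the `svm()` version, reusing the assumed split. With a numeric response, `svm()` defaults to eps-regression with a radial kernel; its tuning parameters are left at their defaults here, which is an assumption rather than a recommendation. The chart follows the same conventions as the KSVM chart.

```r
library(e1071)    # svm()
library(ggplot2)

air <- airquality
for (col in names(air)) air[[col]][is.na(air[[col]])] <- mean(air[[col]], na.rm = TRUE)
set.seed(123)
train_idx  <- sample(seq_len(nrow(air)), size = round(2 / 3 * nrow(air)))
train_data <- air[train_idx, ]
test_data  <- air[-train_idx, ]

# Numeric response, so svm() performs eps-regression by default
svm_model <- svm(Ozone ~ ., data = train_data)
svm_pred  <- predict(svm_model, test_data)
svm_rmse  <- sqrt(mean((test_data$Ozone - svm_pred)^2))

# Point size is |error|, color is the signed error (actual - predicted)
svm_df <- data.frame(Temp = test_data$Temp, Wind = test_data$Wind,
                     error = test_data$Ozone - svm_pred)
svm_plot <- ggplot(svm_df, aes(x = Temp, y = Wind)) +
  geom_point(aes(size = abs(error), color = error)) +
  labs(title = "SVM", size = "|error|", color = "error")
svm_rmse
svm_plot
```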
Step 3.4.2 – Compute model for lm()
# Write your code below.
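The same workflow with ordinary least squares via `lm()`, as a sketch under the same assumed split. Using every attribute (including `Month` and `Day`) as a predictor is an assumption; dropping weak predictors may change the RMSE.

```r
library(ggplot2)

air <- airquality
for (col in names(air)) air[[col]][is.na(air[[col]])] <- mean(air[[col]], na.rm = TRUE)
set.seed(123)
train_idx  <- sample(seq_len(nrow(air)), size = round(2 / 3 * nrow(air)))
train_data <- air[train_idx, ]
test_data  <- air[-train_idx, ]

# Linear model of Ozone on all other columns
lm_model <- lm(Ozone ~ ., data = train_data)
lm_pred  <- predict(lm_model, newdata = test_data)
lm_rmse  <- sqrt(mean((test_data$Ozone - lm_pred)^2))

# Point size is |error|, color is the signed error (actual - predicted)
lm_df <- data.frame(Temp = test_data$Temp, Wind = test_data$Wind,
                    error = test_data$Ozone - lm_pred)
lm_plot <- ggplot(lm_df, aes(x = Temp, y = Wind)) +
  geom_point(aes(size = abs(error), color = error)) +
  labs(title = "LM", size = "|error|", color = "error")
lm_rmse
lm_plot
```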
Step 3.5 – Plot all three model results together
Show the results for the KSVM, SVM, and LM models in one window. Use the grid.arrange() function to do this. All three charts should be scatter plots.
# Write your code below.
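A sketch that fits all three models on the same training data and places their error charts side by side with `gridExtra::grid.arrange()`. The helper `error_plot()` is hypothetical, and the seed, split ratio, and KSVM settings are the same assumptions as before. Because every model has a `predict()` method that accepts new data as its second argument, one helper can build all three charts.

```r
library(kernlab)   # ksvm()
library(e1071)     # svm()
library(ggplot2)
library(gridExtra) # grid.arrange()

air <- airquality
for (col in names(air)) air[[col]][is.na(air[[col]])] <- mean(air[[col]], na.rm = TRUE)
set.seed(123)
train_idx  <- sample(seq_len(nrow(air)), size = round(2 / 3 * nrow(air)))
train_data <- air[train_idx, ]
test_data  <- air[-train_idx, ]

# Fit the three models on the same training data
models <- list(
  KSVM = ksvm(Ozone ~ ., data = train_data, kernel = "rbfdot", C = 5, cross = 3),
  SVM  = svm(Ozone ~ ., data = train_data),
  LM   = lm(Ozone ~ ., data = train_data)
)

# Hypothetical helper: Temp vs Wind scatter plot, sized by |error|, colored by error
error_plot <- function(model, title) {
  pred <- as.numeric(predict(model, test_data))
  df <- data.frame(Temp = test_data$Temp, Wind = test_data$Wind,
                   error = test_data$Ozone - pred)
  ggplot(df, aes(x = Temp, y = Wind)) +
    geom_point(aes(size = abs(error), color = error)) +
    labs(title = title, size = "|error|", color = "error")
}

# Map() keeps the model names, so the plots are addressable as plots$KSVM, etc.
plots <- Map(error_plot, models, names(models))
grid.arrange(plots$KSVM, plots$SVM, plots$LM, ncol = 3)
```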
Step 4: Create a goodOzone variable (1 point)
This va