Part II (25 points): Using one of the datasets available in KNIME or a dataset of your own, create a step-by-step tutorial showing how to use KNIME to build and evaluate predictive models from either an XLSX file or a CSV file. Write clear, concise directions accompanied by screenshots of the software so that a user unfamiliar with KNIME can complete each task by following your instructions alone. I will follow your directions exactly as written and should be able to accomplish each of the following tasks (each worth 1 point) using KNIME:
Open an XLSX or CSV file located on the computer desktop.
Rename a column in the file.
Exclude observations (rows) with missing data in a chosen column.
Impute missing values in a chosen column by replacing them with the column mean.
Create a column of calculated values from values in two other columns.
Create a column containing a dichotomous (two-category) variable derived from a continuous variable.
Find the mean of a variable.
Find the standard deviation of a variable.
Generate a bivariate scatterplot of two continuous variables.
Train and evaluate a linear regression model:
Randomly separate 75% of the data into a training set and 25% into a test set.
Use at least two independent variables to train a regression model to predict a dependent variable.
Apply a trained regression model to a test set.
Generate a report to evaluate the predictive performance of the regression model.
Explain how to find and interpret the R² (coefficient of determination) of the model.
Explain how to find and interpret the mean absolute error of the model.
Train and evaluate a decision-tree model:
Split the data into a training set (70%) and a test set (30%) using stratified sampling.
Train a decision-tree model to predict a categorical (for example, binary) dependent variable.
Apply a trained decision-tree model to a test set.
Explain how to find and interpret the sensitivity and specificity of the model.
Explain how to find and interpret the area under the ROC curve for the model.
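For reference, the quantities named in the regression and decision-tree tasks can be sketched in plain Python. This is an illustrative sketch only, not KNIME's implementation and not a required part of the tutorial. In KNIME the corresponding values come from nodes such as Partitioning (which offers a stratified sampling option), Numeric Scorer, Scorer and ROC Curve. The function names and sample numbers below are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_indices, test_indices), taking train_fraction of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within each class
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def sensitivity_specificity(actual, predicted, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    tn = sum(1 for y, p in pairs if y != positive and p != positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(actual, scores, positive=1):
    """AUC: probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(actual, scores) if y == positive]
    neg = [s for y, s in zip(actual, scores) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical regression results on a test set.
    y_true = [3.0, 5.0, 7.0, 9.0]
    y_pred = [2.5, 5.5, 7.0, 8.0]
    print("R^2:", r_squared(y_true, y_pred))
    print("MAE:", mean_absolute_error(y_true, y_pred))

    # Hypothetical decision-tree results: true classes, predicted classes, scores.
    c_true = [1, 1, 1, 0, 0, 0]
    c_pred = [1, 1, 0, 0, 0, 1]
    c_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
    print("Sensitivity, specificity:", sensitivity_specificity(c_true, c_pred))
    print("AUC:", roc_auc(c_true, c_score))
```

In this sample, R² is about 0.925 (the model explains roughly 92.5% of the variance in the test-set outcome), the MAE is 0.5 (predictions miss by half a unit on average), sensitivity and specificity are both 2/3, and the AUC is 8/9, meaning a randomly chosen positive case outscores a randomly chosen negative case about 89% of the time.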