Regression and Classification Lab 5
#@# Regression and Classification Lab 5 Submission Template
#@# Instructions:
#@# 1. Do not modify any line beginning with #@#
#@# 2. Do not begin any line in your solution programs or comments with #@#
#@# 3. Paste your R code below the line PASTE R CODE BELOW HERE for each answer.
#@# 4. Ensure each answer corresponds to the correct question number.
#@# 5. Provide code for all 5 questions, even if incomplete, to avoid parsing errors.
#@# 6. Save this file as a plain .txt file and upload via the Google Form.
#@#=================================
#@# --START OF QUESTION_1--
#@# Title: Water Potability Data Preprocessing with Tidyverse and Recipes
#@# You are a data scientist for a water quality monitoring agency developing a classification system to predict water potability based on chemical and physical properties.
#@#
#@# A) Load the water_potability.csv dataset using tidyverse functions and conduct an initial exploratory analysis.
#@# Use dplyr and ggplot2 to examine the structure, summary statistics, and identify any missing values across all variables.
#@# Calculate the proportion of potable vs non-potable water samples and assess if class imbalance is present.
#@#
#@# B) Create a tidymodels recipe to handle missing values appropriately for machine learning modeling.
#@# Use recipe steps like step_impute_mean() or step_impute_median() as appropriate.
#@# Examine the distribution of each predictor variable using tidyverse visualization and identify any variables with significant skewness.
#@#
#@# C) Expand your recipe to apply appropriate transformations to address skewness in the predictor variables.
#@# Use recipe steps like step_log(), step_sqrt(), or other transformations as needed.
#@# Also consider normalization and scaling steps such as step_normalize(), step_scale(), or step_center() to standardize your variables.
#@# Create before-and-after visualizations using ggplot2 for the transformed variables by applying your recipe.
#@#
#@# D) Add steps to your recipe to address class imbalance if present using appropriate recipe functions.
#@# Compare the original class distribution with your chosen balancing method.
#@# Finalize your preprocessing recipe that will be used for all subsequent modeling questions and provide a summary of all recipe steps applied.
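One possible shape for parts A to D is sketched below; it is a starting point under stated assumptions, not a model answer. It assumes the file water_potability.csv is in the working directory, that the outcome is a 0/1 column named Potability, and that a numeric column named Solids exists. The themis package for step_upsample(), the choice of step_YeoJohnson() in place of step_log() or step_sqrt(), and the simulated stand-in data used when the file is absent are illustrative choices that the lab text does not prescribe.

```r
library(tidyverse)
library(tidymodels)
library(themis)  # assumption: supplies step_upsample() for class balancing

# Assumption: the CSV has a 0/1 outcome column named Potability.
# If the file is absent, simulate a small stand-in so the sketch still runs.
path <- "water_potability.csv"
if (file.exists(path)) {
  water <- read_csv(path, show_col_types = FALSE)
} else {
  set.seed(1)
  n <- 300
  water <- tibble(
    ph         = rnorm(n, 7, 1.5),
    Solids     = rlnorm(n, 9.9, 0.4),  # right-skewed by construction
    Sulfate    = rnorm(n, 330, 40),
    Potability = rbinom(n, 1, 0.4)
  )
  water$ph[sample(n, 40)] <- NA        # inject missing values
  water$Sulfate[sample(n, 60)] <- NA
}
water <- water %>% mutate(Potability = factor(Potability))

# A) Structure, missing values per column, and class proportions
glimpse(water)
missing_counts <- water %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
class_props <- water %>%
  count(Potability) %>%
  mutate(prop = n / sum(n))
print(missing_counts)
print(class_props)

# B-D) Impute, reduce skewness, standardize, then balance the classes
water_recipe <- recipe(Potability ~ ., data = water) %>%
  step_impute_median(all_numeric_predictors()) %>%
  step_YeoJohnson(all_numeric_predictors()) %>%  # handles zero/negative values
  step_normalize(all_numeric_predictors()) %>%
  step_upsample(Potability, over_ratio = 1)

prepped <- prep(water_recipe)

# new_data = NULL returns the processed training set, upsampling included
balanced <- bake(prepped, new_data = NULL)
balanced_counts <- balanced %>% count(Potability)
print(balanced_counts)

# step_upsample() has skip = TRUE, so baking new data keeps the original rows
transformed <- bake(prepped, new_data = water)

# C) Before/after histogram for one predictor (Solids is an assumed name)
before_after <- bind_rows(
  water %>% transmute(value = Solids, stage = "before"),
  transformed %>% transmute(value = Solids, stage = "after")
)
p <- ggplot(before_after, aes(value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ stage, scales = "free")
print(p)

# D) Summary of all recipe steps applied
print(tidy(prepped))
```

Median imputation is used here because it is less sensitive to skewed predictors than the mean; step_impute_mean() would slot into the same position. Comparing class_props with balanced_counts gives the original versus balanced class distribution asked for in part D.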
2025-08-09