GitHub - Saket046/Red-wine-quality-predictor: A machine learning project to predict red wine quality using the Kaggle dataset. It includes data preprocessing, feature engineering, model training (XGBoost with 95.8% accuracy), and deployment via a Flask app on Render, offering an interactive interface for predictions.

Project Title: Red Wine Quality Prediction and Deployment

Data Collection

Sourced dataset from Kaggle: Red Wine Quality Dataset.
Dataset contains 1,599 rows, 11 input features (e.g., fixed acidity, pH, alcohol) and 1 target variable (quality).

Exploratory Data Analysis (EDA) and Data Preparation

Data Formatting: Verified column data types and reviewed summary statistics with .describe().
Missing Values: Found no missing values, so no imputation or row removal was needed.
Duplicates: Identified and removed 240 duplicate rows, retaining only the first occurrence.
Outlier Handling: Detected and removed outliers using the interquartile range (IQR) rule, implemented with custom functions.
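
The duplicate and outlier steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual custom functions: the helper name `remove_iqr_outliers`, the 1.5 multiplier, and the toy values are all assumptions.

```python
import pandas as pd

def remove_iqr_outliers(df: pd.DataFrame, columns: list[str], k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR] for every listed column."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]

# Made-up rows standing in for the real dataset (hypothetical values).
toy = pd.DataFrame({
    "alcohol":   [9.4, 9.8, 9.8, 10.0, 9.5, 10.2, 25.0],
    "sulphates": [0.56, 0.60, 0.60, 0.58, 0.60, 0.62, 0.61],
})

# Rows 1 and 2 are identical, so only the first of the pair is kept.
deduped = toy.drop_duplicates(keep="first")

# The alcohol value 25.0 falls far above Q3 + 1.5*IQR and is dropped.
cleaned = remove_iqr_outliers(deduped, ["alcohol", "sulphates"])
```

Note that the bounds for every column are computed on the frame as passed in, so the result does not depend on the order of `columns`.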

Feature Engineering

Feature Selection:
- Created a correlation matrix and heatmap to analyze relationships between features.
- Dropped low-impact features ('pH', 'fixed acidity', 'citric acid', 'free sulfur dioxide') based on their correlation with the target variable.
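
As a rough sketch of correlation-based selection: the README does not state the cutoff that was used, so the `threshold` value, the helper name, and the toy data below are hypothetical.

```python
import pandas as pd

def low_correlation_features(df: pd.DataFrame, target: str, threshold: float) -> list[str]:
    """Return feature columns whose absolute Pearson correlation with the target is below threshold."""
    corr_with_target = df.corr()[target].drop(target)
    return corr_with_target[corr_with_target.abs() < threshold].index.tolist()

# Made-up data: alcohol rises with quality, pH barely relates to it.
toy = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5],
    "pH":      [3.4, 3.2, 3.2, 3.4, 3.4, 3.2, 3.2, 3.4],
    "quality": [3, 4, 4, 5, 6, 6, 7, 8],
})

to_drop = low_correlation_features(toy, target="quality", threshold=0.2)
selected = toy.drop(columns=to_drop)
```

A heatmap of `toy.corr()` would show the same numbers visually; the function only automates reading off the target column.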
Feature Scaling: Applied MinMaxScaler to rescale each feature to the [0, 1] range, which keeps any single feature from dominating Euclidean distance-based models such as K-Nearest Neighbors.
Target Encoding: Binarized wine quality scores (Good: ≥7, Bad: <7).
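
A small sketch of the scaling and target-encoding steps, using made-up numbers. The helper `binarize_quality` is hypothetical, and the cutoff and comparison are left as arguments so they can be matched to the rule the notebook applies.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two made-up feature columns (e.g. alcohol, sulphates).
X = np.array([[9.4, 0.56],
              [10.0, 0.68],
              [12.4, 0.80]])

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)  # each column is mapped onto [0, 1]

def binarize_quality(scores, cutoff, inclusive):
    """Map quality scores to 1 (Good) or 0 (Bad) around a cutoff."""
    scores = np.asarray(scores)
    good = scores >= cutoff if inclusive else scores > cutoff
    return good.astype(int)

labels = binarize_quality([5, 6, 7, 8], cutoff=7, inclusive=True)
```

The fitted `scaler` has to be kept and reused with `scaler.transform` on any later input, since refitting on new data would change the mapping.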

Data Balancing

Addressed class imbalance (125 "Good" vs. 860 "Bad") using SMOTE (Synthetic Minority Oversampling Technique) to ensure balanced training data.
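
To illustrate what SMOTE does, here is a simplified NumPy version of its core idea: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is not the library implementation (the notebook presumably used one, such as `imblearn.over_sampling.SMOTE`); the function name, `k`, and the random stand-in data are assumptions.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating toward nearest neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)               # random minority samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]    # one random neighbour each
    gap = rng.random((n_new, 1))                                 # position along the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Random stand-in for the 125 "Good" rows with 7 scaled features.
X_good = np.random.default_rng(1).random((125, 7))
synthetic = smote_like_oversample(X_good, n_new=860 - 125)
X_good_balanced = np.vstack([X_good, synthetic])  # now 860 rows, matching the "Bad" count
```

Because every synthetic row is an interpolation between two real minority rows, it always stays inside the range the minority class already covers.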

Model Development and Evaluation

Train-Test Split: Divided data into 80% training and 20% testing sets.
Model Selection: Compared five classifiers using accuracy scores:
- Random Forest (93.8%)
- XGBoost (94.4%)
- K-Nearest Neighbors (94.7%)
- Decision Tree (90.1%)
- SGD Classifier (84.6%)
- Selected XGBoost as the final model after 10-fold cross-validation, where it reached a mean accuracy of 92.8%.
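
A comparison like the one above could be run along these lines. The sketch uses synthetic data from `make_classification` and default hyperparameters, so its scores will not match the figures listed. XGBoost is left out to keep the example to scikit-learn only; an `xgboost.XGBClassifier` could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the balanced, scaled wine data (7 features).
X, y = make_classification(n_samples=400, n_features=7, n_informative=5, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SGD Classifier": SGDClassifier(random_state=0),
}

# Mean accuracy over 10 stratified folds for each candidate.
cv_means = {
    name: cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    for name, model in models.items()
}
best_name = max(cv_means, key=cv_means.get)
```

Averaging over folds gives a steadier ranking than a single train-test split, which is one reason a model can be chosen even when another scored slightly higher on one split.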
Hyperparameter Tuning: Tuned learning_rate, max_depth, min_child_weight, and gamma using RandomizedSearchCV and GridSearchCV.
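
One common way to combine the two search tools is a coarse random search followed by a narrow grid around its best result; whether the notebook did it in this order is an assumption. The sketch swaps in scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, so only `learning_rate` and `max_depth` are tuned (`min_child_weight` and `gamma` are XGBoost-specific). The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=7, n_informative=5, random_state=0)

# Stage 1: sample a handful of combinations from a wide range.
coarse = RandomizedSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_distributions={
        "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
        "max_depth": [2, 3, 4, 5, 6],
    },
    n_iter=6, cv=3, scoring="accuracy", random_state=0,
)
coarse.fit(X, y)
best_lr = coarse.best_params_["learning_rate"]
best_depth = coarse.best_params_["max_depth"]

# Stage 2: exhaustively search a small grid centred on the stage 1 winner.
fine = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_grid={
        "learning_rate": [best_lr / 2, best_lr, best_lr * 1.5],
        "max_depth": [max(1, best_depth - 1), best_depth, best_depth + 1],
    },
    cv=3, scoring="accuracy",
)
fine.fit(X, y)
tuned_model = fine.best_estimator_
```

Since the fine grid contains the coarse winner and uses the same folds, its best score cannot be lower than the coarse one.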
Performance Metrics:
- Precision: 0.91
- Recall: 0.95
- F1 Score: 0.93
- Test Accuracy: 94.5%
- Confusion Matrix:
```
[619  66]  
[ 32 659]
```
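
The precision, recall, and F1 values can be recomputed from the confusion matrix as a quick check. The sketch assumes the scikit-learn layout (rows are true classes, columns are predicted classes) and that "Good" is the positive class in the second row and column. Under that reading the matrix covers 1,376 samples and implies an accuracy of about 92.9%, so it may come from a different evaluation run than the 94.5% test figure.

```python
import numpy as np

cm = np.array([[619, 66],
               [32, 659]])

# ravel() flattens row by row: true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)   # share of predicted "Good" that really are Good
recall = tp / (tp + fn)      # share of actual "Good" that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / cm.sum()

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.3f}")
```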

Model Deployment

Model Pickling: Saved the trained model as a pickle file and verified functionality with dummy scaled inputs.
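
A pickle round trip with a dummy-input check might look like this. A small `DecisionTreeClassifier` trained on made-up points stands in for the tuned XGBoost model, and the file is written to a temporary directory rather than the repository.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Stand-in model trained on four made-up, already scaled rows.
X = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]])
y = np.array([0, 1, 0, 1])
model = DecisionTreeClassifier(random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "wine_quality_model.pkl"
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# Dummy scaled inputs: the restored model should agree with the original.
dummy = np.array([[0.15, 0.15], [0.85, 0.85]])
same_predictions = bool((restored.predict(dummy) == model.predict(dummy)).all())
```

Pickle files should only be loaded from trusted sources, and the library versions used for loading should match those used for saving.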
Flask Web Application:
- Developed a user-friendly webpage with a form for inputting wine parameters.
- Validated inputs using Python's try and except for error handling.
- Scaled user inputs and fed them into the model to predict wine quality (Good/Bad).
- Displayed prediction results with a help table for parameter guidance and contact information.
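
The try/except validation step could be isolated in a helper like the one below. The function name and the form field names are hypothetical; the seven features are the dataset's eleven inputs minus the four dropped during feature selection. Flask itself is left out so the sketch runs with the standard library alone.

```python
# Features remaining after dropping pH, fixed acidity, citric acid, and free sulfur dioxide.
FEATURES = [
    "volatile acidity", "residual sugar", "chlorides",
    "total sulfur dioxide", "density", "sulphates", "alcohol",
]

def parse_wine_form(form):
    """Return (values, error): floats in FEATURES order, or None plus a message."""
    try:
        values = [float(form[name]) for name in FEATURES]
    except KeyError as exc:
        return None, f"Missing field: {exc.args[0]}"
    except (TypeError, ValueError):
        return None, "All fields must be numeric."
    return values, None

good_form = {name: "0.5" for name in FEATURES}
values, error = parse_wine_form(good_form)

bad_form = dict(good_form, alcohol="strong")
bad_values, bad_error = parse_wine_form(bad_form)
```

In a Flask view, `request.form` could be passed to this helper; on success the values would go through the fitted scaler's `transform` and then the model's `predict`, and on failure the error message would be rendered back to the user.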
Deployment on Render:
- Created requirements.txt and Procfile for dependency management and app configuration.
- Uploaded codebase to GitHub and deployed the app on Render, enabling public accessibility.

Impact

This project automates red wine quality prediction end to end: a trained classifier labels a wine as Good or Bad from its physicochemical measurements, and a Flask web interface makes the model usable without writing code.
