🍷 Wine Quality Prediction

Predicting wine quality from physicochemical features using a clean ML pipeline: preprocessing, feature selection, and model comparison.

Role

ML pipeline design and implementation (solo project)

Timeline

2024 · Personal / coursework-style project

Tech

Python, scikit-learn, Pandas, Matplotlib, Hugging Face Spaces

Wine quality prediction plots and interface

TL;DR

Problem & Context

Wine quality is traditionally assessed by human experts, but many production and quality control decisions depend on measurable physicochemical properties: acidity, sugar, sulfur dioxide, alcohol content, and more. The question is: how far can we get by predicting human-rated quality scores from those numeric features alone?

This project uses a public wine quality dataset to build a supervised learning model that predicts a discrete quality score. The focus is not just on reaching a single metric, but on building a clean and reusable ML pipeline: from data exploration and preprocessing to feature selection, model training, and simple deployment.

Data & Inputs

Exploratory data analysis (EDA) was used to understand feature distributions, detect potential outliers, and inspect correlations between physicochemical properties and quality labels.
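As a rough illustration of that EDA step, the sketch below ranks features by their correlation with the quality label and counts IQR-based outliers per feature. The column names, the synthetic data, and both helper functions are assumptions made for the example, not the project's actual code or dataset.

```python
import numpy as np
import pandas as pd


def quality_correlations(df: pd.DataFrame, target: str = "quality") -> pd.Series:
    """Pearson correlation of each numeric feature with the target, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


def iqr_outlier_counts(df: pd.DataFrame, target: str = "quality") -> pd.Series:
    """Number of values per feature lying outside 1.5 * IQR of the quartiles."""
    features = df.drop(columns=[target])
    q1 = features.quantile(0.25)
    q3 = features.quantile(0.75)
    iqr = q3 - q1
    outside = (features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)
    return outside.sum()


# Synthetic stand-in for the wine table (hypothetical columns and effect sizes).
rng = np.random.default_rng(0)
n = 200
alcohol = rng.normal(10.5, 1.0, n)
volatile_acidity = rng.normal(0.5, 0.15, n)
raw_score = 5.5 + 0.8 * (alcohol - 10.5) - 3.0 * (volatile_acidity - 0.5)
quality = np.clip(np.round(raw_score + rng.normal(0, 0.5, n)), 3, 8).astype(int)

demo = pd.DataFrame(
    {"alcohol": alcohol, "volatile_acidity": volatile_acidity, "quality": quality}
)

correlations = quality_correlations(demo)
outliers = iqr_outlier_counts(demo)
print(correlations)
print(outliers)
```

On real data, the same two helpers would be applied to the loaded dataset instead of `demo`.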

Approach & Pipeline

The project is structured as a clean ML pipeline rather than a single “fit” call. The main stages are data exploration, preprocessing, feature selection, model training and comparison, and simple deployment.
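A minimal sketch of how preprocessing, feature selection, and model comparison might be chained in scikit-learn is shown here. The specific choices (`StandardScaler`, `SelectKBest` with `f_classif`, the two candidate models, `k=8`, and the synthetic data from `make_classification`) are assumptions for illustration; the write-up does not state which steps or models were actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(model, k: int = 8) -> Pipeline:
    """Scale features, keep the k highest-scoring ones, then fit the model."""
    return Pipeline(
        [
            ("scale", StandardScaler()),
            ("select", SelectKBest(score_func=f_classif, k=k)),
            ("model", model),
        ]
    )


# Synthetic stand-in: 11 numeric features, 3 quality classes (assumed shape).
X, y = make_classification(
    n_samples=300,
    n_features=11,
    n_informative=6,
    n_redundant=2,
    n_classes=3,
    random_state=0,
)

# Hypothetical candidate models for the comparison step.
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Cross-validate each full pipeline so scaling and selection are refit per fold.
scores = {
    name: cross_val_score(
        build_pipeline(model), X, y, cv=5, scoring="accuracy"
    ).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Wrapping every step in one `Pipeline` keeps the scaler and selector from seeing validation folds, which is the main practical benefit over calling each step by hand.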

Results & Evaluation

The chosen model achieves solid performance on the held-out test split, capturing the main patterns between physicochemical properties and perceived quality. Some noise remains because ratings are subjective and feature distributions overlap, but the model still separates clearly low-quality wines from clearly high-quality ones.
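For reference, a held-out evaluation of this kind could look roughly like the sketch below. The model, split ratio, and metrics are assumptions, and the data is synthetic, so the printed numbers say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine features and quality classes.
X, y = make_classification(
    n_samples=300,
    n_features=11,
    n_informative=6,
    n_redundant=2,
    n_classes=3,
    random_state=0,
)

# Stratify so each quality class appears in both splits in similar proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Macro F1 weights every class equally, which matters when classes are imbalanced.
macro_f1 = f1_score(y_test, y_pred, average="macro")
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
print(cm)
```

The confusion matrix is the most direct way to check the claim about low versus high quality: confusions should concentrate between adjacent classes rather than between the extremes.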

Beyond the exact numbers, the project demonstrates a complete workflow for building a robust classifier on a real dataset with noisy labels.

Implementation

The deployment decouples the front-end UI from the underlying scikit-learn pipeline, making it easy to swap or retrain models without changing the interface.
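One way this decoupling could be arranged is sketched below: the pipeline is saved to disk with `joblib`, and the UI layer only ever calls a plain `predict_quality` function built from the saved file. The feature names, file name, toy training data, and both helper functions are hypothetical; the actual Space may be organised differently.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature order shared by training code and the UI layer.
FEATURES = ["alcohol", "volatile_acidity", "sulphates"]


def train_and_save(path: Path) -> None:
    """Fit a toy pipeline on synthetic data and persist it to disk."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, len(FEATURES)))
    y = (X[:, 0] - X[:, 1] > 0).astype(int)  # toy binary "quality" label
    pipe = Pipeline([("scale", StandardScaler()), ("model", LogisticRegression())])
    pipe.fit(X, y)
    joblib.dump(pipe, path)


def load_predictor(path: Path):
    """Return a function the UI can call without knowing which model is inside."""
    pipe = joblib.load(path)

    def predict_quality(inputs: dict) -> int:
        row = np.array([[float(inputs[name]) for name in FEATURES]])
        return int(pipe.predict(row)[0])

    return predict_quality


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "wine_pipeline.joblib"
    train_and_save(model_path)
    predict_quality = load_predictor(model_path)

example = predict_quality(
    {"alcohol": 1.5, "volatile_acidity": -1.0, "sulphates": 0.0}
)
print(example)
```

With this layout, retraining means replacing the saved file; the interface code keeps calling `predict_quality` with the same dictionary of inputs.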

Challenges & Lessons Learned

This project reinforced the value of building end-to-end pipelines and thinking about how a model will be used, not just how it scores on a benchmark metric.

Links

Live Demo on Hugging Face Spaces · GitHub Repository · Back to all projects