recipes_analysis

Introduction

This project examines a recipe dataset containing nutritional and rating information. The central question is: What is the relationship between calories and average ratings?

Data Cleaning

Here’s a snapshot of the cleaned DataFrame:

name                              id      minutes  contributor_id  submitted
brownies in the world best ever   333281  40       985201          2008-10-27
in canada chocolate chip cookies  453467  45       1848091         2011-04-11
412 broccoli casserole            306168  40       50969           2008-05-30

The relevant cleaned data:

calories minutes average_rating
138.4 40 4
595.1 45 5
194.8 40 5
878.3 120 5
267 90 5

Pivot table of mean average rating and recipe count by calorie bin:

calorie_bin   mean     count
(0, 500]      4.62621  61076
(500, 1000]   4.62457  15458
(1000, 1500]  4.61285  2416
(1500, 2000]  4.61402  826
(2000, 2500]  4.64426  434
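
A table of this shape can be produced by binning calories and aggregating the ratings within each bin. The sketch below is one plausible way to do it, not necessarily the code used in this project. It assumes 500-calorie right-closed bins (inferred from the bin labels above) and runs on the five sample rows from the cleaned data rather than the full dataset; the names `recipes`, `bin_edges`, and `rating_by_bin` are illustrative.

```python
import pandas as pd

# Five sample rows taken from the cleaned data shown earlier,
# used here as a stand-in for the full dataset.
recipes = pd.DataFrame({
    "calories": [138.4, 595.1, 194.8, 878.3, 267.0],
    "average_rating": [4.0, 5.0, 5.0, 5.0, 5.0],
})

# Assumed bin edges: right-closed 500-calorie bins,
# i.e. (0, 500], (500, 1000], ..., (2000, 2500].
bin_edges = list(range(0, 2501, 500))
recipes["calorie_bin"] = pd.cut(recipes["calories"], bins=bin_edges)

# Mean rating and number of recipes per bin; observed=True drops empty bins.
rating_by_bin = (
    recipes.groupby("calorie_bin", observed=True)["average_rating"]
    .agg(["mean", "count"])
)
print(rating_by_bin)
```

On the full dataset, recipes above 2,500 calories would fall outside these edges and receive a missing bin, so they would be excluded from the table unless the edges are extended.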

Framing a Prediction Problem

Baseline Model

Final Model

Feature Engineering

Two engineered features, calories_per_minute and log_calories, were chosen to better represent the relationships within the dataset: the first relates a recipe's calorie content to its preparation time, and the second puts calories on a logarithmic scale.
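
The exact definitions of these features are not given here, so the following is a minimal sketch under assumptions: calories_per_minute is taken to be calories divided by minutes, and log_calories is taken to be log(1 + calories). The zero-minute handling is likewise an illustrative choice, and the sample rows come from the cleaned data shown earlier.

```python
import numpy as np
import pandas as pd

# Five sample rows from the cleaned data, used as a stand-in for the full dataset.
recipes = pd.DataFrame({
    "calories": [138.4, 595.1, 194.8, 878.3, 267.0],
    "minutes": [40, 45, 40, 120, 90],
})

# Assumed definition: calories divided by preparation time in minutes.
# Any zero-minute recipes become NaN instead of producing infinity.
minutes = recipes["minutes"].replace(0, np.nan)
recipes["calories_per_minute"] = recipes["calories"] / minutes

# Assumed definition: log(1 + calories), which stays defined at 0 calories.
recipes["log_calories"] = np.log1p(recipes["calories"])

print(recipes)
```

The log transform compresses the long right tail of the calorie distribution, which is visible in the pivot table counts: most recipes fall in the lowest calorie bin.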

Model Selection

GridSearchCV was used to tune hyperparameters: it performs an exhaustive, cross-validated search over a parameter grid.
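
As an illustration of how such a search works, here is a minimal sketch. The estimator (`DecisionTreeRegressor`), the parameter grid, the scoring metric, and the synthetic data are all assumptions made for the example; the model and grid actually used in this project are not stated here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (not the recipe dataset): two features, one target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 2))
y = 4.0 + 0.5 * X[:, 0] + rng.normal(0, 0.1, size=60)

# Hypothetical grid: every combination (2 x 2 = 4) is evaluated.
param_grid = {"max_depth": [2, 4], "min_samples_leaf": [1, 5]}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=3,                               # 3-fold cross-validation per combination
    scoring="neg_mean_squared_error",   # higher (closer to 0) is better
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `search.best_estimator_` holds the model refit on all of the data with the best-scoring combination.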

Performance Comparison

Improvement