Python Machine Learning Scikit-Learn MLflow Pandas Data Science

London Weather Predictor

A machine learning project predicting London's daily temperature using weather data, featuring a full data science workflow from EDA to model tracking with MLflow.


🌦️ London Weather Predictor

This project predicts London’s daily mean temperature using machine learning. It follows an end-to-end data science workflow: exploratory data analysis (EDA), data cleaning, model training, and experiment tracking.

🚀 Key Features and Workflow

  • Exploratory Data Analysis (EDA): Explored the data in Jupyter Notebooks to visualize temperature trends over time, identify correlations between weather variables (such as sunshine, cloud cover, and precipitation), and check data integrity, including how missing values are handled.
  • Data Preprocessing: Built a dedicated Python cleaning script (src/data/clean.py) to process raw meteorological data into a finalized, clean dataset ready for model consumption.
  • Machine Learning Models: Trained and evaluated several regression models to predict the mean temperature:
    • Linear Regression
    • Decision Tree Regressor
    • Random Forest Regressor
  • Experiment Tracking: Integrated MLflow to systematically log model parameters, metrics (RMSE, MAE, R²), and artifacts, enabling easy model comparison and versioning.
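The training and tracking steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the synthetic data, the feature choice, the helper names (`make_synthetic_weather`, `evaluate_models`, `log_results_to_mlflow`), the chronological 80/20 split, and the experiment name are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def make_synthetic_weather(n_days=400, seed=0):
    """Stand-in data (assumption): NOT the project's dataset, just made-up features."""
    rng = np.random.default_rng(seed)
    sunshine = rng.uniform(0, 12, n_days)          # hours of sunshine
    cloud_cover = rng.uniform(0, 8, n_days)        # oktas
    precipitation = rng.exponential(2.0, n_days)   # millimetres
    X = np.column_stack([sunshine, cloud_cover, precipitation])
    # Hypothetical relationship between the features and mean temperature.
    y = 5 + 1.1 * sunshine - 0.4 * cloud_cover + rng.normal(0, 1.5, n_days)
    return X, y


def evaluate_models(X, y, seed=42):
    """Fit the three regressors and return RMSE, MAE and R² on a held-out set."""
    # shuffle=False keeps the rows in order, so the test set is the most
    # recent 20% of days (an assumed choice for time-ordered data).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False
    )
    models = {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
            "mae": float(mean_absolute_error(y_test, pred)),
            "r2": float(r2_score(y_test, pred)),
        }
    return results


def log_results_to_mlflow(results, experiment_name="london-weather-sketch"):
    """Log one MLflow run per model; the experiment name is hypothetical."""
    import mlflow  # imported lazily so evaluation works without MLflow installed

    mlflow.set_experiment(experiment_name)
    for name, metrics in results.items():
        with mlflow.start_run(run_name=name):
            mlflow.log_param("model_type", name)
            mlflow.log_metrics(metrics)
```

Calling `results = evaluate_models(*make_synthetic_weather())` followed by `log_results_to_mlflow(results)` would create one run per model, which can then be compared side by side in the MLflow UI.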

🔧 Tech Stack

  • Python: Core programming language.
  • Pandas & NumPy: For powerful data manipulation and numerical operations.
  • Scikit-Learn: For building and evaluating machine learning models.
  • Matplotlib & Seaborn: For creating insightful data visualizations (correlation heatmaps, trend charts).
  • MLflow: For robust experiment tracking and model lifecycle management.

📊 Results & Visualization

Each model was evaluated on how accurately it forecasts the mean temperature, using RMSE, MAE, and R². The performance of the Random Forest Regressor and the other models was tracked in MLflow for comparison, and visualizations such as correlation heatmaps show how the weather variables relate to temperature.
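As a rough illustration of the numbers behind a correlation heatmap, the sketch below computes a Pearson correlation matrix with pandas. The data is synthetic and the column names (`sunshine`, `cloud_cover`, `precipitation`, `mean_temp`) are assumptions for the example, not necessarily the project's actual schema.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): one year of made-up daily readings.
rng = np.random.default_rng(0)
n_days = 365
sunshine = rng.uniform(0, 12, n_days)                         # hours
cloud_cover = 8 - 0.5 * sunshine + rng.normal(0, 1, n_days)   # oktas, lower on sunny days
precipitation = rng.exponential(2.0, n_days)                  # millimetres
mean_temp = 5 + 1.1 * sunshine + rng.normal(0, 1.5, n_days)   # degrees Celsius

df = pd.DataFrame(
    {
        "sunshine": sunshine,
        "cloud_cover": cloud_cover,
        "precipitation": precipitation,
        "mean_temp": mean_temp,
    }
)

# Pairwise Pearson correlations between all columns.
corr = df.corr()

# Correlation of each feature with the target, strongest positive first.
temp_corr = corr["mean_temp"].drop("mean_temp").sort_values(ascending=False)
print(temp_corr)
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` would render the matrix as an annotated heatmap.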

Note: This project is part of a broader study into applying machine learning to meteorological time-series data, laying the foundation for further hyperparameter tuning and potential deployment as an API.