Python PyTorch CNN Computer Vision Transfer Learning

Landmark Classification & Tagging

A computer vision application that identifies global landmarks from user-uploaded images using custom CNN architectures and transfer learning with ResNet18.

[Screenshot: Landmark Classification & Tagging for Social Media]

📌 Project Overview

In the age of social media, automatically identifying the location of a photo is a valuable feature for tagging and organization. This project focuses on building a robust landmark classifier capable of identifying global landmarks from raw image data.

The project was implemented in two main phases:

  1. CNN from Scratch: Designing and training a custom Convolutional Neural Network architecture to achieve a baseline accuracy of over 50% on 50 landmark classes.
  2. Transfer Learning: Leveraging a pre-trained ResNet18 model to significantly boost classification performance through fine-tuning.

Technologies Used: Python, PyTorch, TorchScript, Jupyter Notebooks, Matplotlib, Seaborn.


🚀 Key Features & Implementation

1. Custom CNN Architecture

Developed a multi-block CNN backbone inspired by VGG architectures:

  • Convolutional Blocks: Multiple layers with $3 \times 3$ filters and padding to maintain spatial dimensions.
  • Batch Normalization: Integrated to accelerate training and improve stability.
  • Adaptive Average Pooling: Used to produce a fixed-size embedding vector regardless of input image dimensions.
  • MLP Head: A fully connected head with Dropout to prevent overfitting.
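The blocks above can be sketched in PyTorch as follows. This is a minimal illustration of the design, not the project's exact architecture: the layer widths, block count, and class names (`conv_block`, `LandmarkCNN`) are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """VGG-style block: 3x3 conv (padding=1 preserves spatial size), BN, ReLU, 2x2 max-pool."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class LandmarkCNN(nn.Module):
    def __init__(self, num_classes: int = 50, dropout: float = 0.5):
        super().__init__()
        self.backbone = nn.Sequential(
            conv_block(3, 32),
            conv_block(32, 64),
            conv_block(64, 128),
        )
        # Adaptive pooling produces a fixed-size embedding regardless of input resolution.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.pool(self.backbone(x)))
```

Because of the adaptive pooling, the same network accepts 224x224 and 160x160 inputs and still emits a 50-way logit vector.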

2. High-Performance Transfer Learning

Replaced the custom backbone with a pre-trained ResNet18 model:

  • Feature Extraction: Froze the early layers of ResNet to preserve generalized visual features learned from ImageNet.
  • Classifier Refinement: Redesigned the final linear layer to map to the specific 50 landmark categories.
  • Optimization: Achieved a test accuracy of ~73%, a significant improvement over the from-scratch model.

3. Advanced Preprocessing Pipeline

Implemented a robust data augmentation strategy to improve model generalization:

  • Training Augmentation: Utilized RandomResizedCrop, RandomHorizontalFlip, ColorJitter, and RandAugment.
  • Normalization: Normalized images using the mean and standard deviation of the entire dataset to aid gradient descent.

4. Production Readiness with TorchScript

  • Model Export: Exported the trained model using TorchScript for efficient deployment without requiring a Python interpreter.
  • Predictor Wrapper: Engineered a Predictor class that encapsulates the preprocessing, inference, and post-processing (Softmax) logic into a single serialized artifact.
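A minimal sketch of such a wrapper is shown below. The `Predictor` name comes from the project, but its internals here are assumptions: normalization is folded in as buffers, a tiny stand-in network replaces the trained model, and the output path/filename is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Predictor(nn.Module):
    """Bundles preprocessing, inference, and softmax into one serializable module."""

    def __init__(self, model: nn.Module, mean: list, std: list):
        super().__init__()
        self.model = model.eval()
        # Buffers travel with the serialized artifact but are not trained.
        self.register_buffer("mean", torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch of RGB images scaled to [0, 1]
        x = (x - self.mean) / self.std
        logits = self.model(x)
        return F.softmax(logits, dim=1)

# Stand-in for the trained network, to keep the sketch self-contained.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 50))

# torch.jit.script compiles the module; the saved .pt file can then be
# loaded for inference without the original Python source.
scripted = torch.jit.script(Predictor(model, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]))
scripted.save("predictor.pt")
```

Loading the artifact back with `torch.jit.load("predictor.pt")` yields a callable that maps raw image tensors directly to class probabilities.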

📊 Evaluation

The model’s performance was analyzed using:

  • Confusion Matrix: Visualizing per-class predictions to spot which similar-looking landmarks the model confuses with one another.
  • Top-5 Accuracy: Measuring the model’s ability to include the correct landmark within its top 5 predictions.
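These two metrics can be sketched in a few lines of PyTorch; the helper names below are illustrative, not the project's API.

```python
import torch

def top_k_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 5) -> float:
    """Fraction of samples whose true class is among the k highest-scoring predictions."""
    topk = logits.topk(k, dim=1).indices               # shape (N, k)
    hits = (topk == targets.unsqueeze(1)).any(dim=1)   # shape (N,)
    return hits.float().mean().item()

def confusion_matrix(preds: torch.Tensor, targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for t, p in zip(targets.tolist(), preds.tolist()):
        cm[t, p] += 1
    return cm
```

Off-diagonal mass in the confusion matrix points directly at the landmark pairs the model mixes up.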

💡 What I Learned

  • Architecture Design Trade-offs: Gained experience in balancing model depth and complexity with training time and resource constraints.
  • The Power of Transfer Learning: Empirically verified how pre-trained models can drastically reduce the amount of data and compute needed for specialized tasks.
  • Deployment Best Practices: Learned to use TorchScript to bridge the gap between experimentation in Jupyter and production-ready code.