Deep Networks for Image Classification
Summary
Implementation and comparison of influential deep convolutional neural networks for image classification. This project explores both the original architectures (VGG16, ResNet18, Inception v1) and improved variants on the MNIST and CIFAR-10 datasets.
The comparative analysis demonstrates that the "improved" versions slightly outperform their vanilla counterparts, validating the effectiveness of modern architectural enhancements such as batch normalization, optimized pooling strategies, and better weight initialization.
A six-page research paper accompanies this project, detailing the architectural principles, training procedures, and experimental results across both datasets.
Implementation Details
Important Considerations
- All images resized to 64×64 due to time and hardware constraints
- Network input layers adapted for grayscale images (MNIST) where applicable
- Enhanced training process with label smoothing, weight decay, and multi-step learning rate scheduling
- Dataset-specific data augmentation techniques applied
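The enhanced training process above can be sketched in PyTorch as follows. This is a minimal illustration of label smoothing, weight decay, and multi-step learning rate scheduling; the placeholder model, milestone epochs, and all hyperparameter values are assumptions for demonstration, not the project's exact settings.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder model standing in for one of the project's networks
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 10))

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)          # label smoothing
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)  # weight decay
# Multi-step scheduling: LR drops by 10x at the (assumed) milestone epochs
scheduler = MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

# One illustrative training step on random grayscale-sized data
x = torch.randn(8, 64 * 64)
y = torch.randint(0, 10, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
scheduler.step()
```

In a full training loop, `scheduler.step()` would be called once per epoch rather than per batch.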
VGG16 Architecture Improvements
- Added batch normalization layers between each convolutional layer and its activation
- Replaced Adaptive Average Pooling with 5×5 Adaptive Max Pooling
- Implemented proper weight initialization from appropriate distributions
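A minimal sketch of these VGG16 modifications: batch normalization inserted between each convolution and its ReLU, a 5×5 adaptive max pool in place of adaptive average pooling, and Kaiming initialization for the convolutions. The channel sizes and the single-layer classifier are illustrative, not the full VGG16 configuration.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),   # inserted between conv and activation
        nn.ReLU(inplace=True),
    )

features = nn.Sequential(
    conv_bn_relu(3, 64),
    nn.MaxPool2d(2),
    conv_bn_relu(64, 128),
    nn.MaxPool2d(2),
)
pool = nn.AdaptiveMaxPool2d((5, 5))      # replaces adaptive average pooling
classifier = nn.Linear(128 * 5 * 5, 10)  # 10 classes (MNIST / CIFAR-10)
model = nn.Sequential(features, pool, nn.Flatten(), classifier)

# Kaiming initialization, appropriate for ReLU networks
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

out = model(torch.randn(2, 3, 64, 64))   # 64x64 inputs, as described above
```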
ResNet18 Architecture Improvements
- Replaced the original 7×7 convolution with a stack of 3×3 convolutions
- Added batch normalization within the replacement convolutional stack, between each convolution and its activation
- Introduced a dropout layer with a small rate before the linear classifier
- Implemented proper weight initialization from appropriate distributions
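The first two ResNet18 changes can be sketched as below: the original 7×7 stride-2 stem replaced by stacked 3×3 convolutions with batch normalization before each activation, and a small-rate dropout layer ahead of the linear classifier. Channel counts and the 0.1 dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Replacement stem: two 3x3 convolutions instead of one 7x7, keeping the
# original stride-2 downsampling, with BN between each conv and its ReLU
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

# Classifier head with a small-rate dropout before the linear layer
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Dropout(p=0.1),
    nn.Linear(64, 10),
)

x = torch.randn(2, 3, 64, 64)
features = stem(x)       # (2, 64, 32, 32): halved spatially, like the 7x7 stem
logits = head(features)
```

In the actual network, the residual blocks sit between `stem` and `head`; they are omitted here for brevity.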
Inception v1 Architecture Improvements
- Replaced the original 7×7 convolution with a stack of 3×3 convolutions, inspired by VGG findings
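To show where the replacement stem sits in the Inception v1 topology, here is a hedged sketch: stacked 3×3 convolutions in place of the 7×7 stem, feeding one illustrative Inception block with its four parallel branches. The branch widths are assumptions, not GoogLeNet's published values.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """One Inception block: four parallel branches concatenated on channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                nn.Conv2d(16, 24, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),
                                nn.Conv2d(8, 8, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 8, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x),
                          self.b5(x), self.bp(x)], dim=1)

# VGG-style replacement stem: stacked 3x3 convolutions instead of one 7x7
stem = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
)

x = torch.randn(2, 3, 64, 64)
out = InceptionBlock(64)(stem(x))   # 16 + 24 + 8 + 8 = 56 output channels
```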