Deep Networks for Image Classification
Summary
Implementation and comparison of influential deep convolutional neural networks for image classification. This project explores both the original architectures (VGG16, ResNet18, Inception v1) and improved variants on the MNIST and CIFAR-10 datasets.
The comparative analysis demonstrates that the "improved" versions slightly outperform their vanilla counterparts, validating the effectiveness of modern architectural enhancements such as batch normalization, optimized pooling strategies, and better weight initialization.
A six-page research paper accompanies this project, detailing the architectural principles, training procedures, and experimental results across both datasets.
Implementation Details
Important Considerations
- All images resized to 64×64 due to time and hardware constraints
- Network input layers adapted for grayscale images (MNIST) where applicable
- Enhanced training process with label smoothing, weight decay, and multi-step learning rate scheduling
- Dataset-specific data augmentation techniques applied
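The enhanced training process above can be sketched in PyTorch as follows. This is a minimal illustration of label smoothing, weight decay, and multi-step learning rate scheduling; the placeholder model, milestone epochs, and all hyperparameter values are assumptions for demonstration, not the project's exact settings.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder model standing in for one of the project's networks
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 10))

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)          # label smoothing
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)  # weight decay
# Multi-step scheduling: LR drops by 10x at the (assumed) milestone epochs
scheduler = MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

# One illustrative training step on random grayscale-sized data
x = torch.randn(8, 64 * 64)
y = torch.randint(0, 10, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
scheduler.step()
```

In a full training loop, `scheduler.step()` would be called once per epoch rather than per batch.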
VGG16 Architecture Improvements
- Added batch normalization layers between each convolutional layer and its activation
- Replaced Adaptive Average Pooling with 5×5 Adaptive Max Pooling
- Implemented proper weight initialization from appropriate distributions
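A minimal sketch of these VGG16 modifications: batch normalization inserted between each convolution and its ReLU, a 5×5 adaptive max pool in place of adaptive average pooling, and Kaiming initialization for the convolutions. The channel sizes and the single-layer classifier are illustrative, not the full VGG16 configuration.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),   # inserted between conv and activation
        nn.ReLU(inplace=True),
    )

features = nn.Sequential(
    conv_bn_relu(3, 64),
    nn.MaxPool2d(2),
    conv_bn_relu(64, 128),
    nn.MaxPool2d(2),
)
pool = nn.AdaptiveMaxPool2d((5, 5))      # replaces adaptive average pooling
classifier = nn.Linear(128 * 5 * 5, 10)  # 10 classes (MNIST / CIFAR-10)
model = nn.Sequential(features, pool, nn.Flatten(), classifier)

# Kaiming initialization, appropriate for ReLU networks
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

out = model(torch.randn(2, 3, 64, 64))   # 64x64 inputs, as described above
```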
ResNet18 Architecture Improvements
- Replaced the original 7×7 convolution with a stack of 3×3 convolutions
- Added batch normalization within the replacement convolutional stack, between each convolution and its activation
- Introduced a dropout layer with a small rate before the linear classifier
- Implemented proper weight initialization from appropriate distributions
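The first two ResNet18 changes can be sketched as below: the original 7×7 stride-2 stem replaced by stacked 3×3 convolutions with batch normalization before each activation, and a small-rate dropout layer ahead of the linear classifier. Channel counts and the 0.1 dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Replacement stem: two 3x3 convolutions instead of one 7x7, keeping the
# original stride-2 downsampling, with BN between each conv and its ReLU
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

# Classifier head with a small-rate dropout before the linear layer
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Dropout(p=0.1),
    nn.Linear(64, 10),
)

x = torch.randn(2, 3, 64, 64)
features = stem(x)       # (2, 64, 32, 32): halved spatially, like the 7x7 stem
logits = head(features)
```

In the actual network, the residual blocks sit between `stem` and `head`; they are omitted here for brevity.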
Inception v1 Architecture Improvements
- Replaced the original 7×7 convolution with a stack of 3×3 convolutions, inspired by VGG findings
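To show where the replacement stem sits in the Inception v1 topology, here is a hedged sketch: stacked 3×3 convolutions in place of the 7×7 stem, feeding one illustrative Inception block with its four parallel branches. The branch widths are assumptions, not GoogLeNet's published values.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """One Inception block: four parallel branches concatenated on channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                nn.Conv2d(16, 24, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),
                                nn.Conv2d(8, 8, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 8, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x),
                          self.b5(x), self.bp(x)], dim=1)

# VGG-style replacement stem: stacked 3x3 convolutions instead of one 7x7
stem = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
)

x = torch.randn(2, 3, 64, 64)
out = InceptionBlock(64)(stem(x))   # 16 + 24 + 8 + 8 = 56 output channels
```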