Hello everyone! Welcome to my blog. In my previous two blogs, I listed the resources and road maps for Data Science and Deep Learning.
In this blog, I am going to discuss the road map to Computer Vision 2021 – Image Classification. It covers basic to advanced algorithms used in image classification tasks, the model development life cycle (training, testing, deployment) and a few other tools and frameworks.
I have written this blog to be beginner friendly, with plenty of illustrated resources, the mathematics behind the algorithms, and hands-on tutorials. Feel free to post your comments and queries.
What is Computer Vision?
It is a field of machine learning that focuses on enabling machines to replicate the functionality of the human eye. Computer vision covers applications such as image classification, localisation, segmentation and generation. These tasks are commonly solved with neural networks whose architectures are designed to learn the features and patterns in images.
Let’s discuss the road map in four parts:
- Non-neural-network (classical machine learning) approaches to computer vision tasks
- Deep learning based computer vision: the evolution of Convolutional Neural Networks (CNNs)
- ImageNet Large Scale Visual Recognition Challenge (ILSVRC) architectures
- Tools and frameworks
Image Classification Using Non-Neural-Network Machine Learning Algorithms:
Many people start learning computer vision straight away with deep learning, often without even an introduction to the multi-layer perceptron. I would instead suggest starting with basic machine learning algorithms such as K-Nearest Neighbors, Support Vector Machines, Random Forests and XGBoost.
Done this way, it doubles as a revision session: you relearn the basic machine learning algorithms and apply them to an image classification task.
- K-Nearest Neighbors:
- https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761
- https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
- https://www.pyimagesearch.com/2016/08/08/k-nn-classifier-for-image-classification/
- https://yearsofnolight.medium.com/intro-to-image-classification-with-knn-987bc112f0c2
- https://medium.com/swlh/image-classification-with-k-nearest-neighbours-51b3a289280
- Support Vector Machine: SVMs were among the most widely used ML algorithms for image classification before CNNs.
- https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
- https://www.kdnuggets.com/2018/12/solve-image-classification-problem-quickly-easily.html/2
- https://towardsdatascience.com/svm-support-vector-machine-for-classification-710a009f6873
- https://www.kaggle.com/halien/simple-image-classifer-with-svm
- https://www.kaggle.com/ashutoshvarma/image-classification-using-svm-92-accuracy
- Random Forest and Decision Tree:
- https://www.robots.ox.ac.uk/~vgg/publications/papers/bosch07a.pdf
- https://towardsdatascience.com/understanding-random-forest-58381e0602d2
- https://towardsdatascience.com/a-beginners-guide-to-decision-tree-classification-6d3209353ea
- https://www.linkedin.com/pulse/decision-tree-satellite-image-classification-jo%C3%A3o-otavio/
- https://github.com/87surendra/Random-Forest-Image-Classification-using-Python
- https://github.com/PraveenDubba/Image-Classification-using-Random-Forest/blob/master/Random_Forest_latest.py
- XGBoost:
Computer Vision Using Deep Learning: Evolution of Convolutional Neural Networks
Check my previous blog: https://prabakaranchandran.com/2021/04/26/a-complete-road-map-to-deep-learning-2021-part-1/
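To make the "revision session" idea concrete, here is a minimal sketch (not taken from any of the linked tutorials) that trains k-NN, an SVM and a Random Forest directly on raw pixels. It assumes scikit-learn is installed and uses its bundled 8x8 handwritten-digits dataset as a stand-in for a real image dataset; the hyperparameters (k=3, C=10, 200 trees) are illustrative choices, not tuned values.

```python
# Minimal sketch: classical (non-neural) classifiers on flattened pixel features.
# Assumes scikit-learn is installed; load_digits ships with it (no download needed).
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each 8x8 grayscale image is already flattened into a 64-dimensional vector.
digits = load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

classifiers = {
    "k-NN (k=3)": KNeighborsClassifier(n_neighbors=3),
    # SVMs are sensitive to feature scale, so standardize the pixels first.
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on held-out images
    print(f"{name}: {scores[name]:.3f}")
```

XGBoost is left out so the sketch only needs scikit-learn; if the xgboost package is installed, its `XGBClassifier` follows the same `fit`/`score` interface and could be added to the dictionary. On larger real-world images, these models are usually fed extracted features rather than raw pixels.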
1980 – The first “convolutional network” was the Neocognitron, proposed by the Japanese scientist Kunihiko Fukushima (1980) and used for handwritten character recognition. The Neocognitron was inspired by Hubel and Wiesel’s work on the visual cortex of animals. At that time, the back-propagation algorithm was not yet used to train neural networks. Still, the Neocognitron introduced the fundamental ideas behind ConvNets.
1998 – Modern convolutional networks trained with gradient-based back-propagation, inspired by Fukushima’s Neocognitron. In their paper “Gradient-Based Learning Applied to Document Recognition” (cited 17,588 times at the time of writing), Yann LeCun et al. demonstrated a CNN model for handwritten character recognition.
Major parts of a Convolutional Neural Network (LeCun’s base architecture): working principles and terminology
- Convolutional Layers:
- https://towardsdatascience.com/gentle-dive-into-math-behind-convolutional-neural-networks-79a07dd44cf9
- https://www.analyticsvidhya.com/blog/2020/02/mathematics-behind-convolutional-neural-network/
- https://hackernoon.com/the-full-story-behind-convolutional-neural-networks-and-the-math-behind-it-2j4fk3zu2
- https://www.programmersought.com/article/87541005859/
- https://poloclub.github.io/cnn-explainer/
- Pooling Layers:
- https://dev.to/sandeepbalachandran/machine-learning-max-average-pooling-1366
- https://medium.com/@bdhuma/which-pooling-method-is-better-maxpooling-vs-minpooling-vs-average-pooling-95fb03f45a9
- https://www.machinecurve.com/index.php/2020/01/30/what-are-max-pooling-average-pooling-global-max-pooling-and-global-average-pooling/
- Activation functions:
- https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning/
- https://www.analyticsvidhya.com/blog/2020/01/fundamentals-deep-learning-activation-functions-when-to-use-them/
- https://conferences.computer.org/ictapub/pdfs/ITCA2020-6EIiKprXTS23UiQ2usLpR0/114100a429/114100a429.pdf
- https://towardsdatascience.com/comparison-of-activation-functions-for-deep-neural-networks-706ac4284c8a
- Fully connected layer:
- Normalization Layer:
- https://cs231n.github.io/convolutional-networks/#norm
- https://analyticsindiamag.com/everything-you-should-know-about-dropouts-and-batchnormalization-in-cnn/
- https://www.baeldung.com/cs/batch-normalization-cnn
- https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/
- https://medium.com/techspace-usict/normalization-techniques-in-deep-neural-networks-9121bf100d8
- Dropout:
- Multi-class and multi-label classification:
- Sigmoid and softmax output layers:
- https://towardsdatascience.com/multi-layer-neural-networks-with-sigmoid-function-deep-learning-for-rookies-2-bf464f09eb7f
- https://glassboxmedicine.com/2019/05/26/classification-sigmoid-vs-softmax/
- https://medium.com/arteos-ai/the-differences-between-sigmoid-and-softmax-activation-function-12adee8cf322
- Weight Initialization in CNNs:
- https://machinelearningmastery.com/weight-initialization-for-deep-learning-neural-networks/
- https://medium.com/@tylernisonoff/weight-initialization-for-cnns-a-deep-dive-into-he-initialization-50b03f37f53d
- https://towardsdatascience.com/weight-initialization-in-neural-networks-a-journey-from-the-basics-to-kaiming-954fb9b47c79
- Loss functions for Image classification:
- https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
- https://towardsdatascience.com/choosing-and-customizing-loss-functions-for-image-processing-a0e4bf665b0a
- https://towardsdatascience.com/understanding-different-loss-functions-for-neural-networks-dd1ed0274718
- https://medium.com/@zeeshanmulla/cost-activation-loss-function-neural-network-deep-learning-what-are-these-91167825a4de
- https://algorithmia.com/blog/introduction-to-loss-functions
- Backpropagation in CNNs:
- https://towardsdatascience.com/backpropagation-in-a-convolutional-layer-24c8d64d8509
- https://medium.com/@2017csm1006/forward-and-backpropagation-in-convolutional-neural-network-4dfa96d7b37e
- https://becominghuman.ai/back-propagation-in-convolutional-neural-networks-intuition-and-code-714ef1c38199
- https://www.jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/
- Optimizers:
- https://www.upgrad.com/blog/types-of-optimizers-in-deep-learning/
- https://towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6
- https://heartbeat.fritz.ai/exploring-optimizers-in-machine-learning-7f18d94cd65b
- https://medium.datadriveninvestor.com/overview-of-different-optimizers-for-neural-networks-e0ed119440c3
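To tie several of the building blocks listed above together, here is a from-scratch sketch in plain NumPy: one forward pass through a convolution, ReLU, max pooling, a fully connected layer, softmax and cross-entropy loss, followed by one back-propagation/SGD step on the fully connected layer only. It assumes a single channel, no padding and stride 1; all function names, the toy 6x6 image and the 3 classes are hypothetical illustrations, not any library’s API.

```python
import numpy as np

def conv2d(image, kernel):
    """Single-channel 'valid' convolution with stride 1.
    Like deep learning libraries, this computes cross-correlation (no kernel flip)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel and the patch under it, summed.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, label):
    return -np.log(probs[label])

# Forward pass: a 6x6 image with a bright left half, and a vertical-edge kernel.
image = np.array([[1, 1, 1, 0, 0, 0]] * 6, dtype=float)
kernel = np.array([[1, 0, -1]] * 3, dtype=float)
feature_map = relu(conv2d(image, kernel))  # 4x4; responds where the edge is
pooled = max_pool2d(feature_map)           # 2x2

# Fully connected layer over 3 hypothetical classes, with seeded random weights.
rng = np.random.default_rng(0)
x = pooled.ravel()
W = rng.normal(scale=0.1, size=(3, x.size))
b = np.zeros(3)
label = 1
probs = softmax(W @ x + b)
loss = cross_entropy(probs, label)

# Back-propagation for the last layer: dLoss/dlogits = probs - one_hot(label).
grad_logits = probs.copy()
grad_logits[label] -= 1.0
W -= 0.1 * np.outer(grad_logits, x)  # one plain SGD step on the weights
b -= 0.1 * grad_logits               # and on the bias
new_loss = cross_entropy(softmax(W @ x + b), label)
print(f"loss before: {loss:.4f}, after one SGD step: {new_loss:.4f}")
```

The single gradient step lowers the loss on this example. A full CNN also propagates gradients back through the pooling and convolution layers, which is what the back-propagation resources above derive.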
At this stage, you should be able to understand the core concepts behind CNNs. Let’s move on to the different architectures from the ILSVRC competition.
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) Architectures
ImageNet Dataset and ILSVRC Competition:
ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. The project has been instrumental in advancing computer vision and deep learning research. The data is available for free to researchers for non-commercial use. State-of-the-art results on ImageNet: https://paperswithcode.com/sota/image-classification-on-imagenet
Competition
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) evaluates algorithms for object detection and image classification at large scale. One high-level motivation is to allow researchers to compare progress in detection across a wider variety of objects, taking advantage of the quite expensive labeling effort. Another motivation is to measure the progress of computer vision for large-scale image indexing for retrieval and annotation.
Famous Benchmark Architectures:
After LeCun’s modern CNN paper, it took more than a decade for the next landmark CNN paper to appear. AlexNet, which won the ImageNet challenge in 2012, was the first big milestone. In the resources below, I have attached each model’s architecture paper, theory explanations and implementations.
- 1. AlexNet – 2012
- 1. Paper: https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
- 2. Explanation: https://d2l.ai/chapter_convolutional-modern/alexnet.html
- 3. Code Implementation: https://pytorch.org/hub/pytorch_vision_alexnet/, https://towardsdatascience.com/implementing-alexnet-cnn-architecture-using-tensorflow-2-0-and-keras-2113e090ad98?gi=c45baf963fbc

- 2. VGG-16 and VGG-19 – 2014
- 1. Paper: https://arxiv.org/pdf/1409.1556v6.pdf
- 2. Explanation: https://medium.com/coinmonks/paper-review-of-vggnet-1st-runner-up-of-ilsvlc-2014-image-classification-d02355543a11, https://medium.com/analytics-vidhya/vggnet-architecture-explained-e5c7318aa5b6
- 3. Code Implementation: https://keras.io/api/applications/vgg/, https://pytorch.org/hub/pytorch_vision_vgg/, https://github.com/Lornatang/VGGNet-PyTorch

- 3. ResNet-18, 34, 50, 101 – 2015 (many other versions exist; they differ in the number of layers)
- 1. Paper: https://arxiv.org/pdf/1512.03385.pdf
- 2. Explanation: https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035, https://shuzhanfan.github.io/2018/11/ResNet/, https://sheng-fang.github.io/2020-05-20-review-resnet-family/
- 3. Code Implementation: https://pytorch.org/hub/pytorch_vision_resnet/, https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet, https://towardsdatascience.com/understand-and-implement-resnet-50-with-tensorflow-2-0-1190b9b52691
- 4. Inception and its variants – 2015 (the original Inception network, GoogLeNet, won ILSVRC 2014; the paper below introduces Inception-v3)
- 1. Paper: https://arxiv.org/pdf/1512.00567.pdf
- 2. Explanation: https://www.analyticsvidhya.com/blog/2018/10/understanding-inception-network-from-scratch/, https://medium.com/ml-cheat-sheet/deep-dive-into-the-google-inception-network-architecture-960f65272314, https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
- 3. Code Implementation: https://rwightman.github.io/pytorch-image-models/models/inception-v4/, https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/models/inceptionv4.py, https://www.tensorflow.org/api_docs/python/tf/keras/applications/InceptionV3

- 5. MobileNet and its variants – 2017 (MobileNetV2 followed in 2018)
- 6. EfficientNet and its versions – 2019
- 1. Paper: http://proceedings.mlr.press/v97/tan19a/tan19a.pdf
- 2. Explanation: https://medium.com/@nainaakash012/efficientnet-rethinking-model-scaling-for-convolutional-neural-networks-92941c5bfb95, https://towardsdatascience.com/efficientnet-scaling-of-convolutional-neural-networks-done-right-3fde32aef8ff
- 3. Code Implementation: https://github.com/lukemelas/EfficientNet-PyTorch
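The key idea behind the ResNet family listed above is the residual (skip) connection: a block learns a function F(x) and outputs F(x) + x, which makes very deep networks easier to train. Below is a minimal PyTorch sketch of a ResNet-18/34-style basic block, assuming PyTorch is installed. The class name `ResidualBlock` is my own choice for illustration (torchvision has its own internal implementation that differs in details), and the 1x1 projection shortcut is used only when the shape changes, following the projection option described in the ResNet paper.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """ResNet-18/34-style basic block: two 3x3 convolutions plus a skip connection."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()
        # Identity shortcut when shapes match; otherwise a 1x1 projection so F(x) + x is defined.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # first conv -> BN -> ReLU
        out = self.bn2(self.conv2(out))           # second conv -> BN (no ReLU yet)
        return self.relu(out + self.shortcut(x))  # add the skip connection, then ReLU

x = torch.randn(2, 64, 32, 32)            # batch of 2 feature maps, 64 channels, 32x32
block = ResidualBlock(64, 128, stride=2)  # halves spatial size, doubles channels
y = block(x)
print(y.shape)  # torch.Size([2, 128, 16, 16])
```

Stacking blocks like this (with a stem convolution in front and global pooling plus a fully connected layer at the end) gives the overall shape of ResNet-18/34; the deeper ResNet-50/101 variants use a three-layer "bottleneck" block instead.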

Even though hundreds of models exist, I have listed only a few that are fundamental and that many later models build on. I have not listed models like ViT or MLP-Mixer, since they are best understood after learning about attention and Transformers (I’ll share a road map to Transformer architectures soon).
Tools, Frameworks and Other Resources – Image Classification:
Transfer Learning : https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/

PyTorch – torchvision models: https://pytorch.org/vision/stable/models.html, https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
PyTorch Image Models by Ross Wightman (timm): https://paperswithcode.com/lib/timm, https://rwightman.github.io/pytorch-image-models/
PyTorch Image Models (timm) is a library for state-of-the-art image classification. With this library you can:
- Choose from 300+ pre-trained state-of-the-art image classification models.
- Train models afresh on research datasets such as ImageNet using provided scripts.
- Fine-tune pre-trained models on your own datasets, including the latest cutting-edge models.
Keras and TensorFlow 2.x:
- https://keras.io/examples/vision/
- https://developers.google.com/codelabs/tensorflow-2-computervision#0
- https://medium.com/@rishit.dagli/computer-vision-with-tensorflow-part-2-57e95cd0551
- https://www.tensorflow.org/tutorials/images/transfer_learning_with_hub
PyTorch Lightning:
- https://wandb.ai/wandb/wandb-lightning/reports/Image-Classification-using-PyTorch-Lightning--VmlldzoyODk1NzY
- https://www.kaggle.com/xooca1/image-classification-pytorch-lightning
- https://medium.com/pytorch/introducing-lightning-flash-the-fastest-way-to-get-started-with-deep-learning-202f196b3b98
Another important resource: Josh Starmer’s StatQuest YouTube channel, which offers easy-to-follow, illustrated explanations.
https://www.youtube.com/watch?v=HGwBXDKFk9I
So far, we have covered image classification in computer vision. In upcoming blogs, we will explore object detection, segmentation, generation and other domains.
Take your time, don’t rush. Learn, practice, repeat!
Let me know if you have any queries or comments!
Thanks for reading and supporting! I hope this blog helps you.
Happy learning!
By
Prabakaran Chandran – May 14, 04:10 am