Welcome! Thanks for visiting my blog. In my previous article, I listed ways to kick-start a data science career.

In this article, I list the concepts to learn in order to carry your career forward toward artificial intelligence.

This article has five parts on which you should concentrate your learning.

Don’t be in a hurry! Take your sweet time, and lay a strong foundation at your own pace!

1. Prerequisites – Mathematics and programming
2. History of artificial neural networks
3. Important architectures to learn, backpropagation, and optimization
4. Deep learning development life cycle
5. Libraries and frameworks

## Prerequisites – Mathematics and Programming

I have been working on neural networks since my 7th semester in 2018. I had a subject called “Soft Computing”, in which I started learning about neural networks and other soft computing algorithms such as fuzzy logic and evolutionary algorithms.

My interest in soft computing brought me to the artificial intelligence domain.

Let’s start with the mathematics and programming required for deep learning.

##### Mathematics for Deep Learning

There are three major mathematics topics we need to learn (we may already have covered them in our college curriculum).

You may ask: do we need mathematics for deep learning? Yes, we do. If you want to build your own models, diagnose what happened in the pipeline, or improve model performance, the math under the hood is important.

- Linear algebra
- Multivariate calculus
- Probability and statistics

## Linear Algebra

Linear algebra is fundamental to many machine learning algorithms. It also speeds up ordinary calculations through vectorization. These are the important concepts to learn in linear algebra.

Topic list

- Scalars, vectors, matrices, tensors – These are the basic entities of linear algebra, and a variable can take any of these forms. Machine learning and deep learning models are expressed as functions of such variables
- Matrix arithmetic operations – Multiplication, addition, subtraction, element-wise division, transpose, inverse, norms
- Different types of matrices and tensors (a tensor is a multidimensional array)
- Eigendecomposition – Eigenvectors and eigenvalues are useful in dimensionality reduction methods
- Singular value decomposition
- Determinant and rank
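As a quick way to practice the topics above, here is a minimal NumPy sketch. The matrix and vector values are my own, chosen only for illustration, and `eigh` is used because the example matrix is symmetric.

```python
import numpy as np

# A small symmetric matrix and a vector (illustrative values only).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 2.0])

# Basic operations: matrix-vector product, transpose, inverse, norm.
y = A @ x
A_T = A.T
A_inv = np.linalg.inv(A)
x_norm = np.linalg.norm(x)

# Determinant and rank.
det_A = np.linalg.det(A)
rank_A = np.linalg.matrix_rank(A)

# Eigendecomposition of a symmetric matrix: A = Q diag(w) Q^T.
w, Q = np.linalg.eigh(A)
A_from_eig = Q @ np.diag(w) @ Q.T

# Singular value decomposition: A = U diag(s) V^T.
U, s, Vt = np.linalg.svd(A)
A_from_svd = U @ np.diag(s) @ Vt

print("A @ x =", y)
print("det =", det_A, "rank =", rank_A)
print("eigenvalues =", w)
print("singular values =", s)
```

Rebuilding `A` from its factors is a handy sanity check that you have understood what each decomposition returns.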

Follow these resource links to learn the basics of linear algebra along with Python (NumPy):

- https://d2l.ai/chapter_preliminaries/linear-algebra.html
- https://jonathan-hui.medium.com/machine-learning-linear-algebra-a5b1658f0151
- https://builtin.com/data-science/basic-linear-algebra-deep-learning
- https://www.deeplearningbook.org/contents/linear_algebra.html
- https://mml-book.github.io/
- https://www.math.ubc.ca/~pwalls/math-python/linear-algebra/linear-algebra-scipy/

## Multivariate Calculus

When it comes to algorithms, calculus is indispensable. Tasks such as gradient descent, numerical optimization, backpropagation, and mathematical simulation rely on calculus as their computational backbone.

The following concepts are required in deep learning:

- Differential calculus – Helps in backpropagation, numerical optimization, and solving equations
- Integral calculus – Helps in area calculation, Monte Carlo simulation, etc.
- Partial differentiation – Solving and optimizing multivariate equations
- Automatic differentiation – Simplifies derivative computation; all the deep learning frameworks use an autograd function (automatic differentiation)
- Gradients, Jacobian, Hessian – Help in gradient-based optimization
- Backpropagation and the chain rule – Backpropagation computes the gradients that drive the actual learning process of deep learning
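To make the chain rule concrete, here is a small sketch in plain Python. It uses a one-weight linear model with a squared-error loss (my own toy example, not taken from the resources below) and checks the hand-derived gradient against a numerical finite-difference estimate.

```python
def loss(w, x, y):
    """Squared error of a one-weight linear model: (w * x - y) ** 2."""
    return (w * x - y) ** 2


def analytic_grad(w, x, y):
    """Chain rule: dL/dw = dL/dpred * dpred/dw = 2 * (w * x - y) * x."""
    return 2.0 * (w * x - y) * x


def numerical_grad(f, w, eps=1e-6):
    """Central finite-difference approximation of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)


# Illustrative values for the weight, input, and target.
w, x, y = 0.5, 3.0, 2.0
g_analytic = analytic_grad(w, x, y)
g_numeric = numerical_grad(lambda w_: loss(w_, x, y), w)
print(g_analytic, g_numeric)
```

Comparing an analytic gradient with a numerical one like this is often called a gradient check, and it is a useful habit when you implement backpropagation by hand.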

Resources to learn calculus along with programming it in Python:

- https://mml-book.github.io/book/mml-book.pdf
- https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/multivariable-calculus.html
- https://www.sas.upenn.edu/~jesusfv/Lecture_NM_1_Numerical_Differentiation_Integration.pdf
- https://ocw.mit.edu/courses/mathematics/18-01sc-single-variable-calculus-fall-2010/download-course-materials/
- https://towardsdatascience.com/automatic-differentiation-explained-b4ba8e60c2ad
- https://towardsdatascience.com/automatic-differentiation-15min-video-tutorial-with-application-in-machine-learning-and-finance-333e18c0ecbb
- https://medium.com/swlh/gradient-based-optimizations-jacobians-jababians-hessians-b7cbe62d662d

## Probability and Statistics

Probability and statistics are needed for almost every task in deep learning and machine learning:

- To define the type of error (loss) to be considered
- Classification models output a probability distribution at the end
- To measure the performance metrics of the models
- To define the noise distribution in generative models
- Deep belief networks are based on probability distributions
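As an illustration of the second point, the sketch below turns raw model scores into a probability distribution with softmax and measures the error with cross-entropy. The scores are made-up values, and this is only one common choice of output layer and loss.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_index])


# Made-up scores for a 3-class problem; class 0 is assumed to be correct.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = cross_entropy(probs, 0)
print(probs, loss)
```

The more probability the model puts on the correct class, the closer the cross-entropy gets to zero.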

I have included an extensive list of probability and statistics topics to learn in my previous blog: https://prabakaranchandran.com/2021/04/22/an-ultimate-data-science-starter-kit-for-beginners/

A few more:

- https://towardsdatascience.com/probability-and-statistics-explained-in-the-context-of-deep-learning-ed1509b2eb3f
- https://d2l.ai/chapter_preliminaries/probability.html
- https://towardsdatascience.com/machine-learning-probability-statistics-f830f8c09326?gi=9dc0045dff84
- https://towardsdatascience.com/probability-theory-for-deep-learning-9551b9255cf0?gi=c0663ceac3ad

##### Programming for Deep Learning – Prerequisites

- Python is a major programming language that is widely used across different applications
- If you are new to Python, check out my previous blog, where I have listed resources and topics to learn
- Learning packages such as NumPy, SciPy, statsmodels, pandas, and Matplotlib is required in the initial stage of learning deep learning
- Practice the mathematical topics listed above using Python and its libraries
- If you come from a computer science background, you can consider C++ for deep learning (compiled C++ code typically runs faster than pure Python code)

Check these resources to learn Python and its libraries in the context of deep learning:

- https://towardsdatascience.com/all-the-numpy-you-need-to-supercharge-your-deep-learning-code-e7a22fe4ede2
- https://scipy-lectures.org/packages/sympy.html
- https://docs.scipy.org/doc/scipy/reference/tutorial/integrate.html
- https://towardsdatascience.com/a-simple-method-for-numerical-integration-in-python-7906c1703af8
- https://www.kaggle.com/borisettinger/gentle-introduction-to-automatic-differentiation

## History of Artificial Neural Networks – 1940 – 2000 – Important Architectures

“Deep Learning waves have lapped at the shores of computational linguistics for several years now, but 2015 seems like the year when the full force of the tsunami hit the major Natural Language Processing (NLP) conferences.” - Dr. Christopher D. Manning, Dec 2015

Below, I have listed the important historical events in the AI domain.

1943 – Walter Pitts (logician) and Warren McCulloch (neuroscientist) created a computational model based on the neural networks of the human brain. They used “threshold logic” to mimic the biological neuron’s thought process. This model had no learning process; it is just a forward pass and a threshold trigger.

Learn More about McCulloch Pitts Net : https://towardsdatascience.com/mcculloch-pitts-model-5fdf65ac5dd1

https://towardsdatascience.com/implement-your-first-artificial-neuron-from-scratch-dc01b9505c18
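The forward pass and threshold trigger of a McCulloch-Pitts unit can be sketched in a few lines of Python. This is a simplified version with binary inputs and no inhibitory inputs; the function name and the thresholds for the AND and OR gates are my own illustration.

```python
def mp_neuron(inputs, threshold):
    """Fire (return 1) when the number of active binary inputs reaches the threshold."""
    return 1 if sum(inputs) >= threshold else 0


# With two inputs, a threshold of 2 behaves like AND and a threshold of 1 like OR.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", mp_neuron([a, b], 2), "OR:", mp_neuron([a, b], 1))
```

Notice that nothing is learned here: the threshold is set by hand, which is exactly the limitation the later models address.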

1949 – Donald Hebb published the book “The Organization of Behavior,” which proposed a new rule for synaptic learning. Hebbian learning (Hebb net) is one of the simplest and most straightforward learning rules for artificial neural networks.

Learn More about Hebbian learning and HebbNet:

1958 – The perceptron was created at Cornell University by Frank Rosenblatt et al. It was built for character and pattern recognition. The perceptron is applicable to linearly separable problems (such as the OR and AND gates). In 1962, Rosenblatt published the book “Principles of Neurodynamics” about his research and ideas on modeling the brain.
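Here is a minimal sketch of the perceptron learning rule on the AND gate, in plain Python. The learning rate, the number of epochs, and the zero initialization are my own assumptions for illustration, not Rosenblatt’s original setup.

```python
def predict(weights, bias, x):
    """Step activation on the weighted sum of the inputs."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, lr=1.0, epochs=20):
    """Perceptron rule: nudge the weights by lr * error * input after each sample."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Truth table of the AND gate, which is linearly separable.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
for x, target in and_gate:
    print(x, "target:", target, "predicted:", predict(weights, bias, x))
```

If you swap in the XOR truth table, the same loop never settles on correct weights, because XOR is not linearly separable.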

Learn and practice Perceptron :

1959 – Bernard Widrow and Marcian Hoff of Stanford developed “ADALINE” (Adaptive Linear Element) and “MADALINE” (Multiple Adaptive Linear Elements). ADALINE was developed to recognize binary patterns; when reading streaming bits from a phone line, it could predict the next bit. MADALINE was the first neural network applied to a real-world problem, using an adaptive filter that eliminates echoes on phone lines.

Learn and practice more on ADALINE and MADALINE :

1974 – The backpropagation algorithm, initially described by Werbos in 1974, was further explained in 1986 in the work “Learning Internal Representations by Error Propagation” by Rumelhart, Hinton, and Williams. Backpropagation is a gradient-based algorithm used to train artificial neural networks. It is used to train multi-layer networks, overcoming the limitations of the single-layer perceptron.

For more info :

1980 – The first “convolutional network” was the Neocognitron, introduced by the Japanese scientist Fukushima (1980) and used for handwritten character recognition. The Neocognitron was inspired by the work of Hubel and Wiesel on the visual cortex of animals. At that time, the backpropagation algorithm was not yet used to train neural networks. The Neocognitron introduced the fundamental ideas behind ConvNets.

1986 – Recurrent neural networks were based on David Rumelhart’s work in 1986. Hopfield networks, a special kind of RNN, were introduced by John Hopfield in 1982. In 1993, a neural history compressor system solved a “Very Deep Learning” task that required more than 1000 subsequent layers in an RNN unfolded in time. Since RNNs are networks with loops, they are widely used for time series, text, and audio problems.

Learn More about RNNs:

- https://github.com/karpathy/char-rnn
- http://www.cs.toronto.edu/~guerzhoy/tmp/understand-rnn/handout/index.html
- http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9?gi=fc61a82d49bd
- https://towardsdatascience.com/recurrent-neural-networks-rnn-explained-the-eli5-way-3956887e8b75

1995-1997: LSTM was proposed by Sepp Hochreiter and Jürgen Schmidhuber. By introducing Constant Error Carousel (CEC) units, LSTM deals with the vanishing gradient problem. The initial version of the LSTM block included cells and input and output gates. In 2015, Andrej Karpathy’s article on RNNs and LSTMs helped bring them back into widespread use.

Learn :

1998 – Modern convolutional networks trained with gradient-based backpropagation appeared, inspired by the Neocognitron (by Fukushima). Yann LeCun et al., in their paper “Gradient-Based Learning Applied to Document Recognition” (cited 17,588 times), demonstrated a CNN model for handwritten character recognition.

Learn more about Modern CNN ( pioneer in Computer vision)

In the timeline above, I have given resource links for the important concepts. We will learn about the modern deep learning architectures that are popular in computer vision, natural language processing, and reinforcement learning separately in upcoming blogs.

Since this is an introductory blog on deep learning, I don’t want to put all the complex architectures in one place.

## Optimization in Deep learning

An optimizer is the function or method that updates the weights and biases so that the model converges faster. Optimizers are an important building block of training with backpropagation: backpropagation supplies the gradients, and the optimizer decides how to use them to update the parameters. Without an optimizer, training would be just trial and error. The most basic optimizer is gradient descent, which relies on a simple slope (gradient) calculation scaled by a learning rate.

Below is a list of available optimizers for neural networks. Note: they are not for hyperparameter tuning; optimizers are used predominantly in model training (after every backward pass).

- Gradient Descent (GD)
- Stochastic Gradient Descent
- Mini-Batch Gradient Descent
- Momentum Based Gradient Descent
- Nesterov Accelerated Gradient (NAG)
- Adagrad
- RMSProp
- Adam
- AdaDelta
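To see the first and fourth items of the list in action, here is a small sketch of plain gradient descent and momentum-based gradient descent on a toy one-dimensional function. The function, learning rate, momentum coefficient, and step count are my own illustrative choices; real frameworks apply the same idea to every weight in the network.

```python
def grad(w):
    """Gradient of f(w) = (w - 3) ** 2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, lr=0.1, steps=500):
    """Plain gradient descent: step against the gradient, scaled by the learning rate."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


def momentum_descent(w, lr=0.1, beta=0.9, steps=500):
    """Momentum: accumulate a running velocity of past gradients and step along it."""
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(w)
        w = w - lr * velocity
    return w


w_gd = gradient_descent(0.0)
w_momentum = momentum_descent(0.0)
print(w_gd, w_momentum)
```

Both variants should end up very close to the minimum at 3; try changing `lr` to see how a learning rate that is too large makes the updates overshoot.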

Resources list :

- https://towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6
- https://www.kdnuggets.com/2020/12/optimization-algorithms-neural-networks.html
- https://medium.datadriveninvestor.com/overview-of-different-optimizers-for-neural-networks-e0ed119440c3
- https://www.upgrad.com/blog/types-of-optimizers-in-deep-learning/
- https://keras.io/api/optimizers/
- https://www.kaggle.com/getting-started/164479
- https://ruder.io/optimizing-gradient-descent/

## Deep learning Development life cycle

So far, we have covered the necessary concepts in neural networks, architectures, and the allied mathematics.

If we need to build a neural network model for a simple regression task, we should follow several steps to get there. The final model has to be available to the end user; it should not sleep in our development environment. This sequence of steps is called the deep learning development life cycle:

- Data collection and Data preparation
- Deep learning Model development
- Training and validation of Deep learning model – Transfer learning if opted
- Testing of Deep learning model
- Comparison of the performance metrics of multiple models
- Deployment of model
- Model monitoring
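The training, validation, and testing steps above each need their own slice of the data. Here is a minimal sketch of such a split in plain Python; the function name, the 70/15/15 proportions, and the fixed seed are my own assumptions, and in practice a library helper is usually used instead.

```python
import random


def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then cut the data into train, validation, and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Stand-in dataset: 100 sample ids.
data = list(range(100))
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

The model is fitted on `train`, tuned against `val`, and evaluated only once at the end on `test`, so that the test score reflects data the model has never seen.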

Read more about MLDC/DLDC

- https://www.educba.com/machine-learning-life-cycle/
- https://www.datarobot.com/wiki/machine-learning-life-cycle/
- https://towardsdatascience.com/transfer-learning-with-convolutional-neural-networks-in-pytorch-dd09190245ce?gi=ede0164dc869
- https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781838550356/1/ch01lvl1sec13/ml-development-life-cycle
- https://towardsdatascience.com/rethinking-ai-machine-learning-model-management-8afeaa31d8f8
- https://towardsdatascience.com/the-machine-learning-lifecycle-in-2021-473717c633bc

## Deep learning Frameworks

There are several frameworks available for deep learning tasks in Python.

Among them, TensorFlow and PyTorch are used by most deep learning engineers. I don’t want to compare TensorFlow and PyTorch (that might offend one of them 🙂).

Keras is the high-level API for TensorFlow, and PyTorch Lightning is a high-level API wrapper for PyTorch.

Keras and Lightning are very easy to learn, so beginners and non-CS graduates can pick them up easily and move forward.

## Books and YouTube channels to follow

- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
- Deep Learning (Ian Goodfellow, Yoshua Bengio, and Aaron Courville)
- Pattern Recognition and Machine Learning
- Grokking Deep Learning
- Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence
- https://www.youtube.com/watch?v=6M5VXKLf4D4
- https://www.youtube.com/watch?v=9jA0KjS7V_c&list=PLZoTAELRMXVPGU70ZGsckrMdr0FteeRUi
- https://www.youtube.com/watch?v=aircAruvnKk
- https://www.youtube.com/watch?v=5tvmMX8r_OM&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI
- https://www.youtube.com/watch?v=zfiSAzpy9NM

With this, I conclude the article. So far, I have laid out the road map for fundamental deep learning concepts.

In the upcoming blogs, I will discuss road maps for computer vision, generative models, natural language processing, and reinforcement learning.

## I hope this blog helps you! Happy learning!

Signing off, Prabakaran Chandran
