Deep Learning notation

2021-10-15 |

[1] Read this post in Portuguese

A computer would deserve to be called intelligent if it could deceive a human into believing that it was human. - Alan Turing

Deep maths, ufff…

By mid-2021 I started diving into a machine learning course I thought I should take. A long time ago, when I graduated, my graduation paper was about chatbots with emotions and how humans would react to them. I wanted to better understand how the techniques had evolved since back then, in 2006, and found something rather different from what I was expecting.

Given the current status quo, you just cannot avoid some basic knowledge of Python libraries (such as NumPy), linear algebra, and a good dose of mathematical notation when reading descriptions of machine learning methods. And it can be very frustrating at times.

One bit of notation in an equation you don't fully grasp might prevent you from implementing the concept you are trying to learn. Even as an experienced developer, I had that beginner-like feeling while facing modern machine learning basics.

So here I have collected some mathematical notation I came across while doing the deep learning course, along with notes on concepts that felt mysterious to me, like cost and derivatives.

I took these notes mostly for my personal use, but posted them because I wish I had found something like this when searching the Internet. I must also say that notation varies a lot from author to author, and that I am still learning, so take my notes with a grain of salt.


The activation of a node in a neural network is something of the form:
output = activation_function(dot_product(weights, inputs) + bias)
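That formula can be sketched in NumPy as follows. The weight, input, and bias values here are made up for illustration, and the sigmoid is just one common choice of activation function:

```python
import numpy as np

def sigmoid(z):
    # A common activation function: squashes z into the range (0, 1)
    return 1 / (1 + np.exp(-z))

# Hypothetical single node with 3 inputs
weights = np.array([0.2, -0.5, 0.1])
inputs = np.array([1.0, 2.0, 3.0])
bias = 0.4

z = np.dot(weights, inputs) + bias  # weighted sum of inputs, plus bias
output = sigmoid(z)                 # the node's activation
```

Here np.dot is the dot product from the formula above; swapping sigmoid for another function (ReLU, tanh, ...) changes the node's behavior without changing this overall structure.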

General Notation

as per Andrew Ng's specialization on Coursera [2]


These hyperparameters control how the parameters w and b behave during training:



The loss function measures the difference between the actual output and the output predicted by the model, that is, y vs. ŷ.

Although loss is sometimes also referred to as cost, they are not the same thing: the cost function is the average loss over the complete training dataset.
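A small sketch of the distinction, using a made-up squared-error loss over four hypothetical examples (the course itself uses other losses too, such as cross-entropy):

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0, 1.0])         # actual outputs
y_hat = np.array([0.9, 0.2, 0.6, 0.8])     # predicted outputs

# Loss: one value per training example
loss = (y - y_hat) ** 2

# Cost: a single number, the average loss over the whole training set
cost = loss.mean()
```

So the loss is per example, while the cost summarizes the model's error on the entire dataset; it is the cost that gradient descent tries to minimize.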

Derivatives (dx)

Collected from a forum note I found useful, posted by another student, BurntCalcium (nick):

Basically, if f is a function of x, you are taking the ratio of the *change in f* to the *change in x*, given that the latter is an infinitesimally small quantity. The 'd' used in the notation represents the Greek letter Δ (Delta), which is commonly used in physics and math to denote change in a quantity. So dx would mean the change in x, df(x) would mean the change in f(x), and df(x)/dx as a whole is called the derivative of f(x) with respect to x. In the course, the instructors have adopted the shorthand that dx stands for df(x)/dx; outside the context of the course, however, dx would simply mean the change in x.
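The idea above can be illustrated numerically: pick a very small change h in x, and compute the ratio of changes. This finite-difference approximation is not how frameworks actually compute gradients, but it matches the definition:

```python
def approx_derivative(f, x, h=1e-6):
    # Ratio of the change in f to a very small change h in x,
    # approximating df(x)/dx as h shrinks toward zero
    return (f(x + h) - f(x)) / h

def f(x):
    return x ** 2  # analytically, df/dx = 2x

slope = approx_derivative(f, 3.0)  # close to 2 * 3.0 = 6.0
```

Shrinking h brings the approximation closer to the true derivative, up to the limits of floating-point precision.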


[2] Deep Learning on Coursera

See also

[3] Capsule Archives
[4] Capsule Home

Want more?

Comment on one of my posts, talk to me, say:

[5] Subscribe to the Capsule's Feed
[6] Checkout the FatScript project on GitLab
[7] Checkout my projects on GitHub
[8] Checkout my projects on SourceHut

Join Geminispace

Gemini is a new Internet protocol introduced in 2019 as an alternative to http(s) and gopher, for lightweight text content and better privacy.

Not sure how, but want to be part of the club? See:
[9] Gemini quick start guide

Already have a Gemini client?
[10] Navigate this capsule via Gemini

© 2021-2023 - content on this site is licensed under
[11] Creative Commons BY-NC-SA 4.0 License
[12] Proudly built with GemPress
[13] Privacy Policy