Since Understanding Q-Networks, I've been revisiting some of the concepts I picked up along the way, and the derivative feels worth going back over.
The derivative is fundamentally a description of how a function behaves and what its characteristics are, and so it lets us peek into how that behaviour is defined.
For example, \( f(x) = y\) represents a function whose outward behaviour is the mapping of inputs x to outputs y. You can peek into how it does this mapping if you can see how that mapping is represented. For example, we can represent its work as a table:
| x | y |
| --- | --- |
| 1 | 5 |
| 3 | 3 |
| ... | ... |
The argument x is considered the primitive (and is the input), and so the derivative tells you about the nature of this primitive argument, x, in terms of its relationship with f() and y.
Back to the derivative: it tells us specifically about the relationship between x and y, the input and output of the function, and therefore helps describe the behaviour of the function in terms of how x and y relate.
Really, the derivative is a description of the relationship between x and y, and that relationship can only really be described as how x affects y. To talk about this relationship, we need to put it in the context of how, when x changes, y correspondingly changes. In this way, it makes sense to describe the derivative as how a minimal change in x affects the associated change in y (remember, the output y depends entirely on the input x). That is how to define, in general, how x affects y, and the derivative indicates or shows exactly this.
The derivative, which we've only discussed in theory so far, is written in math notation like this: \(f'(x) = \frac{dy}{dx}\), i.e. the derivative of the function f(x) is a ratio (read: relationship) between how y varies and how x varies. The d represents the idea of 'change', so this is the relationship between the change in x and the corresponding change in y.
It can also be written as \(f'(x_0) = \lim\limits_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}\), which is another way of saying the same thing, i.e. the ratio (relationship) between a change in x and the corresponding change in y. But this form specifically requires the change in x to be vanishingly small, so that the relationship between such a tiny change in x and the correspondingly tiny change in y can be captured. This lets us reason about how the most minute change in x causes a correspondingly minute change in y, and so we can treat this as a fundamental characteristic of how x corresponds to y (this is why we shrink \(\Delta x\): to capture the essence of a change in x).
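To make this concrete, here is a minimal numerical sketch (my own toy example, \(f(x) = x^2\), not something from the text above): as \(\Delta x\) shrinks, the difference quotient settles towards a single value, and that value is the derivative.

```python
# Minimal sketch: the difference quotient (f(x0 + dx) - f(x0)) / dx
# approaches the derivative as dx shrinks. f(x) = x**2 is an illustrative
# choice; its exact derivative at x0 = 3 is 6.
def f(x):
    return x ** 2

x0 = 3.0
for dx in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    quotient = (f(x0 + dx) - f(x0)) / dx
    print(f"dx = {dx:<7} difference quotient = {quotient:.4f}")
# The printed values approach 6.0 (= 2 * x0), the derivative of x**2 at x = 3.
```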
Intuitively:
A change in y only occurs because there was a corresponding change in x; therefore, we can quantify the change from the before state to the after state in terms of how its dependency (x) changed:

Figure: Two points of a function representing the before and after states of y
\(f(x_0)\) is the before state and \(f(x_0 + \Delta x)\) is the after state of y, while \(\Delta x\) is the difference between what x was in the before state and what it is in the after state.
Geometrically:

Figure: Visualising the derivative
The slope of the hypotenuse relative to the adjacent side (i.e. how much y changes for a given change in x) is what we want to quantify, based on what we've been describing the derivative to be. This is exactly the value of \(\tan \alpha\), the ratio of the opposite side (\(\Delta y\)) to the adjacent side (\(\Delta x\)).
You can calculate the derivative of a function at various points on the function (see the graph above depicting the function), depending on which two points on the function you choose (corresponding to two y-values and their associated x-values) before shrinking \(\Delta x\) so that the two points converge to a single point where \(\Delta x \to 0\), which is where the tangent line touches the function. The derivative, through its expression of the change in x and y, gives the slope of the tangent, and the tangent can tell us about the characteristics of the function:

Figure: Multiple derivatives
- The hypotenuse is the line through the two points on the function (the two y-values); as \(\Delta x \to 0\) this line becomes the tangent!
- The change along the x-axis (\(\Delta x\)) forms the adjacent side (when considering \(\tan \alpha\)).
So the derivative tells you something about the nature of two points on the function and, through them, something about the nature/form of the underlying function. What it tells you is what the function does between two inputs, so the nature of the function between these two inputs can be exposed.
The derivative tells you about the nature of the curve by using two input values (x):
- A positive derivative value, like 4: the function is increasing in value
- A negative derivative value, like -2: the function is decreasing in value
- A derivative of 0: indicates a stationary point, i.e. a local minimum, a local maximum, or a saddle point of the function
- The 2nd derivative: indicates the concavity of the function, i.e. whether the graph has an upward "smile" or a downward "frown" shape.
The first two points together describe the function's rate of change, i.e., how quickly (the magnitude of the value) and in what direction the function is changing with respect to the input variable, x.
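As a small sketch of these interpretations (again my own toy example, \(f(x) = x^2\)), we can check the signs of numerical first and second derivatives at a few points:

```python
# Minimal sketch: the sign of the (numerical) first derivative says whether
# the function is increasing or decreasing at a point, and the sign of the
# second derivative says whether it curves upward (smile) or downward (frown).
# f(x) = x**2 is an illustrative choice.
def f(x):
    return x ** 2

def first_derivative(func, x, h=1e-5):
    return (func(x + h) - func(x - h)) / (2 * h)

def second_derivative(func, x, h=1e-3):
    return (func(x + h) - 2 * func(x) + func(x - h)) / h ** 2

for x in [-2.0, 0.0, 3.0]:
    print(f"x = {x:>4}: f' = {first_derivative(f, x):+.2f}, f'' = {second_derivative(f, x):+.2f}")
# f' is negative at x = -2 (decreasing), ~0 at x = 0 (stationary point),
# positive at x = 3 (increasing); f'' is positive everywhere (upward smile).
```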
In physics:
- The derivative of position is velocity
- The derivative of velocity is acceleration
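Here is a minimal sketch of that (a toy free-fall setup I'm assuming purely for illustration, with \(position(t) = \tfrac{1}{2} g t^2\)): numerically differentiating position gives velocity, and differentiating velocity gives acceleration.

```python
# Minimal sketch: for free fall, position(t) = 0.5 * g * t**2, so its
# derivative (velocity) should come out as g * t and the derivative of
# velocity (acceleration) as the constant g. The numbers are illustrative.
G = 9.8

def position(t):
    return 0.5 * G * t ** 2

def derivative(func, t, dt=1e-4):
    # Central difference approximation of d(func)/dt at t.
    return (func(t + dt) - func(t - dt)) / (2 * dt)

t = 2.0
velocity = derivative(position, t)                               # ~ G * t = 19.6
acceleration = derivative(lambda u: derivative(position, u), t)  # ~ G = 9.8
print(f"velocity ~ {velocity:.2f}, acceleration ~ {acceleration:.2f}")
```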
Partial Derivatives
Partial derivatives describe the same thing for multivariate functions (functions with multiple input variables), e.g. \(f(x,z) = y\); however, as multiple input variables (x and z) influence how the function changes, each has only a partial influence on that change. Therefore, you have a partial derivative with respect to each of the input variables:
\( \frac{\partial f}{\partial x}\) is the partial derivative with respect to x, i.e. the change that x contributes to the multivariate function; the other is \(\frac{\partial f}{\partial z}\).
Like the ordinary derivative, there are multiple partial derivatives.
For example, for a single-variable function there can be a derivative at (for argument's sake) every point on the function. That is, along the curve you can have different rates of change, corresponding to how the x-value at that point causes the function to increase or decrease, which shows up as a positive or negative derivative at that point. The same idea holds for partial derivatives: you can have a partial derivative (for argument's sake) at each point.
A partial derivative is partial because it is only concerned with the change that occurs with respect to a single variable. The other variable also influences the function, but it gets its own partial derivative, and each partial derivative is defined to exclude the influence of the other variable. For example, for \(f(x,z) = y\) you have a series of partial derivatives along the x-axis and a series along the z-axis, represented symbolically as \( \frac{\partial f}{\partial x}\) and \( \frac{\partial f}{\partial z}\).
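A minimal numerical sketch of this (using my own toy function \(f(x, z) = x^2 + 3z\), which is purely illustrative): each partial derivative nudges one input while holding the other one fixed.

```python
# Minimal sketch: the partial derivative with respect to x nudges x while
# holding z fixed (and vice versa). f(x, z) = x**2 + 3*z is illustrative:
# analytically, df/dx = 2x and df/dz = 3.
def f(x, z):
    return x ** 2 + 3 * z

def partial_x(x, z, h=1e-5):
    return (f(x + h, z) - f(x - h, z)) / (2 * h)  # z held fixed

def partial_z(x, z, h=1e-5):
    return (f(x, z + h) - f(x, z - h)) / (2 * h)  # x held fixed

print(partial_x(2.0, 1.0))  # ~ 4.0 (= 2 * x)
print(partial_z(2.0, 1.0))  # ~ 3.0
```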
At a point in a multivariate function, i.e. at the point \((x_i, z_i)\) produced by specific input values, you therefore have two partial derivatives, \( \frac{\partial f}{\partial x}\) and \( \frac{\partial f}{\partial z}\) evaluated at that point, which, if you combine them, become what is called the gradient at that point: \( \nabla f(x_i, z_i) = [\frac{\partial f}{\partial x}\big|_{(x_i, z_i)}, \frac{\partial f}{\partial z}\big|_{(x_i, z_i)}]\), or more generally \( \nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial z}]\).
Also, like the derivative and the partial derivative, which exist at points along the function, the gradient exists at every point of the function, and the gradient at a specific point of a multi-variate function is written like this: \(\nabla f(x_0, z_0) = [\frac{\partial f}{\partial x}\big|_{(x_0, z_0)}, \frac{\partial f}{\partial z}\big|_{(x_0, z_0)}]\).
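Continuing the same toy function, a minimal sketch of the gradient at a specific point \((x_0, z_0)\): evaluate both partial derivatives there and collect them into a vector.

```python
# Minimal sketch: the gradient at a point is the vector of both partial
# derivatives evaluated at that point. Same illustrative function as above.
def f(x, z):
    return x ** 2 + 3 * z

def gradient_at(x0, z0, h=1e-5):
    df_dx = (f(x0 + h, z0) - f(x0 - h, z0)) / (2 * h)
    df_dz = (f(x0, z0 + h) - f(x0, z0 - h)) / (2 * h)
    return [df_dx, df_dz]

print(gradient_at(2.0, 1.0))  # ~ [4.0, 3.0], i.e. [2 * x0, 3]
```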
The gradient
While the derivative in single-variable functions and the partial derivatives in multi-variable functions describe how the function increases or decreases at a particular point, the gradient describes the combined influence of the (two, in this case) partial derivatives at that point, and in doing so indicates the direction of steepest ascent, that is, the direction in which the function increases the most from that point.
In other words:
The gradient of a multi-variable function expresses how the function changes with respect to its input variables. The change in the function caused by a specific variable at a specific point is expressed as the partial derivative with respect to that variable at that point.
As the multi-variate function has two variables, x and z, a change in each affects the change in the function overall, and each such effect is captured by its own partial derivative.
If we combine the partial derivatives that exist in unison at a point in the function (each partial derivative exists there), then we call that the gradient of the function at that point, represented as \( \nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial z}]\). This represents the direction and the rate of the steepest local change, meaning the direction in which the function increases most rapidly and how fast it increases in that direction (the magnitude). Together, this is the direction of steepest ascent: the way the function increases fastest from that point.
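To sanity-check the steepest-ascent claim, here is a minimal sketch (same illustrative toy function \(f(x, z) = x^2 + 3z\)): take a tiny step from a point in several unit directions and compare how much the function increases; the step in the gradient direction increases it the most.

```python
import math

# Minimal sketch: step a tiny distance from (x0, z0) in several unit
# directions and compare how much f increases; the (normalised) gradient
# direction gives the largest increase. f(x, z) = x**2 + 3*z as before.
def f(x, z):
    return x ** 2 + 3 * z

x0, z0, step = 2.0, 1.0, 0.01
grad = [2 * x0, 3.0]                    # analytic gradient of the toy function
norm = math.hypot(grad[0], grad[1])
directions = {
    "gradient": (grad[0] / norm, grad[1] / norm),
    "x-axis":   (1.0, 0.0),
    "z-axis":   (0.0, 1.0),
    "diagonal": (1 / math.sqrt(2), 1 / math.sqrt(2)),
}
for name, (dx, dz) in directions.items():
    increase = f(x0 + step * dx, z0 + step * dz) - f(x0, z0)
    print(f"{name:>9}: increase = {increase:.6f}")
# The 'gradient' direction shows the largest increase of all four.
```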
The gradient is specific to a point on the function, so there are many gradients. Each shows in which direction and how fast the function increases from that point.
The gradient is a vector of the partial derivatives, one for (i.e. with respect to) each input variable:
\(\nabla f = [\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \frac{\partial f}{\partial x_3}, \dots, \frac{\partial f}{\partial x_n}]\), assuming there are n variables (and therefore n partial derivatives) in the multi-variate function f.
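And a minimal sketch of that general form (my own helper, not from any particular library): a small function that returns the numerical gradient of an n-variable function as a list of partial derivatives, one per input.

```python
# Minimal sketch: the numerical gradient of an n-variable function, one
# partial derivative per input, matching
# grad f = [df/dx_1, df/dx_2, ..., df/dx_n].
def numerical_gradient(func, xs, h=1e-5):
    grad = []
    for i in range(len(xs)):
        bumped_up, bumped_down = list(xs), list(xs)
        bumped_up[i] += h      # nudge only the i-th variable
        bumped_down[i] -= h    # every other variable is held fixed
        grad.append((func(bumped_up) - func(bumped_down)) / (2 * h))
    return grad

# Illustrative 3-variable function: f = x1**2 + 2*x2 + x3**3.
def f(v):
    return v[0] ** 2 + 2 * v[1] + v[2] ** 3

print(numerical_gradient(f, [1.0, 5.0, 2.0]))  # ~ [2.0, 2.0, 12.0]
```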