Consider a function $f: U \to \mathbb{R}^m$, where $U \subseteq \mathbb{R}^n$ is open.
Take a point $a \in U$, and a linear transformation $L: \mathbb{R}^n \to \mathbb{R}^m$.
Then, we say that $f$ is differentiable at $a$ with derivative $L$, if:
$$\lim_{h \to 0} \frac{\| f(a + h) - f(a) - L(h) \|}{\| h \|} = 0.$$
(Note: this is a special case of Fréchet differentiability for functions between Banach spaces)
In a sense, this means that $x \mapsto f(a) + L(x - a)$ is the best linear approximation of $f$ near $a$.
Let us define the error term $E(h)$:
$$E(h) = f(a + h) - f(a) - L(h),$$
where we can think of $h = x - a$ as a small displacement from $a$.
Then, the definition of differentiability can be rewritten as
$$\lim_{h \to 0} \frac{\| E(h) \|}{\| h \|} = 0,$$
which gives
$$f(a + h) = f(a) + L(h) + E(h).$$
Now, given that $f$ is differentiable at $a$, the linear transformation $L$ is unique if it exists, and we call it the (total) derivative of $f$ at $a$, and denote it as $Df(a)$, which can be represented as a matrix (discussed later).
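As a quick numerical sanity check of the limit definition, the sketch below picks an illustrative function and point (these are my own choices, not from the text): $f(x, y) = (xy,\ x + y^2)$ at $a = (1, 2)$, whose hand-computed Jacobian is the candidate linear map $L$. The error ratio $\|f(a+h) - f(a) - L(h)\| / \|h\|$ should shrink toward $0$ as $h \to 0$.

```python
import math

# Illustrative example (assumed, not from the text): f(x, y) = (x*y, x + y**2)
# at a = (1.0, 2.0), with hand-computed Jacobian J playing the role of L.
def f(x, y):
    return (x * y, x + y * y)

a = (1.0, 2.0)
J = [[2.0, 1.0],   # d(xy)/dx = y = 2,   d(xy)/dy = x = 1
     [1.0, 4.0]]   # d(x+y^2)/dx = 1,    d(x+y^2)/dy = 2y = 4

def error_ratio(h):
    """||f(a+h) - f(a) - L(h)|| / ||h||, straight from the definition."""
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    Lh = (J[0][0] * h[0] + J[0][1] * h[1],
          J[1][0] * h[0] + J[1][1] * h[1])
    num = math.hypot(fah[0] - fa[0] - Lh[0], fah[1] - fa[1] - Lh[1])
    return num / math.hypot(*h)

# The ratio should shrink (here it is roughly proportional to s) as h -> 0.
for s in (1e-1, 1e-2, 1e-3):
    print(s, error_ratio((s, s)))
```

For this particular $f$ the error term is exactly quadratic in $h$, so the ratio decays linearly in $\|h\|$.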
Proof
Assume that $L_1, L_2$ are two distinct derivatives of $f$ at $a$.
Then, we have a vector $v \neq 0$ such that $L_1(v) \neq L_2(v)$.
This means that, by linearity,
$$L_1(tv) \neq L_2(tv)$$
for any $t \neq 0$.
Now, we let $h = tv$;
taking the norm of the difference and dividing by $\|h\| = |t| \, \|v\|$, we have
$$\frac{\| L_1(h) - L_2(h) \|}{\| h \|} = \frac{|t| \, \| L_1(v) - L_2(v) \|}{|t| \, \| v \|} = \frac{\| L_1(v) - L_2(v) \|}{\| v \|}.$$
Since both numerator and denominator are Euclidean norms of nonzero vectors, they are both positive, and we have a positive constant $c > 0$ independent of $t$.
Taking the limit as $t \to 0$ (so $h \to 0$), because $L_1$ and $L_2$ are both derivatives of $f$ at $a$, the triangle inequality gives
$$c = \frac{\| L_1(h) - L_2(h) \|}{\| h \|} \le \frac{\| f(a+h) - f(a) - L_1(h) \|}{\| h \|} + \frac{\| f(a+h) - f(a) - L_2(h) \|}{\| h \|} \to 0,$$
which is a contradiction. Therefore, the derivative of $f$ at $a$ is unique if it exists.
Directional Derivative
On the other hand, we have a different notion of derivative, called the directional derivative.
Given a function $f: U \subseteq \mathbb{R}^n \to \mathbb{R}^m$, a point $a \in U$, and a vector $v \in \mathbb{R}^n$,
we say that $f$ is differentiable in the direction of $v$ at $a$ if the following limit exists:
$$D_v f(a) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t},$$
where $a + tv$ is in $U$ for sufficiently small $|t|$.
The directional derivative can be written using the total derivative:
$$D_v f(a) = Df(a)(v).$$
Proof
Assume that $f$ is differentiable at $a$ with derivative $Df(a)$, and take a vector $v \neq 0$.
Since $f$ is differentiable at $a$, we have
$$\lim_{h \to 0} \frac{\| f(a + h) - f(a) - Df(a)(h) \|}{\| h \|} = 0.$$
Let $h = tv$; we have
$$\lim_{t \to 0} \frac{\| f(a + tv) - f(a) - Df(a)(tv) \|}{\| tv \|} = 0.$$
Since $v$ is a fixed vector, $\|v\|$ is a positive constant which can be factored out of the denominator; moving the remaining $1/|t|$ inside the norm and using linearity $Df(a)(tv) = t \, Df(a)(v)$, this becomes
$$\lim_{t \to 0} \left\| \frac{f(a + tv) - f(a)}{t} - Df(a)(v) \right\| = 0.$$
The inside of the norm is a vector in $\mathbb{R}^m$, and the limit of its norm is $0$, which means that the vector tends to the zero vector:
$$\lim_{t \to 0} \left( \frac{f(a + tv) - f(a)}{t} - Df(a)(v) \right) = 0;$$
by term-wise limit, since $Df(a)(v)$ is constant in $t$,
$$\lim_{t \to 0} \frac{f(a + tv) - f(a)}{t} = Df(a)(v).$$
Since this limit exists, and the LHS is exactly the definition of the directional derivative $D_v f(a)$, we have
$$D_v f(a) = Df(a)(v).$$
Thus, differentiability is sufficient for the existence of directional derivatives. (Note that the converse is not true.)
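The identity $D_v f(a) = Df(a)(v)$ can be checked numerically. The sketch below uses an illustrative scalar function of my own choosing (not from the text), $f(x, y) = x^2 y$ at $a = (1, 3)$ with direction $v = (0.6, 0.8)$; its hand-computed gradient at $a$ is $(2xy, x^2) = (6, 1)$, so the prediction is $Df(a)(v) = 6 \cdot 0.6 + 1 \cdot 0.8 = 4.4$.

```python
# Illustrative example (assumed): f(x, y) = x**2 * y, a = (1.0, 3.0),
# direction v = (0.6, 0.8), hand-computed gradient (6.0, 1.0) at a.
def f(x, y):
    return x * x * y

a = (1.0, 3.0)
v = (0.6, 0.8)
grad = (6.0, 1.0)
jvp = grad[0] * v[0] + grad[1] * v[1]   # Df(a)(v) = 4.4

def directional(t):
    """(f(a + t v) - f(a)) / t, the difference quotient from the definition."""
    return (f(a[0] + t * v[0], a[1] + t * v[1]) - f(*a)) / t

# The difference quotient should approach Df(a)(v) = 4.4 as t -> 0.
for t in (1e-2, 1e-4, 1e-6):
    print(t, directional(t), jvp)
```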
Partial Derivative
A special case of the directional derivative is the partial derivative.
Remember that the directional derivative can be written as follows:
$$D_v f(a) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t}.$$
If we take $v = e_i$, a standard Euclidean basis vector, the direction is along the $i$-th coordinate axis.
So, given a function $f: U \subseteq \mathbb{R}^n \to \mathbb{R}^m$, a point $a \in U$, and a standard Euclidean basis vector $e_i$, we say that $f$ is differentiable in the direction of $e_i$ at $a$ if the following limit exists:
$$\lim_{t \to 0} \frac{f(a + t e_i) - f(a)}{t} = \lim_{t \to 0} \frac{f(a_1, \dots, a_i + t, \dots, a_n) - f(a_1, \dots, a_n)}{t},$$
in which case, the limit is called the partial derivative of $f$ with respect to $x_i$ at $a$, denoted $\frac{\partial f}{\partial x_i}(a)$.
This is useful because now we can write directional derivatives as a linear combination of partial derivatives:
$$D_v f(a) = Df(a)(v) = Df(a)\left( \sum_{i=1}^n v_i e_i \right) = \sum_{i=1}^n v_i \, Df(a)(e_i) = \sum_{i=1}^n v_i \frac{\partial f}{\partial x_i}(a).$$
(Note: this will be important for tangent spaces of manifolds)
For a scalar function $f: U \subseteq \mathbb{R}^n \to \mathbb{R}$, the Jacobian matrix is a $1 \times n$ matrix, which can be identified with a row vector, called the gradient of $f$ at $a$, sometimes denoted as $\nabla f(a)$:
$$\nabla f(a) = \left( \frac{\partial f}{\partial x_1}(a), \dots, \frac{\partial f}{\partial x_n}(a) \right),$$
which makes the directional derivative a matrix multiplication of the gradient and the vector $v$:
$$D_v f(a) = \nabla f(a) \, v.$$
If $f$ is vector-valued, i.e. $f = (f_1, \dots, f_m): U \subseteq \mathbb{R}^n \to \mathbb{R}^m$, the total derivative can be represented by the Jacobian matrix (or derivative matrix), which is an $m \times n$ matrix:
$$Df(a) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix}.$$
Let us check that it in fact satisfies $Df(a)(v) = D_v f(a)$, given $v = \sum_{i=1}^n v_i e_i$:
$$Df(a)\, v = \begin{pmatrix} \sum_i \frac{\partial f_1}{\partial x_i}(a)\, v_i \\ \vdots \\ \sum_i \frac{\partial f_m}{\partial x_i}(a)\, v_i \end{pmatrix} = \sum_{i=1}^n v_i \begin{pmatrix} \frac{\partial f_1}{\partial x_i}(a) \\ \vdots \\ \frac{\partial f_m}{\partial x_i}(a) \end{pmatrix} = \sum_{i=1}^n v_i \frac{\partial f}{\partial x_i}(a) = D_v f(a).$$
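The column-by-column structure of the Jacobian (one column per basis vector $e_i$) can be sketched numerically: approximate each partial derivative by a central difference and assemble the matrix. The function $f(x, y) = (x y^2, \sin x)$ and the point $a = (1, 2)$ below are illustrative choices of mine, with the hand-computed Jacobian for comparison.

```python
import math

# Illustrative example (assumed): f(x, y) = (x*y**2, sin x), a = (1.0, 2.0).
# Exact Jacobian at a: [[y**2, 2*x*y], [cos(x), 0]] = [[4, 4], [cos 1, 0]].
def f(x, y):
    return (x * y * y, math.sin(x))

def jacobian(f, a, eps=1e-6):
    """m x n Jacobian by central differences; column i = partial w.r.t. x_i."""
    n = len(a)
    m = len(f(*a))
    J = [[0.0] * n for _ in range(m)]
    for i in range(n):
        ap = list(a); ap[i] += eps
        am = list(a); am[i] -= eps
        fp, fm = f(*ap), f(*am)
        for r in range(m):
            J[r][i] = (fp[r] - fm[r]) / (2 * eps)
    return J

a = (1.0, 2.0)
exact = [[4.0, 4.0],
         [math.cos(1.0), 0.0]]
approx = jacobian(f, a)
print(approx)
```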
Properties of Derivatives
Chain Rule
Consider functions $f: \mathbb{R}^n \to \mathbb{R}^m$ and $g: \mathbb{R}^m \to \mathbb{R}^p$, and a point $a \in \mathbb{R}^n$.
Assume that $f$ is differentiable at $a$, and let $b = f(a)$, at which $g$ is differentiable.
Then, the composition $g \circ f$ is differentiable at $a$, and its derivative is given by the composition of the derivatives:
$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a).$$
This is called the chain rule.
Note that this follows the same idea as the chain rule for single-variable functions.
Proof
Since $f$ is differentiable at $a$,
$$f(a + h) = f(a) + Df(a)(h) + E_f(h),$$
where $E_f$ is a function such that $\lim_{h \to 0} \| E_f(h) \| / \| h \| = 0$.
Now, since $g$ is differentiable at $b = f(a)$, we have
$$g(b + k) = g(b) + Dg(b)(k) + E_g(k),$$
where $E_g$ is a function such that $\lim_{k \to 0} \| E_g(k) \| / \| k \| = 0$.
Take $k = f(a + h) - f(a) = Df(a)(h) + E_f(h)$, so that $b + k = f(a + h)$; we have
$$g(f(a + h)) = g(b) + Dg(b)\big( Df(a)(h) + E_f(h) \big) + E_g(k).$$
Thus,
$$g(f(a + h)) = g(f(a)) + \big( Dg(b) \circ Df(a) \big)(h) + Dg(b)(E_f(h)) + E_g(k).$$
Thus, to show that the total derivative of $g \circ f$ at $a$ is $Dg(b) \circ Df(a)$, we need to show that
$$\lim_{h \to 0} \frac{\| Dg(b)(E_f(h)) + E_g(k) \|}{\| h \|} = 0.$$
The norm in the numerator can be bounded by the triangle inequality:
$$\| Dg(b)(E_f(h)) + E_g(k) \| \le \| Dg(b)(E_f(h)) \| + \| E_g(k) \|.$$
For the second term (when $k \neq 0$; when $k = 0$ it vanishes),
$$\| E_g(k) \| = \frac{\| E_g(k) \|}{\| k \|} \, \| k \| \le \frac{\| E_g(k) \|}{\| k \|} \big( \| Df(a)(h) \| + \| E_f(h) \| \big).$$
Notice that the coefficient is of the form
$$\frac{\| E_g(k) \|}{\| k \|},$$
which goes to $0$ as $k \to 0$, and $k \to 0$ as $h \to 0$ (since $f$ is continuous at $a$).
Substituting back into the limit,
$$\lim_{h \to 0} \frac{\| Dg(b)(E_f(h)) + E_g(k) \|}{\| h \|} \le \lim_{h \to 0} \left( \| Dg(b) \|_{\mathrm{op}} \frac{\| E_f(h) \|}{\| h \|} + \frac{\| E_g(k) \|}{\| k \|} \left( \frac{\| Df(a)(h) \|}{\| h \|} + \frac{\| E_f(h) \|}{\| h \|} \right) \right).$$
The $\| E_f(h) \| / \| h \|$ terms go to $0$ by the definition of $E_f$; the term $\| Df(a)(h) \| / \| h \|$ is bounded by the operator norm $\| Df(a) \|_{\mathrm{op}}$, so its product with the coefficient $\| E_g(k) \| / \| k \| \to 0$ also goes to $0$. Thus we have
$$\lim_{h \to 0} \frac{\| g(f(a+h)) - g(f(a)) - \big( Dg(b) \circ Df(a) \big)(h) \|}{\| h \|} = 0,$$
which shows the chain rule.
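The chain rule can also be checked numerically on a concrete case. The functions below are illustrative choices of mine: $f(x, y) = (x + y,\ xy)$, $g(u, w) = uw$, at $a = (2, 3)$, so $f(a) = (5, 6)$ and the hand-computed Jacobians are $Df(a) = \begin{pmatrix} 1 & 1 \\ 3 & 2 \end{pmatrix}$, $Dg(f(a)) = (6, 5)$, whose product is $(21, 16)$.

```python
# Illustrative chain-rule check (assumed example, not from the text):
#   f(x, y) = (x + y, x*y),  g(u, w) = u*w,  a = (2.0, 3.0),  f(a) = (5, 6).
Df = [[1.0, 1.0],
      [3.0, 2.0]]      # Jacobian of f at a
Dg = [6.0, 5.0]        # gradient of g at f(a) = (5, 6)
chain = [Dg[0] * Df[0][0] + Dg[1] * Df[1][0],   # row vector times matrix
         Dg[0] * Df[0][1] + Dg[1] * Df[1][1]]

# Compare with central differences on the composite (g.f)(x, y) = (x+y)*x*y.
def gf(x, y):
    return (x + y) * x * y

eps = 1e-6
num = [(gf(2.0 + eps, 3.0) - gf(2.0 - eps, 3.0)) / (2 * eps),
       (gf(2.0, 3.0 + eps) - gf(2.0, 3.0 - eps)) / (2 * eps)]
print(chain, num)   # both approximately [21, 16]
```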
Leibniz / Product Rule
Now, there is an interesting corollary of the chain rule.
Take two functions $f, g: \mathbb{R}^n \to \mathbb{R}$, and combine them to make a vector-valued function $F: \mathbb{R}^n \to \mathbb{R}^2$ defined as $F(x) = (f(x), g(x))$.
The Jacobian matrix of $F$ is a $2 \times n$ matrix:
$$DF(a) = \begin{pmatrix} \nabla f(a) \\ \nabla g(a) \end{pmatrix}.$$
Consider a multiplication map $m: \mathbb{R}^2 \to \mathbb{R}$ defined as $m(u, w) = uw$.
More explicitly, we call the composition $m \circ F$ the pointwise product of $f$ and $g$:
$$(m \circ F)(x) = f(x) \, g(x).$$
Here, we notice that $m$ is differentiable at any point $(u, w) \in \mathbb{R}^2$, and its derivative is a $1 \times 2$ matrix (a row vector):
$$Dm(u, w) = \begin{pmatrix} w & u \end{pmatrix}.$$
Thus by the chain rule,
$$D(fg)(a) = Dm(F(a)) \, DF(a) = \begin{pmatrix} g(a) & f(a) \end{pmatrix} \begin{pmatrix} \nabla f(a) \\ \nabla g(a) \end{pmatrix} = g(a) \, \nabla f(a) + f(a) \, \nabla g(a).$$
This is called the Leibniz rule or product rule of derivatives.
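The Leibniz rule can be sanity-checked on an illustrative pair of functions (my own choice, not from the text): $f(x, y) = xy$ and $g(x, y) = x + 2y$ at $a = (1, 2)$, where $\nabla f(a) = (2, 1)$, $\nabla g(a) = (1, 2)$, $f(a) = 2$, $g(a) = 5$, so the rule predicts $\nabla(fg)(a) = 5 \cdot (2, 1) + 2 \cdot (1, 2) = (12, 9)$.

```python
# Illustrative Leibniz-rule check (assumed example):
#   f(x, y) = x*y, g(x, y) = x + 2*y, a = (1.0, 2.0).
fa, ga = 2.0, 5.0                        # f(a), g(a)
grad_f, grad_g = (2.0, 1.0), (1.0, 2.0)  # hand-computed gradients at a
leibniz = tuple(ga * df + fa * dg for df, dg in zip(grad_f, grad_g))

# Compare against central differences on (fg)(x, y) = x*y*(x + 2*y).
def fg(x, y):
    return x * y * (x + 2 * y)

eps = 1e-6
num = ((fg(1 + eps, 2) - fg(1 - eps, 2)) / (2 * eps),
       (fg(1, 2 + eps) - fg(1, 2 - eps)) / (2 * eps))
print(leibniz, num)   # both approximately (12, 9)
```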
$C^k$-class Functions and Higher Order Derivatives
$C^k$-class Functions
Now, coming back to the total derivative of a vector-valued function $f: U \subseteq \mathbb{R}^n \to \mathbb{R}^m$, we have the total derivative at each interior point $a \in U$, which is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$:
$$Df(a): \mathbb{R}^n \to \mathbb{R}^m.$$
Notice that the domain of $Df(a)$ is not technically the original $\mathbb{R}^n$, but the tangent space of $\mathbb{R}^n$ at $a$.
To articulate, remember that $f(a + h)$ can be written as
$$f(a + h) = f(a) + Df(a)(h) + E(h).$$
$h$ is a vector that does not live in the original $\mathbb{R}^n$ where $a$ lives; it is rather a difference between two nearby points $a + h$ and $a$ that are in the original $\mathbb{R}^n$, which happens to be a vector in $\mathbb{R}^n$.
We should express this distinction by writing $h \in T_a \mathbb{R}^n$, where $T_a \mathbb{R}^n$ is the tangent space of $\mathbb{R}^n$ at $a$ (explained in the manifold section later).
Also, in a similar way, while $Df(a)(h)$ looks like a vector in the original range $\mathbb{R}^m$, it is in essence a difference between $f(a + h)$ and $f(a)$, which should also live in the tangent space of $\mathbb{R}^m$ at $f(a)$, denoted as $T_{f(a)} \mathbb{R}^m$:
$$Df(a): T_a \mathbb{R}^n \to T_{f(a)} \mathbb{R}^m.$$
For reasons I explain later in the manifold section, the tangent space of (subsets of) Euclidean space is canonically isomorphic to the original space, so we regard $T_a \mathbb{R}^n$ and $T_{f(a)} \mathbb{R}^m$ as $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, and write
$$Df(a): \mathbb{R}^n \to \mathbb{R}^m.$$
This is a linear transformation between two vector spaces, but it also gives another map $Df: U \to L(\mathbb{R}^n, \mathbb{R}^m)$, that takes a point $a \in U$ and produces a linear transformation $Df(a)$, where $L(\mathbb{R}^n, \mathbb{R}^m)$ is the set of all linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$.
In short, we now have a (crudely speaking) matrix-valued map $Df$:
$$Df: U \to L(\mathbb{R}^n, \mathbb{R}^m) \cong \mathbb{R}^{m \times n}.$$
Now, we define a class of functions, called $C^k$-class functions.
Given a function $f: U \subseteq \mathbb{R}^n \to \mathbb{R}^m$:
$C^0$-class function
if it is continuous at every point in $U$.
$C^k$-class function ($k \geq 1$)
if the derivative $Df: U \to L(\mathbb{R}^n, \mathbb{R}^m)$ exists and is a $C^{k-1}$-class function.
While we have the definition, it would not be as useful without the ability to compute with concrete functions and show that they are $C^k$-class functions.
Higher Order Derivatives
Now, speaking of canonical isomorphism, there is actually a canonical isomorphism between $L(\mathbb{R}^n, \mathbb{R}^m)$ and $\mathbb{R}^{mn}$ (refer here).
Thus we can identify $Df$ as a map from $U$ to $\mathbb{R}^{mn}$, and consider its derivative $D(Df)(a) = D^2 f(a)$, which is a map from $\mathbb{R}^n$ to $\mathbb{R}^{mn}$.
Since $\mathbb{R}^{mn}$ is canonically isomorphic to $L(\mathbb{R}^n, \mathbb{R}^m)$, we can identify $D^2 f(a)$ as a map:
$$D^2 f(a): \mathbb{R}^n \to L(\mathbb{R}^n, \mathbb{R}^m).$$
Since $D^2 f: U \to L(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R}^m)) \cong \mathbb{R}^{m n^2}$ is again an ordinary vector-valued map, we can take its derivative again (assuming $D^2 f$ is differentiable) to get $D^3 f(a)$:
$$D^3 f(a): \mathbb{R}^n \to \mathbb{R}^{m n^2},$$
and in a similar manner, we identify the domain and range as
$$D^3 f(a): \mathbb{R}^n \to L(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R}^m)).$$
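The identification of $Df$ with a vector-valued map, followed by differentiating again, can be sketched numerically. The example below is an illustrative choice of mine: for $f(x, y) = (x^2 y,\ x + y)$, the Jacobian $\begin{pmatrix} 2xy & x^2 \\ 1 & 1 \end{pmatrix}$ is flattened row by row into a vector in $\mathbb{R}^4$ (identifying $L(\mathbb{R}^2, \mathbb{R}^2)$ with $\mathbb{R}^{2 \cdot 2}$), and that vector-valued map is differentiated by central differences to approximate $D^2 f(a)$.

```python
# Illustrative sketch (assumed example): second derivative of
# f(x, y) = (x**2 * y, x + y) via the flattened Jacobian.
def Df_flat(x, y):
    # Jacobian [[2xy, x^2], [1, 1]] read row by row as a vector in R^4.
    return (2 * x * y, x * x, 1.0, 1.0)

def D2f(a, eps=1e-5):
    """4x2 matrix for D^2 f(a): column j = partial of Df_flat w.r.t. x_j."""
    cols = []
    for j in range(2):
        ap = list(a); ap[j] += eps
        am = list(a); am[j] -= eps
        fp, fm = Df_flat(*ap), Df_flat(*am)
        cols.append([(p - q) / (2 * eps) for p, q in zip(fp, fm)])
    return [[cols[j][i] for j in range(2)] for i in range(4)]

a = (1.0, 2.0)
print(D2f(a))
# Expected entries at a = (1, 2): d(2xy)/dx = 2y = 4, d(2xy)/dy = 2x = 2,
# d(x^2)/dx = 2x = 2, d(x^2)/dy = 0, and the constant rows are 0.
```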