Derivatives

Differentiability and Derivatives

Total Derivative

Consider a function $f : U \to \mathbb{R}^m$, where $U \subseteq \mathbb{R}^n$ is open. Take a point $a \in U$, and a linear transformation $L : \mathbb{R}^n \to \mathbb{R}^m$.

Then, we say that $f$ is differentiable at $a$ with derivative $L$, if:

$$\lim_{h \to 0} \frac{\|f(a+h) - f(a) - L(h)\|}{\|h\|} = 0$$

(Note: this is a special case of Fréchet differentiability for functions between Banach spaces)

In a sense, this means that $L$ is the best linear approximation of $f$ near $a$. Let us define the error term $E(h)$:

$$E(h) = f(a+h) - f(a) - L(h)$$

where we can think of $h = x - a$. Then, the definition of differentiability can be rewritten as

$$\lim_{h \to 0} \frac{\|E(h)\|}{\|h\|} = 0$$

which gives

$$f(a+h) = f(a) + L(h) + E(h)$$

Now, given the definition above, the linear transformation $L$ is unique if it exists, and we call it the (total) derivative of $f$ at $a$, and denote it as $Df(a)$, which can be represented as a matrix (later discussed).

Proof

Assume, for contradiction, that $L_1 \neq L_2$ are two derivatives of $f$ at $a$. Then, we have a vector $v \neq 0$ such that $L_1(v) \neq L_2(v)$. Writing $E_1, E_2$ for the corresponding error terms, this means that

$$L_1(h) - L_2(h) = E_2(h) - E_1(h)$$

for any $h$. Now, we let $h = tv$ with $t \neq 0$,

$$L_1(tv) - L_2(tv) = E_2(tv) - E_1(tv)$$

taking the norm and dividing by $\|tv\|$, we have

$$\frac{\|L_1(tv) - L_2(tv)\|}{\|tv\|} = \frac{\|E_2(tv) - E_1(tv)\|}{\|tv\|}$$

since $L_1, L_2$ are linear and the Euclidean norm is homogeneous, the left-hand side is a positive constant independent of $t$:

$$\frac{\|L_1(tv) - L_2(tv)\|}{\|tv\|} = \frac{|t|\,\|L_1(v) - L_2(v)\|}{|t|\,\|v\|} = \frac{\|L_1(v) - L_2(v)\|}{\|v\|} > 0$$

Taking the limit as $t \to 0$, because $L_1$ and $L_2$ are both derivatives of $f$ at $a$, the right-hand side goes to $0$ by the triangle inequality, and we have

$$\frac{\|L_1(v) - L_2(v)\|}{\|v\|} = 0$$

which is a contradiction. Therefore, the derivative of $f$ at $a$ is unique if it exists.
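The defining limit can be checked numerically. A minimal sketch, with a hypothetical example function $f(x, y) = (x^2 y, \sin x)$ and its hand-computed Jacobian at $a = (1, 2)$ (both assumptions, not from the text): the ratio $\|f(a+h) - f(a) - Jh\| / \|h\|$ shrinks as $h \to 0$.

```python
import math

# Hypothetical example: f : R^2 -> R^2, f(x, y) = (x^2 y, sin x),
# with its Jacobian at a = (1, 2) computed by hand.
def f(x, y):
    return (x * x * y, math.sin(x))

a = (1.0, 2.0)
J = [[2 * a[0] * a[1], a[0] ** 2],   # row 1: d(x^2 y)/dx, d(x^2 y)/dy
     [math.cos(a[0]), 0.0]]          # row 2: d(sin x)/dx, d(sin x)/dy

def error_ratio(h):
    """||f(a + h) - f(a) - J h|| / ||h||, which should shrink as h -> 0."""
    h1, h2 = h
    fa = f(*a)
    fah = f(a[0] + h1, a[1] + h2)
    Jh = (J[0][0] * h1 + J[0][1] * h2,
          J[1][0] * h1 + J[1][1] * h2)
    err = math.hypot(fah[0] - fa[0] - Jh[0], fah[1] - fa[1] - Jh[1])
    return err / math.hypot(h1, h2)

for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(t, error_ratio((t, t)))  # the ratio decays roughly linearly in t
```

Since the error term here is quadratic in $\|h\|$, the ratio decays linearly, consistent with the definition.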

Directional Derivative

On the other hand, we have a different notion of derivative, called the directional derivative. Given a function $f : U \to \mathbb{R}^m$ with $U \subseteq \mathbb{R}^n$ open, a point $a \in U$, and a vector $v \in \mathbb{R}^n$, we say that $f$ is differentiable in the direction of $v$ at $a$ if the following limit exists:

$$D_v f(a) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t}$$

where $a + tv$ is in $U$ for sufficiently small $t$.

The directional derivative can be written using the total derivative:

$$D_v f(a) = Df(a)(v)$$

Proof

Assume that $f$ is differentiable at $a$ with derivative $Df(a)$, and take a vector $v \neq 0$. Since $f$ is differentiable at $a$, we have

$$\lim_{h \to 0} \frac{\|f(a+h) - f(a) - Df(a)(h)\|}{\|h\|} = 0$$

Let $h = tv$, we have

$$\lim_{t \to 0} \frac{\|f(a+tv) - f(a) - t\,Df(a)(v)\|}{\|tv\|} = 0$$

Since $v$ is a fixed vector, $\frac{1}{\|v\|}$ is a positive constant which can be factored out:

$$\lim_{t \to 0} \frac{\|f(a+tv) - f(a) - t\,Df(a)(v)\|}{|t|} = 0$$

The inside of the norm is a vector in $\mathbb{R}^m$, and the limit of its norm is $0$, which means that the vector itself goes to the zero vector in the limit:

$$\lim_{t \to 0} \frac{f(a+tv) - f(a) - t\,Df(a)(v)}{t} = 0$$

by term-wise limit,

$$\lim_{t \to 0} \frac{f(a+tv) - f(a)}{t} - Df(a)(v) = 0$$

Since the limit exists, we can rearrange:

$$\lim_{t \to 0} \frac{f(a+tv) - f(a)}{t} = Df(a)(v)$$

and the LHS is exactly the definition of the directional derivative $D_v f(a)$, so we have

$$D_v f(a) = Df(a)(v)$$

Thus, differentiability is sufficient for the existence of the directional derivative in every direction. (Note that the converse is not true.)
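The identity $D_v f(a) = Df(a)(v)$ can be watched converge numerically. A minimal sketch, with a hypothetical example $f(x, y) = e^x y$, $a = (0, 1)$, $v = (2, 3)$ (all assumptions, not from the text):

```python
import math

# Hypothetical example: f(x, y) = exp(x) * y, a = (0, 1), v = (2, 3).
# Hand-computed gradient at a is (exp(0)*1, exp(0)) = (1, 1),
# so Df(a)(v) = 1*2 + 1*3 = 5.
def f(x, y):
    return math.exp(x) * y

a = (0.0, 1.0)
v = (2.0, 3.0)
Dfa_v = 5.0

def quotient(t):
    """Difference quotient (f(a + t v) - f(a)) / t."""
    return (f(a[0] + t * v[0], a[1] + t * v[1]) - f(a[0], a[1])) / t

for t in (1e-1, 1e-2, 1e-3):
    print(t, quotient(t))  # approaches Df(a)(v) = 5
```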

Partial Derivative

A special case of the directional derivative is the partial derivative. Remember that the directional derivative can be written as follows:

$$D_v f(a) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t}$$

If we take $v = e_i$, a standard Euclidean basis vector, the direction is along the $i$-th coordinate axis. So, given a function $f : U \to \mathbb{R}^m$, a point $a \in U$, and a standard Euclidean basis vector $e_i$, we say that $f$ is differentiable in the direction of $e_i$ at $a$ if the following limit exists:

$$D_{e_i} f(a) = \lim_{t \to 0} \frac{f(a + t e_i) - f(a)}{t}$$

in which case, the limit is called the partial derivative of $f$ with respect to $x_i$ at $a$, denoted $\frac{\partial f}{\partial x_i}(a)$.

This is useful because now we can write directional derivatives as a linear combination of partial derivatives: for $v = \sum_{i=1}^n v_i e_i$,

$$D_v f(a) = Df(a)\left(\sum_{i=1}^n v_i e_i\right) = \sum_{i=1}^n v_i \, Df(a)(e_i) = \sum_{i=1}^n v_i \frac{\partial f}{\partial x_i}(a)$$

(Note: this will be important for tangent spaces of manifolds)

For a scalar function $f : U \to \mathbb{R}$ with $U \subseteq \mathbb{R}^n$, the Jacobian matrix is a $1 \times n$ matrix, which can be identified with a row vector, called the gradient of $f$ at $a$, sometimes denoted as $\nabla f(a)$:

$$\nabla f(a) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(a) & \cdots & \frac{\partial f}{\partial x_n}(a) \end{pmatrix}$$

which makes the directional derivative a matrix multiplication of the gradient and the vector $v$:

$$D_v f(a) = \nabla f(a)\, v = \sum_{i=1}^n v_i \frac{\partial f}{\partial x_i}(a)$$

If $f$ is vector-valued, i.e. $f : U \to \mathbb{R}^m$, the total derivative can be represented by the Jacobian matrix or derivative matrix $Jf(a)$, which is an $m \times n$ matrix:

$$Jf(a) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix}$$

Let us check that it in fact satisfies $Df(a)(v) = Jf(a)\,v$, given $v = \sum_{i=1}^n v_i e_i$:

$$Jf(a)\,v = \sum_{i=1}^n v_i \begin{pmatrix} \frac{\partial f_1}{\partial x_i}(a) \\ \vdots \\ \frac{\partial f_m}{\partial x_i}(a) \end{pmatrix} = \sum_{i=1}^n v_i \frac{\partial f}{\partial x_i}(a) = D_v f(a) = Df(a)(v)$$
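The same check can be sketched numerically: build the Jacobian column by column from partial-derivative difference quotients, and compare $Jf(a)\,v$ with the directional difference quotient. The example function and point are assumptions for illustration:

```python
import math

# Hypothetical example: f : R^2 -> R^2, f(x, y) = (x y, sin x + y^2).
def f(x, y):
    return (x * y, math.sin(x) + y * y)

a = (1.0, 2.0)
eps = 1e-6

def column(j):
    """Forward-difference approximation of the j-th column of Jf(a)."""
    p = list(a)
    p[j] += eps
    fa, fp = f(*a), f(*p)
    return [(fp[i] - fa[i]) / eps for i in range(2)]

cols = [column(0), column(1)]
J = [[cols[j][i] for j in range(2)] for i in range(2)]  # J[i][j] = df_i/dx_j

v = (0.5, -1.0)
Jv = [J[i][0] * v[0] + J[i][1] * v[1] for i in range(2)]

# Directional difference quotient in the direction v
t = 1e-6
quot = [(f(a[0] + t * v[0], a[1] + t * v[1])[i] - f(*a)[i]) / t
        for i in range(2)]
print(Jv, quot)  # the two agree up to finite-difference error
```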

Properties of Derivatives

Chain Rule

Consider functions $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^k$, and a point $a \in \mathbb{R}^n$. Assume that $f$ is differentiable at $a$, and let $b = f(a)$, at which $g$ is differentiable. Then, the composition $g \circ f$ is differentiable at $a$, and its derivative is given by the composition of the derivatives:

$$D(g \circ f)(a) = Dg(b) \circ Df(a) = Dg(f(a)) \circ Df(a)$$

This is called the chain rule.

Note that this follows the same idea as the chain rule for single variable functions.

Proof

Since $f$ is differentiable at $a$,

$$f(a+h) = f(a) + Df(a)(h) + \|h\|\,\varepsilon_f(h)$$

where $\varepsilon_f$ is a function such that $\varepsilon_f(h) \to 0$ as $h \to 0$. Now, since $g$ is differentiable at $b = f(a)$, we have

$$g(b+k) = g(b) + Dg(b)(k) + \|k\|\,\varepsilon_g(k)$$

where $\varepsilon_g$ is a function such that $\varepsilon_g(k) \to 0$ as $k \to 0$. Take $k$ such that $b + k = f(a+h)$, we have

$$k = f(a+h) - f(a) = Df(a)(h) + \|h\|\,\varepsilon_f(h)$$

Thus,

$$g(f(a+h)) = g(b) + Dg(b)(Df(a)(h)) + \|h\|\,Dg(b)(\varepsilon_f(h)) + \|k\|\,\varepsilon_g(k)$$

Thus to show that the total derivative of $g \circ f$ at $a$ is $Dg(b) \circ Df(a)$, we need to show that

$$\lim_{h \to 0} \frac{\|g(f(a+h)) - g(f(a)) - Dg(b)(Df(a)(h))\|}{\|h\|} = 0$$

The norm in the numerator can be bounded by the triangle inequality:

$$\|g(f(a+h)) - g(b) - Dg(b)(Df(a)(h))\| \le \|h\|\,\|Dg(b)(\varepsilon_f(h))\| + \|k\|\,\|\varepsilon_g(k)\|$$

For the second term,

$$\frac{\|k\|}{\|h\|} = \frac{\|Df(a)(h) + \|h\|\,\varepsilon_f(h)\|}{\|h\|} \le \left\|Df(a)\!\left(\frac{h}{\|h\|}\right)\right\| + \|\varepsilon_f(h)\|$$

Notice that the coefficient $\frac{\|k\|}{\|h\|}$ stays bounded as $h \to 0$ (the first term is at most the operator norm of $Df(a)$, and $\|\varepsilon_f(h)\| \to 0$), and also $k \to 0$ as $h \to 0$. Substituting back into the limit,

$$\frac{\|g(f(a+h)) - g(f(a)) - Dg(b)(Df(a)(h))\|}{\|h\|} \le \|Dg(b)(\varepsilon_f(h))\| + \frac{\|k\|}{\|h\|}\,\|\varepsilon_g(k)\|$$

The first term goes to $0$ by the definition of $\varepsilon_f$ and continuity of the linear map $Dg(b)$, and the second term goes to $0$ because $\varepsilon_g(k) \to 0$ while its coefficient stays bounded, thus we have

$$\lim_{h \to 0} \frac{\|g(f(a+h)) - g(f(a)) - Dg(b)(Df(a)(h))\|}{\|h\|} = 0$$

which shows the chain rule.
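The chain rule can be sketched numerically by comparing the product of hand-computed Jacobians against a difference quotient of the composition. The example functions here are assumptions for illustration:

```python
import math

# Hypothetical example: f : R -> R^2, f(x) = (x^2, sin x), and
# g : R^2 -> R, g(u, w) = u w, so (g o f)(x) = x^2 sin x.
def f(x):
    return (x * x, math.sin(x))

def g(u, w):
    return u * w

x = 1.3
Jf = (2 * x, math.cos(x))   # hand-computed column (df1/dx, df2/dx)
u, w = f(x)
Jg = (w, u)                 # hand-computed row (dg/du, dg/dw) at f(x)

# Chain rule: J(g o f)(x) = Jg(f(x)) . Jf(x)
chain = Jg[0] * Jf[0] + Jg[1] * Jf[1]

# Centered difference quotient of the composition for comparison
t = 1e-6
numeric = (g(*f(x + t)) - g(*f(x - t))) / (2 * t)
print(chain, numeric)  # the two agree up to finite-difference error
```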

Leibniz / Product Rule

Now, there is an interesting corollary of the chain rule.

Take two functions $f, g : U \to \mathbb{R}$ with $U \subseteq \mathbb{R}^n$, and combine them to make a vector-valued function $F : U \to \mathbb{R}^2$ defined as $F(x) = (f(x), g(x))$. The Jacobian matrix of $F$ is a $2 \times n$ matrix

$$JF(a) = \begin{pmatrix} \nabla f(a) \\ \nabla g(a) \end{pmatrix}$$

Consider a multiplication map $m : \mathbb{R}^2 \to \mathbb{R}$ defined as $m(u, w) = uw$. More explicitly, we call the composition $m \circ F$ the pointwise product of $f$ and $g$:

$$(fg)(x) = m(F(x)) = f(x)\,g(x)$$

Here, we notice that $m$ is differentiable at any point $(u, w) \in \mathbb{R}^2$, and its derivative is a $1 \times 2$ matrix (a row vector):

$$Jm(u, w) = \begin{pmatrix} w & u \end{pmatrix}$$

Thus by the chain rule,

$$D(fg)(a) = Jm(F(a))\,JF(a) = \begin{pmatrix} g(a) & f(a) \end{pmatrix} \begin{pmatrix} \nabla f(a) \\ \nabla g(a) \end{pmatrix} = g(a)\,\nabla f(a) + f(a)\,\nabla g(a)$$

This is called the Leibniz rule or product rule of derivatives.
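A quick numeric sketch of the identity $\nabla(fg) = g\,\nabla f + f\,\nabla g$, with hypothetical example functions (not from the text):

```python
import math

# Hypothetical example: f(x, y) = x^2 + y and g(x, y) = x e^y, with
# hand-computed gradients, checked against finite-difference partials of fg.
def f(x, y):
    return x * x + y

def g(x, y):
    return x * math.exp(y)

a = (1.0, 0.5)
grad_f = (2 * a[0], 1.0)                          # (2x, 1)
grad_g = (math.exp(a[1]), a[0] * math.exp(a[1]))  # (e^y, x e^y)

fa, ga = f(*a), g(*a)
leibniz = [ga * grad_f[i] + fa * grad_g[i] for i in range(2)]

def prod(x, y):
    return f(x, y) * g(x, y)

# Centered-difference partials of the pointwise product f*g
eps = 1e-6
numeric = [
    (prod(a[0] + eps, a[1]) - prod(a[0] - eps, a[1])) / (2 * eps),
    (prod(a[0], a[1] + eps) - prod(a[0], a[1] - eps)) / (2 * eps),
]
print(leibniz, numeric)  # g*grad(f) + f*grad(g) matches the direct partials
```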

$C^k$-class Functions and Higher Order Derivatives

$C^k$-class Functions

Now, coming back to the total derivative of a vector-valued function $f : U \to \mathbb{R}^m$ with $U \subseteq \mathbb{R}^n$, we have the total derivative at each interior point $a \in U$, which is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$:

$$Df(a) : \mathbb{R}^n \to \mathbb{R}^m$$

Notice that the domain of $Df(a)$, and the vector $h$ it acts on, is not technically the original $\mathbb{R}^n$, but the tangent space of $\mathbb{R}^n$ at $a$. To articulate, remember that $h$ can be written as

$$h = x - a$$

$h$ is a vector that does not live in the original $\mathbb{R}^n$ where $a$ lives; it is rather a difference between two nearby points that are in the original $\mathbb{R}^n$, which happens to be a vector in $\mathbb{R}^n$. We should express this distinction by writing $h \in T_a\mathbb{R}^n$, where $T_a\mathbb{R}^n$ is the tangent space of $\mathbb{R}^n$ at $a$ (explained in the manifold section later).

Also, in a similar way, while $Df(a)(h)$ looks like a vector in the original range $\mathbb{R}^m$, in essence it is a difference between $f(a+h)$ and $f(a)$ (up to the error term), which should also live in the tangent space of $\mathbb{R}^m$ at $f(a)$, denoted as $T_{f(a)}\mathbb{R}^m$:

$$Df(a) : T_a\mathbb{R}^n \to T_{f(a)}\mathbb{R}^m$$

For reasons I explain later in the manifold section, the tangent space of (subsets of) Euclidean space is canonically isomorphic to the original space, so we regard $T_a\mathbb{R}^n$ and $T_{f(a)}\mathbb{R}^m$ as $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, and write

$$Df(a) : \mathbb{R}^n \to \mathbb{R}^m$$

This is a linear transformation between two vector spaces, but it also gives another map $Df : U \to L(\mathbb{R}^n, \mathbb{R}^m)$, that takes a point $a \in U$ and produces a linear transformation $Df(a)$, where $L(\mathbb{R}^n, \mathbb{R}^m)$ is the set of all linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$. In short, we now have a (crudely speaking) matrix-valued map $Jf$:

$$Jf : U \to \mathbb{R}^{m \times n}, \quad a \mapsto Jf(a)$$

Now, we define a class of functions, called $C^k$-class functions. Given a function $f : U \to \mathbb{R}^m$:

  • $C^0$-class function

    • if it is continuous at every point in $U$.
  • $C^k$-class function ($k \ge 1$)

    • if the derivative $Df$ exists and is a $C^{k-1}$-class function.

While we have the definition, it would not be as useful without the ability to compute derivatives of actual functions and show that they are $C^k$-class functions.
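Note that the classes are strictly nested: a function can be $C^1$ without being $C^2$. A minimal numeric sketch (example assumed, not from the text): $h(x) = x\,|x|$ has the continuous derivative $h'(x) = 2|x|$, so it is $C^1$, but $h'$ itself is not differentiable at $0$:

```python
# Hypothetical example: h(x) = x*|x| is C^1 on R -- its derivative
# h'(x) = 2|x| is continuous -- but not C^2, since h' has a corner at 0.
def h(x):
    return x * abs(x)

def h_prime(x):
    return 2 * abs(x)   # computed by hand; continuous everywhere

# One-sided difference quotients of h' at 0 disagree:
eps = 1e-6
left = (h_prime(0.0) - h_prime(-eps)) / eps    # slope of h' just left of 0
right = (h_prime(eps) - h_prime(0.0)) / eps    # slope of h' just right of 0
print(left, right)  # -2.0 vs 2.0: h'' does not exist at 0
```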

Higher Order Derivatives

Now, speaking of canonical isomorphisms, there is actually a canonical isomorphism between $L(\mathbb{R}^n, \mathbb{R}^m)$ and $\mathbb{R}^{mn}$ (refer here). Thus we can identify $Df$ as a map from $U$ to $\mathbb{R}^{mn}$, and consider its derivative $D^2 f = D(Df)$, which is a map from $U$ to $L(\mathbb{R}^n, \mathbb{R}^{mn})$. Since $L(\mathbb{R}^n, \mathbb{R}^{mn})$ is canonically isomorphic to $\mathbb{R}^{mn^2}$, we can identify $D^2 f$ as a map:

$$D^2 f : U \to \mathbb{R}^{mn^2}$$

Since this is an ordinary vector-valued map, we can take its derivative again (assuming $D^2 f$ is differentiable) to get $D^3 f$:

$$D^3 f = D(D^2 f) : U \to L(\mathbb{R}^n, \mathbb{R}^{mn^2})$$

and in a similar manner, we identify the domain and range as

$$D^3 f : U \to \mathbb{R}^{mn^3}$$
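A concrete numeric sketch of the second derivative (example function assumed, not from the text): for a scalar $f : \mathbb{R}^2 \to \mathbb{R}$ (so $m = 1$, $n = 2$), $D^2 f(a)$ can be identified with the $2 \times 2$ Hessian matrix of second partials, an element of $\mathbb{R}^{mn^2} = \mathbb{R}^4$:

```python
import math

# Hypothetical example: f : R^2 -> R, f(x, y) = sin(x) y + x y^2.
# D^2 f(a) identified with the 2x2 Hessian of second partials.
def f(x, y):
    return math.sin(x) * y + x * y * y

a = (0.7, 1.1)
eps = 1e-4

def second_partial(i, j):
    """Centered-difference approximation of d^2 f / dx_i dx_j at a."""
    def shift(di, dj):
        p = list(a)
        p[i] += di
        p[j] += dj
        return f(*p)
    return (shift(eps, eps) - shift(eps, -eps)
            - shift(-eps, eps) + shift(-eps, -eps)) / (4 * eps * eps)

H = [[second_partial(i, j) for j in range(2)] for i in range(2)]
print(H)  # the mixed partials H[0][1] and H[1][0] agree (D^2 f is symmetric)
```

For this smooth example the mixed partials agree, previewing the symmetry of the second derivative for $C^2$ functions.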