A fairly common task, which quickly becomes second nature to physics students, is to compute the total derivative of a multi-dimensional time-dependent function. Here, we will only deal with “innocent” functions, that is, continuous functions that are totally differentiable everywhere in their domain.
Consider, for example, the function
$$f(x(t), t) = 2(x_1(t))^2 + (\ln(x_2(t)))^3 + 4x_3(t) + \pi t, \quad x_1(t) = \sin(t), \quad x_2(t) = \cos(2t), \quad x_3(t) = \frac{1}{t}$$
which indirectly depends on $t$ via $x_1(t)$, $x_2(t)$ and $x_3(t)$, but also has a direct dependency on $t$ via the term $\pi t$. The total derivative is calculated by
$$\frac{df}{dt} = \frac{\partial f}{\partial x_1} \cdot \frac{dx_1}{dt} + \frac{\partial f}{\partial x_2} \cdot \frac{dx_2}{dt} + \frac{\partial f}{\partial x_3} \cdot \frac{dx_3}{dt} + \frac{\partial f}{\partial t}$$
One way to remember the formula is to form the total derivative with respect to all variables $x_1(t), x_2(t), x_3(t)$ and with respect to the independent variable $t$ itself. With $\frac{df}{dx_1} = \frac{\partial f}{\partial x_1} \cdot \frac{dx_1}{dt}$ and likewise for $x_2$ and $x_3$, we then obtain the above formula.
However, where does this equation really come from? In Calculus II, the multivariable chain rule is introduced:
$$J_{g \circ f}(x) = J_g(f(x)) \cdot J_f(x)$$
with $J_f$ being the Jacobian matrix of the function $f$. We will give a recap on this matrix and the chain rule to then examine the relationship between the multivariable chain rule and the total time derivative of a function $f$.
Jacobian matrix
Definition
Recall that for a totally differentiable function $f: U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ (with $U$ open), we define the Jacobian matrix of $f$ at the point $x \in U$ as
$$J_f(x) := \begin{pmatrix} (\partial_1 f_1)(x) & \cdots & (\partial_n f_1)(x) \\ \vdots & \ddots & \vdots \\ (\partial_1 f_m)(x) & \cdots & (\partial_n f_m)(x) \end{pmatrix} \in M_{m \times n}(\mathbb{R})$$
The well-known chain rule derived in Calculus I reads:
For functions $f: U \to V$ and $g: V \to W$, where $U, V, W \subseteq \mathbb{R}$, with $f$ being differentiable in $x \in U$ and $g$ being differentiable in $f(x) \in V$, the composite function $g(f(x))$ is differentiable and $\frac{d}{dx}\big(g(f(x))\big)$ is given by
$$\frac{d}{dx}\big(g(f(x))\big) = g'(f(x)) \cdot f'(x)$$
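As a quick numerical sanity check of this rule (the functions $f(x) = \sin(x)$, $g(y) = y^3$ and the sample point are arbitrary choices, not from the text), one can compare a central-difference approximation of the derivative of the composite with the right-hand side:

```python
import math

def num_deriv(h, x, eps=1e-6):
    # Central difference approximation of h'(x)
    return (h(x + eps) - h(x - eps)) / (2 * eps)

f = math.sin            # f(x) = sin(x), so f'(x) = cos(x)
g = lambda y: y ** 3    # g(y) = y^3,   so g'(y) = 3y^2
x = 0.7

lhs = num_deriv(lambda s: g(f(s)), x)       # d/dx g(f(x)), numerically
rhs = 3 * math.sin(x) ** 2 * math.cos(x)    # g'(f(x)) * f'(x)
print(abs(lhs - rhs) < 1e-8)
```

Both sides agree up to the accuracy of the finite-difference scheme.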
Multiple variables
In multivariable calculus, the chain rule for one variable can be generalized to the following theorem (which we won’t prove here).
For $U \subseteq \mathbb{R}^n$ and $V \subseteq \mathbb{R}^m$ open, let $f: U \to V$ be differentiable at the point $x \in U$. Let $g: V \to \mathbb{R}^k$ be differentiable at the point $y := f(x) \in V$. Visually, we are in the following situation:
The chain rule now states: $g \circ f: U \to \mathbb{R}^k$ is differentiable at $x$ and
$$J_{g \circ f}(x) = J_g(f(x)) \cdot J_f(x) \in M_{k \times n}(\mathbb{R})$$
This is very similar to the one-variable case $x \in U \subseteq \mathbb{R}$ above, except that now ($x \in U \subseteq \mathbb{R}^n$) we make use of the Jacobian matrix and multiply matrices to account for the many variables $x_1, \dots, x_n$ that $x$ consists of (hence multivariable chain rule).
Example
As an example, take the function $f: \mathbb{R}^3 \to \mathbb{R}^2$ from above:
$$f(x_1, x_2, x_3) := \big(x_1 + 42x_2,\; (x_1 + e^{x_3})^2\big)$$
and define $g: \mathbb{R}^2 \to \mathbb{R}$, $g(y) := y_1 \cdot y_2$, which puts us in this situation:
The Jacobian matrix for $g$ at $y = (y_1, y_2) \in \mathbb{R}^2$ is:
$$J_g(y) = \begin{pmatrix} (\partial_1 g_1)(y) & (\partial_2 g_1)(y) \end{pmatrix} = \begin{pmatrix} y_2 & y_1 \end{pmatrix} \in M_{1 \times 2}(\mathbb{R})$$
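We can numerically verify the matrix identity $J_{g \circ f}(x) = J_g(f(x)) \cdot J_f(x)$ for this pair of functions (the sample point is an arbitrary choice) by comparing finite-difference Jacobians of the composite with the matrix product:

```python
import math

def f(x1, x2, x3):
    # f : R^3 -> R^2 from the example
    return (x1 + 42 * x2, (x1 + math.exp(x3)) ** 2)

def g(y1, y2):
    # g : R^2 -> R
    return y1 * y2

def num_grad(h, x, eps=1e-6):
    # Numerical 1 x n Jacobian (gradient) of a scalar function h at point x
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((h(*xp) - h(*xm)) / (2 * eps))
    return grad

x = (0.3, -0.2, 0.1)
y = f(*x)

# Left-hand side: gradient of the composite g∘f at x
lhs = num_grad(lambda a, b, c: g(*f(a, b, c)), x)

# Right-hand side: J_g(f(x)) · J_f(x), with J_g(y) = (y2  y1)
Jg = [y[1], y[0]]
Jf = [num_grad(lambda a, b, c: f(a, b, c)[r], x) for r in range(2)]
rhs = [sum(Jg[r] * Jf[r][c] for r in range(2)) for c in range(3)]

print(all(abs(l - r) < 1e-4 for l, r in zip(lhs, rhs)))
```

The two $1 \times 3$ matrices agree up to finite-difference error, as the chain rule predicts.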
Having recalled the Jacobian matrix as well as the multivariable chain rule, we can finally go back to our original function $f: \mathbb{R}^4 \to \mathbb{R}$ given by
$$f(x(t), t) = 2(x_1(t))^2 + (\ln(x_2(t)))^3 + 4x_3(t) + \pi t, \quad x_1(t) = \sin(t), \quad x_2(t) = \cos(2t), \quad x_3(t) = \frac{1}{t}$$
The tricky part is to realize that we can precompose $f$ with another function $\gamma$ which translates from the independent time variable $t$ to the $(n+1)$-dimensional input vector $(x_1(t), \dots, x_n(t), t)$ passed into $f$. Note that for the $f$ from the introduction we have $n = 3$, yet we will keep $n$ generic here. The situation presents itself as follows in terms of a commutative diagram:
We define the function
$$\gamma: I \to \mathbb{R}^{n+1}, \quad t \mapsto \gamma(t) := \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \\ t \end{pmatrix}$$
where $I \subseteq \mathbb{R}$ is the interval of permitted time values. In physics, we often set $t := 0$ for the start of a real measurement or thought experiment and could therefore arbitrarily set $I := [0, \infty)$. Furthermore, we demand every component of $\gamma$ to be continuous, i.e.
$$\gamma_i: I \xrightarrow{\text{continuous}} \mathbb{R} \quad \forall i = 1, \dots, n+1$$
With these properties, $\gamma$ is a curve and describes a trajectory. It just happens that $\gamma$ carries the time component itself as the last component of our $(n+1)$-dimensional space, since we need to pass it into $f$ as well (as seen before, $f$ can directly depend on $t$, not just indirectly via $x_1(t)$ etc.).
The multivariable chain rule is now applicable to the composite function $f \circ \gamma$ (assuming differentiability of $f$ at the given point $x(t)$ and differentiability of $\gamma$ at $t$). We obtain:
$$\begin{aligned}
\underbrace{J_{f \circ \gamma}(t)}_{\in\, M_{1 \times 1}(\mathbb{R})\, \cong\, \mathbb{R}} &= \underbrace{J_f(\gamma(t))}_{\in\, M_{1 \times (n+1)}(\mathbb{R})} \cdot \underbrace{J_\gamma(t)}_{\in\, M_{(n+1) \times 1}(\mathbb{R})} \\
&= \begin{pmatrix} \frac{\partial f}{\partial x_1}(\gamma(t)) & \dots & \frac{\partial f}{\partial x_n}(\gamma(t)) & \frac{\partial f}{\partial t}(\gamma(t)) \end{pmatrix} \cdot \begin{pmatrix} \frac{dx_1}{dt}(t) \\ \vdots \\ \frac{dx_n}{dt}(t) \\ 1 \end{pmatrix} \\
&= \frac{\partial f}{\partial x_1}(\gamma(t)) \cdot \frac{dx_1}{dt}(t) + \dots + \frac{\partial f}{\partial x_n}(\gamma(t)) \cdot \frac{dx_n}{dt}(t) + \frac{\partial f}{\partial t}(\gamma(t)) \cdot 1 \\
&= \frac{\partial f}{\partial x_1}(x(t), t) \cdot \frac{dx_1}{dt}(t) + \dots + \frac{\partial f}{\partial x_n}(x(t), t) \cdot \frac{dx_n}{dt}(t) + \frac{\partial f}{\partial t}(x(t), t) \cdot 1 \\
&= \frac{\partial f}{\partial x}(x, t) \cdot \frac{dx}{dt}(t) + \frac{\partial f}{\partial t}(x(t), t) \cdot 1 =: \frac{df}{dt}(x, t)
\end{aligned}$$
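We can check this derivation numerically for the concrete function from the introduction (the sample point $t = 0.3$ is an arbitrary choice where all terms, in particular $\ln(\cos(2t))$ and $1/t$, are defined): the derivative of the composite $f \circ \gamma$, approximated by a central difference, should match the chain-rule sum of partial derivatives.

```python
import math

def f(x1, x2, x3, t):
    # f(x(t), t) = 2 x1^2 + (ln x2)^3 + 4 x3 + pi*t
    return 2 * x1 ** 2 + math.log(x2) ** 3 + 4 * x3 + math.pi * t

def gamma(t):
    # gamma(t) = (x1(t), x2(t), x3(t), t) with the trajectories from the text
    return (math.sin(t), math.cos(2 * t), 1 / t, t)

t, eps = 0.3, 1e-6

# Left-hand side: derivative of the composite f∘gamma at t (central difference)
lhs = (f(*gamma(t + eps)) - f(*gamma(t - eps))) / (2 * eps)

# Right-hand side: chain-rule sum  sum_i df/dx_i * dx_i/dt  +  df/dt
x1, x2, x3, _ = gamma(t)
rhs = (4 * x1 * math.cos(t)                                   # df/dx1 * dx1/dt
       + 3 * math.log(x2) ** 2 / x2 * (-2 * math.sin(2 * t))  # df/dx2 * dx2/dt
       + 4 * (-1 / t ** 2)                                    # df/dx3 * dx3/dt
       + math.pi)                                             # df/dt (explicit)
print(abs(lhs - rhs) < 1e-5)
```

Both sides agree, confirming that the total derivative is exactly the derivative of the composite $f \circ \gamma$.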
This shows how the Jacobian matrix, multivariable chain rule, and the total derivative are connected. We started with a function $f$ that had an explicit time dependency (in our case the term $\pi t$) and implicit time dependencies (since $x_1(t), \dots, x_n(t)$ are time-dependent). We then wanted to compute the total derivative of $f$ with respect to $t$, which is just asking for the time derivative of the composite function $f \circ \gamma$, where $\gamma$ is a curve passing in all the parameters to $f$, including time $t$ itself. This is in fact the definition of the total derivative of $f$ with respect to time $t$.
Note how the same formalism is applicable even when $f$ does not directly depend on $t$. In this case, we still pass $t$ as a parameter to our function $f$, which simply does not use the variable at all. The term $\frac{\partial f}{\partial t}(x, t)$ will then evaluate to $0$ and we would get the same result as if we had left out $t$ as the last entry of $\gamma$ altogether. Therefore, our definition of $\gamma$ stays consistent.
More examples
Let’s take a look at some more examples. As is common in physics, we omit the parentheses when a function depends only on $t$, e.g. $x_1 = x_1(t)$, to avoid visual clutter. In addition, we also set $I := \mathbb{R}$.
Example 1
One task might be to calculate the total time derivative of
$$f(x_1, x_2, x_3, t) = 4x_1^2 + 3x_2^2 + \pi x_3 + 2t, \quad x_1 = x_1(t) = \sin(t), \quad x_2 = x_2(t) = \cos(t), \quad x_3 = x_3(t) = \cosh(t)$$
With the chain rule and the common notation $\frac{dx_1}{dt}(t) = \dot{x}_1(t)$, we get:
$$\begin{aligned}
\frac{df}{dt}(x_1, x_2, x_3, t) &= \begin{pmatrix} \frac{\partial f}{\partial x_1}(\gamma(t)) & \frac{\partial f}{\partial x_2}(\gamma(t)) & \frac{\partial f}{\partial x_3}(\gamma(t)) & \frac{\partial f}{\partial t}(\gamma(t)) \end{pmatrix} \cdot \begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ 1 \end{pmatrix} \\
&= \frac{\partial f}{\partial x_1}(\gamma(t)) \cdot \dot{x}_1 + \frac{\partial f}{\partial x_2}(\gamma(t)) \cdot \dot{x}_2 + \frac{\partial f}{\partial x_3}(\gamma(t)) \cdot \dot{x}_3 + \frac{\partial f}{\partial t}(\gamma(t)) \cdot 1 \\
&= 8x_1\dot{x}_1 + 6x_2\dot{x}_2 + \pi\dot{x}_3 + 2 \\
&= 8\sin(t)\cos(t) - 6\cos(t)\sin(t) + \pi\sinh(t) + 2 \\
&= 2\sin(t)\cos(t) + \pi\sinh(t) + 2.
\end{aligned}$$
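A quick numerical check of this result (the sample point $t = 1.2$ is an arbitrary choice): differentiating $f$ along the trajectory with a central difference should reproduce the closed form $2\sin(t)\cos(t) + \pi\sinh(t) + 2$.

```python
import math

def f_along(t):
    # f = 4 x1^2 + 3 x2^2 + pi*x3 + 2t evaluated along the trajectory
    x1, x2, x3 = math.sin(t), math.cos(t), math.cosh(t)
    return 4 * x1 ** 2 + 3 * x2 ** 2 + math.pi * x3 + 2 * t

t, eps = 1.2, 1e-6
numeric = (f_along(t + eps) - f_along(t - eps)) / (2 * eps)
closed_form = 2 * math.sin(t) * math.cos(t) + math.pi * math.sinh(t) + 2
print(abs(numeric - closed_form) < 1e-6)
```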
Example 2
For another physics problem, we might encounter the following function whose total time derivative we want to calculate:
$$f(x_1, \dot{x}_1, t) = a\dot{x}_1^2 - bx_1\dot{x}_1, \quad a, b \in \mathbb{R}, \quad x_1 = x_1(t), \quad \dot{x}_1 = \dot{x}_1(t)$$
Again, employing the chain rule we obtain:
$$\begin{aligned}
\frac{df}{dt}(x_1, \dot{x}_1, t) &= \begin{pmatrix} \frac{\partial f}{\partial x_1}(\gamma(t)) & \frac{\partial f}{\partial \dot{x}_1}(\gamma(t)) & \frac{\partial f}{\partial t}(\gamma(t)) \end{pmatrix} \cdot \begin{pmatrix} \dot{x}_1 \\ \ddot{x}_1 \\ 1 \end{pmatrix} \\
&= \frac{\partial f}{\partial x_1}(x_1, \dot{x}_1, t) \cdot \dot{x}_1 + \frac{\partial f}{\partial \dot{x}_1}(x_1, \dot{x}_1, t) \cdot \ddot{x}_1 + \frac{\partial f}{\partial t}(x_1, \dot{x}_1, t) \cdot 1 \\
&= -b\dot{x}_1 \cdot \dot{x}_1 + (2a\dot{x}_1 - bx_1) \cdot \ddot{x}_1 + 0 \\
&= -b\dot{x}_1^2 + (2a\dot{x}_1 - bx_1)\ddot{x}_1
\end{aligned}$$
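Since $x_1(t)$ is left generic here, we can check the result numerically by plugging in a concrete trajectory; $x_1(t) = \sin(t)$ and the constants $a$, $b$ and the sample point below are arbitrary choices for the check, not part of the problem.

```python
import math

a, b = 2.0, 3.0

def x1(t):   return math.sin(t)    # assumed concrete trajectory for the check
def x1d(t):  return math.cos(t)    # dx1/dt
def x1dd(t): return -math.sin(t)   # d^2 x1/dt^2

def f_along(t):
    # f = a*x1dot^2 - b*x1*x1dot evaluated along the trajectory
    return a * x1d(t) ** 2 - b * x1(t) * x1d(t)

t, eps = 0.8, 1e-6
numeric = (f_along(t + eps) - f_along(t - eps)) / (2 * eps)
closed_form = -b * x1d(t) ** 2 + (2 * a * x1d(t) - b * x1(t)) * x1dd(t)
print(abs(numeric - closed_form) < 1e-6)
```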
Example 3
In another task, students were presented this function
$$f(x_1, \dot{x}_1, t) = -a \cdot u(x_1) \cdot e^{-kt}, \quad a, k \in \mathbb{R}, \quad x_1 = x_1(t), \quad \dot{x}_1 = \dot{x}_1(t)$$
and were asked to calculate the total time derivative. However, $u(x_1(t))$ was never defined. Since $f$ does not depend on a variable $u$ at all (see the argument list of $f$), one had to assume that $u$ stands for some term containing $x_1$, e.g. $3x_1 + 5$. Then, we can proceed as usual:
$$\begin{aligned}
\frac{df}{dt}(x_1, \dot{x}_1, t) &= \frac{\partial f}{\partial x_1}(x_1, \dot{x}_1, t) \cdot \dot{x}_1 + \frac{\partial f}{\partial \dot{x}_1}(x_1, \dot{x}_1, t) \cdot \ddot{x}_1 + \frac{\partial f}{\partial t}(x_1, \dot{x}_1, t) \cdot 1 \\
&= -a \cdot \left. \frac{du(x)}{dx} \right|_{x = x_1} \cdot e^{-kt} \cdot \dot{x}_1 + 0 + aku(x_1)e^{-kt} \cdot 1
\end{aligned}$$
where $\left. \frac{du(x)}{dx} \right|_{x = x_1} = 3$ if $u(x) := 3x + 5$ (as an example).
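With the example choice $u(x) = 3x + 5$, we can again verify the formula numerically; the trajectory $x_1(t) = \sin(t)$, the constants $a$, $k$ and the sample point are arbitrary choices for the check.

```python
import math

a, k = 1.5, 0.4

def u(x):   return 3 * x + 5       # example choice for the undefined u, u' = 3
def x1(t):  return math.sin(t)     # assumed trajectory for the check
def x1d(t): return math.cos(t)     # dx1/dt

def f_along(t):
    # f = -a * u(x1) * e^(-kt) evaluated along the trajectory
    return -a * u(x1(t)) * math.exp(-k * t)

t, eps = 0.5, 1e-6
numeric = (f_along(t + eps) - f_along(t - eps)) / (2 * eps)
closed_form = (-a * 3 * math.exp(-k * t) * x1d(t)       # -a * u'(x1) * e^(-kt) * x1dot
               + a * k * u(x1(t)) * math.exp(-k * t))   # + a*k * u(x1) * e^(-kt)
print(abs(numeric - closed_form) < 1e-6)
```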
Thanks to Paul Obernolte for having reviewed this post.
Note however that $(Df)(x)$ is actually the differential of $f$ at $x$, i.e. the linear map $(Df)(x): \mathbb{R}^n \to \mathbb{R}^m$, $y \mapsto J_f(x) \cdot y$, where $y \in \mathbb{R}^n$ is a column vector. Hence, the equality $J_f(x) = (Df)(x)$ should be understood in the sense that the Jacobian matrix at $x$ is the transformation matrix of the linear map $(Df)(x)$.