Jacobi and the total time derivative

written by Splines


Motivation

A fairly common task, which quickly becomes second nature to physics students, is to compute the total derivative of a multi-dimensional time-dependent function. Here, we will only deal with “innocent” functions, that is, continuous functions that are totally differentiable everywhere in their domain.

Suppose we have the function $f: \R^4 \rightarrow \R$ given by

\begin{align*} f(\bm{x}(t), t) &= 2 \bigl(x_1(t)\bigr)^2 + \bigl(\ln(x_2(t))\bigr)^3 + 4 x_3(t) + \pi t,\\
&\quad x_1(t) = \sin(t), \quad x_2(t) = \cos(2t), \quad x_3(t) = \frac{1}{t} \end{align*}

which indirectly depends on $t$ via $x_1(t)$, $x_2(t)$ and $x_3(t)$, but also has a direct dependency on $t$ via the term $\pi t$. The total derivative is calculated by

\begin{align*} \dv{t} f\bigl(\bm{x}(t), t\bigr) &= \pdv{\bm{x}} f\bigl(\bm{x}(t), t\bigr) \cdot \dv{t} \bm{x}(t) + \pdv{t} f\bigl(\bm{x}(t), t\bigr)\\
&= \underbrace{\pdv{x_1} f\bigl(\bm{x}(t), t\bigr) \cdot \dv{t} x_1(t)}_{4 x_1(t) \cdot \dot{x}_1(t)} + \underbrace{\pdv{x_2} f\bigl(\bm{x}(t), t\bigr) \cdot \dv{t} x_2(t)}_{3 \bigl(\ln(x_2(t))\bigr)^2 \frac{1}{x_2(t)} \cdot \dot{x}_2(t)}\\
&\quad + \underbrace{\pdv{x_3} f\bigl(\bm{x}(t), t\bigr) \cdot \dv{t} x_3(t)}_{4 \dot{x}_3(t)} + \underbrace{\pdv{t} f\bigl(\bm{x}(t), t\bigr)}_{\pi}\\
&= 4 \sin(t) \cos(t) - 6\bigl(\ln(\cos(2t))\bigr)^2 \tan(2t) - \frac{4}{t^2} + \pi \end{align*}

One way to remember the formula is to form the total derivative with respect to all variables $x_1(t), x_2(t), x_3(t)$ and with respect to the independent variable $t$ itself. With $\dvv{f}{x_1} = \pdvv{f}{x_1} \cdot \dvv{x_1}{t}$ and likewise for $x_2$ and $x_3$, we then obtain the above formula.
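If you have SymPy at hand, the result can be double-checked by substituting the trajectories into $f$ first and then differentiating with respect to $t$. This is only a sanity-check sketch, not part of the derivation itself:

```python
import sympy as sp

t = sp.symbols("t", positive=True)

# Substitute x1(t), x2(t), x3(t) directly and differentiate with respect to t;
# this is exactly the total derivative computed above.
x1, x2, x3 = sp.sin(t), sp.cos(2 * t), 1 / t
f = 2 * x1**2 + sp.log(x2) ** 3 + 4 * x3 + sp.pi * t

total = sp.diff(f, t)
expected = (4 * sp.sin(t) * sp.cos(t)
            - 6 * sp.log(sp.cos(2 * t)) ** 2 * sp.tan(2 * t)
            - 4 / t**2 + sp.pi)

print(sp.simplify(total - expected))  # 0
```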

However, where does this equation really come from? In Calculus II, the multivariable chain rule is introduced:

\begin{align*} \boxed{J_{g\circ f}(\bm{x}) = J_g(f(\bm{x})) \cdot J_f(\bm{x})} \end{align*}

with $J_f$ being the Jacobian matrix of function $f$. We will give a recap on this matrix and the chain rule to then examine the relationship between the multivariable chain rule and the total time derivative of a function $f$.

Jacobian matrix

Definition

Recall that for a function $f: U \subopenin \R^n \rightarrow \R^m$ with $f$ being totally differentiable, we define the Jacobian matrix of $f$ at point $\bm{x}\in U$ as

\begin{align*} J_f(\bm{x}) &\coloneqq \Bigl((\partial_j f_i)(\bm{x})\Bigr)_{\substack{i=1,\dots, m\\j=1, \dots, n}}\\
&= \begin{pmatrix} (\partial_1 f_1)(\bm{x}) & (\partial_2 f_1)(\bm{x}) & \dots & (\partial_n f_1)(\bm{x})\\ \vdots & \vdots & \ddots & \vdots\\ (\partial_1 f_m)(\bm{x}) & (\partial_2 f_m)(\bm{x}) & \dots & (\partial_n f_m)(\bm{x}) \end{pmatrix} \in \Matrix_{m\times n}(\R) \end{align*}

where $\partial_j f_i$ is a shorthand for $\pdvv{f_i}{x_j}$. Common notations for the Jacobian matrix include 1

\begin{align*} J_f(\bm{x}) = (Df)(\bm{x}) = \pdvv{(f_1, \dots, f_m)}{(x_1, \dots, x_n)}\,(\bm{x}) \end{align*}

Example

As an example, take the function $f: \R^3 \rightarrow \R^2$ given by
\begin{align*} f(x_1, x_2, x_3) \coloneqq \bigl(x_1 + 42 x_2,\quad (x_1 + e^{x_3})^2 \bigr) \end{align*}

Its Jacobian matrix at $\bm{x} = (x_1, x_2, x_3)\in \R^3$ is

\begin{align*} J_f(\bm{x}) &= \begin{pmatrix} (\partial_1 f_1)(\bm{x}) & (\partial_2 f_1)(\bm{x}) & (\partial_3 f_1)(\bm{x})\\ (\partial_1 f_2)(\bm{x}) & (\partial_2 f_2)(\bm{x}) & (\partial_3 f_2)(\bm{x}) \end{pmatrix}\\
&= \begin{pmatrix} 1 & 42 & 0\\ 2 (x_1 + e^{x_3}) & 0 & 2 e^{x_3} (x_1 + e^{x_3}) \end{pmatrix} \in \Matrix_{2\times 3}(\R) \end{align*}
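As a quick cross-check (not part of the original derivation), SymPy's `Matrix.jacobian` reproduces this matrix:

```python
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3")

# f: R^3 -> R^2 from the example above, as a column vector of components
f = sp.Matrix([x1 + 42 * x2, (x1 + sp.exp(x3)) ** 2])
J = f.jacobian([x1, x2, x3])

# The hand-computed Jacobian matrix
expected = sp.Matrix([
    [1, 42, 0],
    [2 * (x1 + sp.exp(x3)), 0, 2 * sp.exp(x3) * (x1 + sp.exp(x3))],
])
print((J - expected).expand())  # zero 2x3 matrix
```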

Multivariable chain rule

One variable

The well-known chain rule derived in Calculus I reads:

For functions $f: U\rightarrow V$ and $g: V \rightarrow W$, where $U,V,W\subseteq \R$, with $f$ being differentiable at $x\in U$ and $g$ being differentiable at $f(x)\in V$, the composite function $g(f(x))$ is differentiable and $\dv{x} \Bigl(g(f(x))\Bigr)$ is given by

\begin{align*} \dv{x} \Bigl(g(f(x))\Bigr) = g'(f(x)) \cdot f'(x) \end{align*}
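A small SymPy sketch illustrates the rule; the concrete choices $f(x) = x^2$ and $g(y) = \sin(y)$ are arbitrary illustrative assumptions, not functions from this post:

```python
import sympy as sp

x = sp.symbols("x")

# Illustrative choices (assumed): f(x) = x**2, g(y) = sin(y)
f = x**2
lhs = sp.diff(sp.sin(f), x)      # d/dx g(f(x))
rhs = sp.cos(f) * sp.diff(f, x)  # g'(f(x)) * f'(x)

print(sp.simplify(lhs - rhs))  # 0
```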

Multiple variables

In multivariable calculus, the chain rule for one variable can be generalized to the following theorem (which we won’t prove here).

For $U\subopenin \R^n, V\subopenin \R^m$, let $f:U\rightarrow V$ be differentiable at point $\bm{x} \in U$. Let $g: V\rightarrow \R^k$ be differentiable at point $\bm{y} \coloneqq f(\bm{x}) \in V$. Visually, we are in the following situation:

Chain rule diagram (since they are really hard to explain without seeing them, we will omit the alt attribute for the following images)

The chain rule now states: $g\circ f: U\rightarrow \R^k$ is differentiable at $\bm{x}$ and

\boxed{J_{g\circ f}(\bm{x}) = J_g(f(\bm{x})) \cdot J_f(\bm{x}) \quad \in \Matrix_{k\times n}(\R)}

This is very similar to the case with one variable $x\in U\subseteq \R$ above, except now ($\bm{x}\in U\subseteq \R^n$) we make use of the Jacobian matrix and multiply matrices to account for the many variables $x_1, \dots, x_n$ that $\bm{x}$ consists of (thus multivariable chain rule).

Example

As an example, take the function $f: \R^3 \to \R^2$ from above:
\begin{align*} f(x_1, x_2, x_3) \coloneqq \bigl(x_1 + 42 x_2,\quad (x_1 + e^{x_3})^2 \bigr) \end{align*}
and define $g: \R^2 \rightarrow \R,\, g(\bm{y}) \coloneqq y_1 \cdot y_2$, which puts us in this situation:

The Jacobian matrix for $g$ at $\bm{y} = (y_1, y_2) \in \R^2$ is:
\begin{align*} J_g(\bm{y}) &= \begin{pmatrix} (\partial_1 g_1)(\bm{y}) & (\partial_2 g_1)(\bm{y}) \end{pmatrix} = \begin{pmatrix} y_2 & y_1 \end{pmatrix} \in \Matrix_{1\times 2}(\R) \end{align*}

and thus: $J_g(f(\bm{x})) = \begin{pmatrix} (x_1 + e^{x_3})^2 & x_1 + 42 x_2 \end{pmatrix}$.

With the chain rule, we obtain:

\begin{align*} &\underbrace{J_{g\circ f}(\bm{x})}_{\in \Matrix_{1\times 3}(\R)} = \underbrace{J_g(f(\bm{x}))}_{\in \Matrix_{1\times 2}(\R)} \cdot \underbrace{J_f(\bm{x})}_{\in \Matrix_{2\times 3}(\R)}\\
&= \begin{pmatrix} (x_1 + e^{x_3})^2 & x_1 + 42 x_2 \end{pmatrix} \cdot \begin{pmatrix} 1 & 42 & 0 \quad\\ 2 (x_1 + e^{x_3}) & 0 & 2 e^{x_3} (x_1 + e^{x_3}) \end{pmatrix}\\
&= \begin{pmatrix} (x_1 + e^{x_3})^2 + (x_1 + 42 x_2) \cdot 2(x_1 + e^{x_3})\phantom{-.}\\ 42 (x_1 + e^{x_3})^2\\ (x_1 + 42 x_2) \cdot 2 e^{x_3} (x_1 + e^{x_3})\\ \end{pmatrix}^T \end{align*}
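The matrix product can be verified symbolically; the following SymPy sketch compares the Jacobian of the explicit composition $g(f(\bm{x}))$ against the product $J_g(f(\bm{x})) \cdot J_f(\bm{x})$:

```python
import sympy as sp

x1, x2, x3, y1, y2 = sp.symbols("x1 x2 x3 y1 y2")

# f: R^3 -> R^2 and g: R^2 -> R from the example above
f = sp.Matrix([x1 + 42 * x2, (x1 + sp.exp(x3)) ** 2])
g = sp.Matrix([y1 * y2])

# Right-hand side of the chain rule: J_g evaluated at f(x), times J_f
Jf = f.jacobian([x1, x2, x3])                               # 2x3
Jg_at_fx = g.jacobian([y1, y2]).subs({y1: f[0], y2: f[1]})  # 1x2
rhs = Jg_at_fx * Jf                                         # 1x3

# Left-hand side: Jacobian of the explicit composition g(f(x)) = f1 * f2
composition = sp.Matrix([f[0] * f[1]])
lhs = composition.jacobian([x1, x2, x3])

print((lhs - rhs).expand())  # zero 1x3 matrix
```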

Total derivative in physics

Putting everything together

Having recalled the Jacobian matrix as well as the multivariable chain rule, we can finally go back to our original function $f: \R^4 \rightarrow \R$ given by
\begin{align*} f(\bm{x}(t), t) &= 2 \bigl(x_1(t)\bigr)^2 + \bigl(\ln(x_2(t))\bigr)^3 + 4 x_3(t) + \pi t,\\
&\quad x_1(t) = \sin(t), \quad x_2(t) = \cos(2t), \quad x_3(t) = \frac{1}{t} \end{align*}
The tricky part is to realize we can prefix $f$ by another function $\gamma$ which translates from the independent time variable $t$ to the $(n+1)$-dimensional input vector $\bigl(x_1(t), \dots, x_n(t), t\bigr)$ passed into $f$. Note that for $f$ in the introduction, we have $n=3$, yet we will leave $n$ generic here. The situation presents itself as follows in terms of a commutative diagram:

We define the function
\begin{align*} \gamma:\, &I\rightarrow \R^{n+1},\\
&t\mapsto \gamma(t) \coloneqq \begin{pmatrix} x_1(t) & \cdots & x_n(t) & t \end{pmatrix} \end{align*}
where $I\subseteq \R$ is the interval of permitted time values. We often set $t \coloneqq 0$ for the start of a real measurement or thought experiment in physics and could therefore arbitrarily set $I \coloneqq [0, \infty)$. Furthermore, we demand every component of $\gamma$ to be continuous, i.e.
\gamma_i: I \xrightarrow{\text{continuous}} \R \quad \forall i=1,\dots, n+1
With these properties, $\gamma$ is a curve and describes a trajectory. It just happens that as the last component of our $(n+1)$-dimensional space, $\gamma$ carries the time component itself, as we need to pass it into $f$ as well (as seen before, $f$ can directly depend on $t$, not just indirectly via $x_1(t)$ etc.).

The multivariable chain rule is now applicable to the composite function $f\circ \gamma$ (assuming differentiability of $f$ at the given point $\bm{x}(t)$ and differentiability of $\gamma$ at $t$). We obtain:
\begin{align*} &\underbrace{J_{f\circ \gamma}(t)}_{\Matrix_{1\times 1}(\R) \cong \R} = \underbrace{J_f(\gamma(t))}_{\Matrix_{1\times (n+1)}(\R)} \cdot \underbrace{J_\gamma(t)}_{\Matrix_{(n+1) \times 1}(\R)}\\
&= \begin{pmatrix} \pdv{x_1} f(\gamma(t)) & \dots & \pdv{x_n} f(\gamma(t)) & \pdv{t} f (\gamma(t)) \end{pmatrix} \cdot \begin{pmatrix} \dv{t} x_1(t)\\ \vdots\\ \dv{t} x_n(t)\\ 1 \end{pmatrix}\\
&= \pdv{x_1} f(\gamma(t)) \cdot \dv{t} x_1(t) + \dots + \pdv{x_n} f(\gamma(t)) \cdot \dv{t} x_n(t) + \pdv{t} f(\gamma(t)) \cdot 1\\
&= \pdv{x_1} f(\underbrace{x_1(t), \dots, x_n(t)}_{\bm{x}(t)}, t) \cdot \dv{t} x_1(t) + \dots + \pdv{x_n} f(\bm{x}(t), t) \cdot \dv{t} x_n(t)\\
&\quad + \pdv{t} f(\bm{x}(t), t) \cdot 1\\
&= \pdv{\bm{x}} f(\bm{x}(t), t) \cdot \dv{t} \bm{x}(t) \:+\: \pdv{t} f(\bm{x}(t), t) \cdot 1\\
&\eqqcolon \dv{t} f(\bm{x}(t), t) \end{align*}
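To make the construction of $\gamma$ concrete, the following SymPy sketch applies it to the introductory example ($n = 3$). The symbol `s` for the explicit time slot of $f$ is an assumed name, not notation from the post:

```python
import sympy as sp

t = sp.symbols("t", positive=True)
# Generic arguments of f; "s" stands for the explicit time slot (assumed name)
x1, x2, x3, s = sp.symbols("x1 x2 x3 s")

f = 2 * x1**2 + sp.log(x2) ** 3 + 4 * x3 + sp.pi * s
gamma = sp.Matrix([sp.sin(t), sp.cos(2 * t), 1 / t, t])  # curve γ, last entry is t itself

subs_map = {x1: gamma[0], x2: gamma[1], x3: gamma[2], s: gamma[3]}

# Chain rule: J_f (1x4) evaluated along γ, times J_γ (4x1)
Jf = sp.Matrix([f]).jacobian([x1, x2, x3, s]).subs(subs_map)
Jgamma = gamma.jacobian([t])
chain = (Jf * Jgamma)[0]

# Direct computation: substitute first, then differentiate
direct = sp.diff(f.subs(subs_map), t)

print(sp.simplify(chain - direct))  # 0
```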

This shows how the Jacobian matrix, multivariable chain rule, and the total derivative are connected. We started with a function ff that had an explicit time dependency (in our case the term πt\pi t) and implicit time dependencies (since x1(t),,xn(t)x_1(t), \dots, x_n(t) are time-dependent). We then wanted to compute the total derivative of ff with respect to tt, which is just asking for the time derivative of the composite function fγf\circ \gamma, where γ\gamma is a curve passing in all the parameters to ff, including time tt itself. This is in fact the definition of the total derivative of ff with respect to time tt.

Note how the same formalism is applicable even when $f$ does not directly depend on $t$. In this case, we still pass $t$ as a parameter to our function $f$, which simply does not use the variable at all. The term $\pdv{t} f(\bm{x}, t)$ will then evaluate to $0$ and we would get the same result as if we had just left out $t$ as the last entry of $\gamma$ altogether. Therefore, our definition of $\gamma$ stays consistent.

More examples

Let’s take a look at some more examples. As is common in physics, we omit the explicit time argument when a function depends only on $t$, e.g. $x_1 = x_1(t)$, to not clutter the visual appearance. In addition, we also set $I \coloneqq \R$.

Example 1

One task might be to calculate the total time derivative of
\begin{align*} &f\bigl(x_1, x_2, x_3, t\bigr) = 4 x_1^2 + 3 x_2^2 + \pi x_3 + 2t,\\
&x_1 = x_1(t) = \sin(t), \; x_2 = x_2(t) = \cos(t), \; x_3 = x_3(t) = \cosh(t) \end{align*}

With the chain rule and the common notation $\dv{t} x_1(t) = \dot{x}_1(t)$, we get:
\begin{align*} &\dv{t} f\bigl(x_1, x_2, x_3, t\bigr)\\
&= \begin{pmatrix} \pdv{x_1} f(\gamma(t)) & \pdv{x_2} f(\gamma(t)) & \pdv{x_3} f(\gamma(t)) & \pdv{t} f(\gamma(t)) \end{pmatrix} \cdot \begin{pmatrix} \dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ 1 \end{pmatrix}\\
&= \pdv{x_1} f(\gamma(t)) \cdot \dot{x}_1 + \pdv{x_2} f(\gamma(t)) \cdot \dot{x}_2 + \pdv{x_3} f(\gamma(t)) \cdot \dot{x}_3\\
&\quad + \pdv{t} f(\gamma(t)) \cdot 1\\
&= 8 x_1 \dot{x}_1 + 6x_2 \dot{x}_2 + \pi \dot{x}_3 + 2\\
&= 8 \sin(t) \cos(t) - 6 \cos(t) \sin(t) + \pi \sinh(t) + 2\\
&= \solution{2 \sin(t) \cos(t) + \pi \sinh(t) + 2}. \end{align*}
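A quick symbolic check of the result (assuming SymPy as the tool of choice):

```python
import sympy as sp

t = sp.symbols("t")

# Substitute the trajectories and differentiate directly
f = 4 * sp.sin(t) ** 2 + 3 * sp.cos(t) ** 2 + sp.pi * sp.cosh(t) + 2 * t
result = sp.diff(f, t)
expected = 2 * sp.sin(t) * sp.cos(t) + sp.pi * sp.sinh(t) + 2

print(sp.simplify(result - expected))  # 0
```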

Example 2

For another physics problem, we might encounter the following function we want to calculate the total time derivative for:
\begin{align*} f\bigl(x_1, \dot{x}_1, t\bigr) &= a \dot{x}_1^2 - b x_1 \dot{x}_1, \quad a,b\in \R, \, x_1=x_1(t), \, \dot{x}_1 = \dot{x}_1(t) \end{align*}

Again, employing the chain rule we obtain:
\begin{align*} &\dv{t} f\bigl(x_1, \dot{x}_1, t\bigr) = \begin{pmatrix} \pdv{x_1} f(\gamma(t)) & \pdv{\dot{x}_1} f(\gamma(t)) & \pdv{t} f(\gamma(t)) \end{pmatrix} \cdot \begin{pmatrix} \dot{x}_1\\ \ddot{x}_1\\ 1 \end{pmatrix}\\
&= \pdv{x_1} f\bigl(x_1, \dot{x}_1, t\bigr) \cdot \dot{x}_1 + \pdv{\dot{x}_1} f\bigl(x_1, \dot{x}_1, t\bigr) \cdot \ddot{x}_1 + \pdv{t} f\bigl(x_1, \dot{x}_1, t\bigr) \cdot 1\\
&= -b \dot{x}_1 \dot{x}_1 + \bigl(2a\dot{x}_1 - b x_1\bigr) \ddot{x}_1 + 0\\
&= \solution{-b \dot{x}_1^2 + \bigl(2a\dot{x}_1 - b x_1\bigr) \ddot{x}_1} \end{align*}
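Since $x_1(t)$ is left generic here, the result can be checked with SymPy's undefined functions:

```python
import sympy as sp

t, a, b = sp.symbols("t a b")
x1 = sp.Function("x1")(t)  # keep x1(t) fully generic
xdot = sp.diff(x1, t)
xddot = sp.diff(x1, t, 2)

f = a * xdot**2 - b * x1 * xdot
result = sp.diff(f, t)
expected = -b * xdot**2 + (2 * a * xdot - b * x1) * xddot

print(sp.expand(result - expected))  # 0
```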

Example 3

In another task, students were presented with this function
\begin{align*} f\bigl(x_1, \dot{x}_1, t\bigr) &= -a \cdot u(x_1) \cdot e^{-kt}, \quad a,k \in \R, \, x_1 = x_1(t),\, \dot{x}_1 = \dot{x}_1(t) \end{align*}

and were asked to calculate the total time derivative. However, $u(x_1(t))$ was never defined. Since $u$ itself does not appear in the argument list of $f$, one had to assume $u$ would be replaced by some term containing $x_1$, e.g. $3 x_1 + 5$. Then, we can proceed as usual:
\begin{align*} &\dv{t} f\bigl(x_1, \dot{x}_1, t\bigr)\\
&= \pdv{x_1} f\bigl(x_1, \dot{x}_1, t\bigr) \cdot \dot{x}_1 + \pdv{\dot{x}_1} f\bigl(x_1, \dot{x}_1, t\bigr) \cdot \ddot{x}_1 + \pdv{t} f\bigl(x_1, \dot{x}_1, t\bigr) \cdot 1\\
&= \solution{-a \cdot \dv{x}\Bigr\rvert_{x=x_1} u(x) \cdot e^{-kt} \cdot \dot{x}_1 + 0 + a k u(x_1) e^{-kt} \cdot 1} \end{align*}
where $\dv{x}\bigr\rvert_{x=x_1} u(x) = 3$ if $u(x) \coloneqq 3x + 5$ (as an example).
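With the assumed choice $u(x) = 3x + 5$, the solution can again be verified symbolically:

```python
import sympy as sp

t, a, k = sp.symbols("t a k")
x1 = sp.Function("x1")(t)

u = 3 * x1 + 5  # the assumed replacement for u(x1), so u'(x1) = 3
f = -a * u * sp.exp(-k * t)

result = sp.diff(f, t)
expected = (-a * 3 * sp.diff(x1, t) * sp.exp(-k * t)
            + a * k * u * sp.exp(-k * t))

print(sp.expand(result - expected))  # 0
```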

Thanks to Paul Obernolte for having reviewed this post.

  1. Note however that $(Df)(\bm{x})$ is actually the differential of $f$ at $\bm{x}$, i.e. the linear map $(Df)(\bm{x}): \R^n \rightarrow \R^m$, $\bm{y} \mapsto J_f(\bm{x}) \cdot \bm{y}$, where $\bm{y}\in \R^n$ is a column vector. Hence, the equality $J_f(\bm{x}) = (Df)(\bm{x})$ should be understood in the sense that the Jacobian matrix at $\bm{x}$ is the transformation matrix of the linear map $(Df)(\bm{x})$.
