Chapter 12 Derivatives for Multivariable Functions

Author

Jiaye Xu

Published

March 26, 2025

12.1 Functions of Several Variables

Functions of Two Variables

A real-valued function of two real variables is a rule that assigns to each ordered pair \((x,y)\) in its domain a unique real number \(f(x,y)\).

Graphs

The graph of a function \(f\) of two variables, that is, the graph of the equation \(z=f(x,y)\), is normally a surface. A typical example is shown in the following figure.

Comment: Each \((x, y)\) in the domain corresponds to exactly one value \(z\); hence each line perpendicular to the \(xy\)-plane intersects the surface in at most one point.

Computer Graphs – 3D Surface Plot

A number of software packages, including Maple and Mathematica, can produce complicated three-dimensional graphs with ease.

  • Static graphs, such as the examples in the textbook.

  • Interactive graphs, created with software such as R; there are many learning resources available online.

3D Surface Plot using R (Optional)

Three-Step Procedure:

  1. Define \(x\) and \(y\);

  2. Define function \(z=f(x,y)\);

  3. Plot.

# Step 1
x <- seq(-10, 10, by = 1) # vector of length = 21
y <- seq(-10, 10, by = 1)

# Step 2
f <- function(x, y) { y^2 - x^2 } # hyperbolic paraboloid
z <- outer(x, y, f) # a 21-by-21 matrix, where z[i, j] = f(x[i], y[j])

# Step 3
persp(x, y, z)

Customize your Plot in R (Optional)

In the function persp, theta and phi are angles defining the viewing direction: theta gives the azimuthal direction and phi the colatitude (the complement of the latitude). That is, changing theta rotates the viewpoint around the vertical axis (varying the "longitude" of the viewing position), while changing phi raises or lowers the viewpoint (varying the "latitude").

persp(x, y, z, theta=50) # clockwise

persp(x, y, z, theta=-50) # counter-clockwise

persp(x, y, z, phi = 30) # "higher-latitude" viewpoint

persp(x, y, z, phi = -30) # "lower-latitude" viewpoint

persp(x, y, z, theta=-50, phi = 30, col = "lightblue")

persp(x, y, z, theta=-50, phi = 30, col = "lightblue", box=FALSE) # remove the box

persp(x, y, z, theta=-50, phi = 30, col = "lightblue", ticktype="detailed") # add ticks to the axes

Level Curves and Contour Map

Map makers have given us another and usually simpler way to picture a surface: the contour map.

Each horizontal plane intersects the surface in a curve. The projection of this curve onto the \(xy\)-plane is called a level curve, and a collection of such curves is a contour plot or a contour map.

Contour Plot using R (Optional)

contour(x, y, z, col="blue")
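The contour plot can be customized in the same spirit as persp. A minimal sketch (argument values are illustrative, not prescriptive): nlevels controls how many level curves are drawn, and filled.contour produces a color-filled contour map of the same matrix z.

contour(x, y, z, nlevels = 20, col = "blue") # more level curves, labeled by default

filled.contour(x, y, z, color.palette = terrain.colors) # color-filled contour map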

Functions of Three Variables

A number of quantities depend on three or more variables. For example, the temperature in a large auditorium may depend on the location \((x, y, z)\); this leads to the function \(T(x, y, z)\).

We can visualize functions of three variables by plotting level surfaces.

For example, the level surfaces of \(f(x, y, z) = 2x^2 + y^2 + z^2\) are nested ellipsoids centered at the origin.
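To see why, set the function equal to a constant \(k>0\) (a short derivation added for illustration): \[2x^2+y^2+z^2=k \quad\Longleftrightarrow\quad \frac{x^2}{k/2}+\frac{y^2}{k}+\frac{z^2}{k}=1,\] which is an ellipsoid centered at the origin with semi-axes \(\sqrt{k/2}\), \(\sqrt{k}\), and \(\sqrt{k}\); as \(k\) increases, the ellipsoids grow outward, one nested inside the next.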

12.2 Partial Derivatives

Partial Derivatives of a Function of Two Variables

Suppose \(f(x,y)\) is a function of two variables \(x\) and \(y\).

  • If \(y\) is held constant, that is \(y=y_0\), then \(f(x,y_0)\) is a function of the single variable \(x\). Its derivative at \(x=x_0\) is called the partial derivative of \(f\) with respect to \(x\) at \((x_0,y_0)\), denoted by \(f_x(x_0,y_0)\).

  • If \(x\) is held constant, that is \(x=x_0\), then \(f(x_0,y)\) is a function of the single variable \(y\). Its derivative at \(y=y_0\) is called the partial derivative of \(f\) with respect to \(y\) at \((x_0,y_0)\), denoted by \(f_y(x_0,y_0)\).

N.B. The rules for differentiating a function of one variable in Chapter 3 work for finding partial derivatives, as long as we hold one variable fixed.
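As an optional check in R (a minimal sketch, not part of the textbook's method; the example expression is made up), base R's D() differentiates symbolically with respect to one named variable while treating the other as a constant, which is exactly the "hold one variable fixed" idea.

expr <- expression(x^2 * y + sin(x * y)) # an arbitrary example function

D(expr, "x") # partial derivative with respect to x (y held fixed)
D(expr, "y") # partial derivative with respect to y (x held fixed)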

Notations for partial derivatives

If \(z=f(x,y)\), common notations for the partial derivatives are \[f_x(x,y)=\frac{\partial f}{\partial x}=\frac{\partial z}{\partial x} \qquad\text{and}\qquad f_y(x,y)=\frac{\partial f}{\partial y}=\frac{\partial z}{\partial y}.\]

Be aware that \(\partial z/\partial x\) is a single symbol, not a quotient of \(\partial z\) by \(\partial x\).

Geometric and Physical Interpretations

Consider the surface whose equation is \(z=f(x,y)\).

  • The plane \(y=y_0\) intersects this surface in the plane curve \(QPR\), and the value of \(f_x(x_0,y_0)\) is the slope of the tangent line to this curve at \(P(x_0,y_0, f(x_0,y_0))\).

  • The plane \(x=x_0\) intersects the surface in the plane curve \(LPM\), and \(f_y(x_0,y_0)\) is the slope of the tangent line to this curve at \(P\).

Physical Interpretation

Partial derivatives may also be interpreted as (instantaneous) rates of change.

Example of a Violin String:

Suppose that a violin string is fixed at points \(A\) and \(B\) and vibrates in the \(xz\)-plane.

Let \(z=f(x,t)\) denote the height of the string at the point \(P\) with \(x\)-coordinate \(x\) at time \(t\). Then:

  • \(\partial z/\partial x\) is the slope of the string at \(P\).

  • \(\partial z/\partial t\) is the time rate of change of height of \(P\) along the indicated vertical line. In other words, \(\partial z/\partial t\) is the vertical velocity of \(P\).
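As a concrete illustration (a hypothetical model, not from the textbook), suppose the string's height is \(z=f(x,t)=\sin x\,\cos t\). Then \[\frac{\partial z}{\partial x}=\cos x\,\cos t \quad\text{(slope of the string at }P\text{)}, \qquad \frac{\partial z}{\partial t}=-\sin x\,\sin t \quad\text{(vertical velocity of }P\text{)}.\]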

Higher Partial Derivatives

Second partial derivatives of \(f(x,y)\)

There are four of them: \[f_{xx}=\frac{\partial^2 f}{\partial x^2},\qquad f_{yy}=\frac{\partial^2 f}{\partial y^2},\qquad f_{xy}=(f_x)_y=\frac{\partial^2 f}{\partial y\,\partial x},\qquad f_{yx}=(f_y)_x=\frac{\partial^2 f}{\partial x\,\partial y}.\]

N.B. \(f_{xy}=f_{yx}\) holds for most functions of two variables met in practice. A criterion for this equality will be given in Section 12.3 (Theorem C).

Partial derivatives of the third and even higher orders are defined analogously, for example \[f_{xxy}=(f_{xx})_y=\frac{\partial^3 f}{\partial y\,\partial x^2}.\]
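An optional R sketch (using base R's D() as above; the example function is made up) that illustrates \(f_{xy}=f_{yx}\) for a smooth function:

ex <- expression(x^3 * y^2 + exp(x * y)) # an arbitrary smooth example

f_xy <- D(D(ex, "x"), "y") # differentiate with respect to x, then y
f_yx <- D(D(ex, "y"), "x") # differentiate with respect to y, then x

f_xy # the two results are algebraically equivalent
f_yx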

Functions of More Than Two Variables

If \(f\) is a function of three variables \(x\), \(y\), and \(z\), its partial derivatives \(f_x\), \(f_y\), and \(f_z\) are defined analogously: differentiate with respect to one variable while holding the other two fixed.

Partial derivatives, such as \(f_{xy}\) and \(f_{xyz}\) that involve differentiation with respect to more than one variable are called mixed partial derivatives.

12.3 Limits and Continuity

The intuitive meaning of the limit statement \(\lim_{(x,y)\to(a,b)}f(x,y)=L\) is:

The values of \(f(x,y)\) get closer and closer to the number \(L\) as \((x, y)\) approaches \((a, b)\).

N.B. \((x, y)\) can approach \((a, b)\) in infinitely many ways.

To interpret this definition, we write \[\|(x,y)-(a,b)\|=\sqrt{(x-a)^2+(y-b)^2},\]

so that \(\{(x,y): 0<\|(x,y)-(a,b)\|<\delta\}\) is the set of points inside the circle of radius \(\delta\) centered at \((a,b)\), with the center \((a,b)\) itself removed (a punctured disk).

Comments:

Limits by Substitution

Before we state a theorem that justifies evaluating limits by substitution, we give a few definitions:

  • A polynomial in the variables \(x\) and \(y\) is a finite sum of terms of the form \(c\,x^m y^n\), where \(c\) is a constant and \(m\) and \(n\) are nonnegative integers.

  • A rational function in the variables \(x\) and \(y\) is a quotient of the form \(p(x,y)/q(x,y)\), where \(p\) and \(q\) are polynomials in \(x\) and \(y\), assuming \(q\) is not identically zero.

The theorem then says that limits of polynomials can be evaluated by substitution, and limits of rational functions can be evaluated by substitution provided the denominator is nonzero at the limit point.

A Problem-Solving Trick: Polar Coordinate

It is often easier to analyze limits of functions of two variables, especially limits at the origin, by changing to polar coordinates. Thus, limits for functions of two variables can sometimes be expressed as limits involving just one variable, \(r\).

N.B. The key point is that \((x, y)\to (0,0)\) if and only if \(r=\sqrt{x^2+y^2}\to 0\).
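For instance (an illustrative example, not necessarily one from the textbook), consider \(\displaystyle\lim_{(x,y)\to(0,0)}\frac{x^2y}{x^2+y^2}\). Substituting \(x=r\cos\theta\) and \(y=r\sin\theta\) gives \[\frac{x^2y}{x^2+y^2}=\frac{r^3\cos^2\theta\sin\theta}{r^2}=r\cos^2\theta\sin\theta, \qquad \bigl|r\cos^2\theta\sin\theta\bigr|\le r\to 0,\] so the limit is \(0\), regardless of the direction of approach.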

Continuity at a Point

To state that \(f(x,y)\) is continuous at the point \((a,b)\), we need the following:

  1. \(f(x,y)\) has a value at \((a,b)\);

  2. \(f(x,y)\) has a limit at \((a,b)\);

  3. the value \(f(a,b)\) is equal to that limit.

In summary, that is, \[\lim_{(x,y)\to(a,b)}f(x,y)=f(a,b).\]

Continuity of Functions

  • Polynomial functions are continuous for all \((x, y)\).

  • Rational functions are continuous everywhere except where the denominator is equal to \(0\).

  • Sums, differences, products, and quotients of continuous functions are continuous (provided, in the case of quotients, that we avoid division by \(0\)).

  • Compositions of continuous functions are continuous.

Continuity on a Set

A neighborhood of radius \(\delta\) of a point \(P\) is the set of all points \(Q\) satisfying \[\|Q-P\|<\delta\]

In two-space, a neighborhood is the “inside” of a circle; in three-space, it is the inside of a sphere.

Definitions

  • A point \(P\) is an interior point of a set \(S\) if there is a neighborhood of \(P\) contained in \(S\).

    • The set of all interior points of \(S\) is the interior of \(S\).
  • \(P\) is a boundary point of \(S\) if every neighborhood of \(P\) contains points that are in \(S\) and points that are not in \(S\).

    • The set of all boundary points of \(S\) is called the boundary of \(S\).
  • A set is open if all its points are interior points, and it is closed if it contains all its boundary points.

  • A set \(S\) is bounded if there exists an \(R>0\) such that all ordered pairs in \(S\) are inside a circle of radius \(R\) centered at the origin.

Interpretations of the Definition:

  • A point \((x_0, y_0)\) in a region (or set) \(R\) in the \(xy\)-plane is an interior point of \(R\) if it is the center of a disk of positive radius that lies entirely in \(R\).

  • A point \((x_0, y_0)\) is a boundary point of \(R\) if every disk centered at \((x_0, y_0)\) contains points that lie outside of \(R\) as well as points that lie in \(R\). (The boundary point itself need not belong to \(R\).)

  • A region (or set) in the plane is bounded if it lies inside a disk of finite radius. A region is unbounded if it is not bounded.

If \(S\) is an open set, to say that \(f\) is continuous on \(S\) means that \(f\) is continuous at every point of \(S\).

N.B. If \(S\) contains some or all of its boundary points, we must be careful to give the right interpretation of continuity at such points.

To say that \(f\) is continuous at a boundary point \(P\) of \(S\) means that \(f(Q)\) must approach \(f(P)\) as \(Q\) approaches \(P\) through points of \(S\).

N.B. Provided the mixed partial derivatives are continuous (the criterion in Theorem C), the order of differentiation in mixed partial derivatives does not matter.

12.4 Differentiability

Heuristic Questions:

Q: Is a function of two variables differentiable at a certain point?

A: It requires the existence of a tangent plane – more than the mere existence of the partial derivatives of \(f\).

For example, \(f(x,y)=\dfrac{xy}{x^2+y^2}\) (with \(f(0,0)=0\)) has both partial derivatives at the origin, yet it is not even continuous there, so no plane can approximate its graph well at the origin.

N.B. A tangent plane ought to approximate the graph very well in all directions.

Local Linearity

N.B. Local linearity is another way to look at differentiability.

Differentiability of a single-variable function:

If \(f\) is differentiable at \(a\), then there exists a tangent line through \((a,f(a))\) that approximates the function for values of \(x\) near \(a\). In other words, \(f\) is almost linear near \(a\).

Precisely, we say that a function \(f\) is locally linear at \(a\) if there is a constant \(m\) such that \[f(a+h)=f(a)+hm+h\epsilon(h)\] where \(\epsilon(h)\) is a function satisfying \(\lim_{h\to 0}\epsilon (h)=0\).

Interpretation of the local linearity:

Note that the function \(\epsilon(h)\) is the difference between the slope of the secant line through the points \((a,f(a))\) and \((a+h, f(a+h))\) and the slope of the tangent line through \((a,f(a))\).

If the function \(f\) is locally linear at \(a\), then \[\lim_{h\to 0}\frac{f(a+h)-f(a)}{h}=\lim_{h\to 0}\bigl(m+\epsilon(h)\bigr)=m.\]

We conclude that \(f\) must be differentiable at \(a\) and that \(m\) must equal \(f^\prime(a)\).
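A quick sanity check of the definition (an illustrative one-variable example): for \(f(x)=x^2\), \[f(a+h)=(a+h)^2=a^2+2ah+h^2=f(a)+h\,(2a)+h\cdot h,\] so \(m=2a=f^\prime(a)\) and \(\epsilon(h)=h\), which indeed satisfies \(\lim_{h\to 0}\epsilon(h)=0\).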

The concept of local linearity carries over to the situation of the function of two variables:

If we zoom in far enough, the surface resembles a plane, and the contour plot appears to consist of parallel lines.

Using vector notation (for the moment we downplay the distinction between a point and a vector, that is, \(\mathbf p=(x,y)=\langle x,y\rangle\)), we define \(\mathbf p_0=(a,b)\), \(\mathbf h=(h_1,h_2)\), and \(\epsilon (\mathbf h)=(\epsilon_1(h_1,h_2),\epsilon_2(h_1,h_2))\).

Here the function \(\epsilon (\mathbf h)\) is a vector-valued function of a vector variable.

Therefore, the formula in the definition of local linearity can be written in vector notation as \[f(\mathbf p_0+\mathbf h)=f(\mathbf p_0)+\mathbf h\cdot\nabla f(\mathbf p_0)+\mathbf h\cdot\epsilon(\mathbf h),\] where \(\nabla f(\mathbf p_0)=(f_x(\mathbf p_0),f_y(\mathbf p_0))\).

Comment: This formulation easily carries over to the case where \(f\) is a function of three (or more) variables.

N.B. Differentiability is synonymous with local linearity.

Differentiability

Comments: The gradient \(\nabla f(\mathbf p)=f_x(\mathbf p)\,\mathbf i+f_y(\mathbf p)\,\mathbf j\) becomes the analog of the derivative \(f^\prime\) for functions of one variable.

A condition of the differentiability at a point

The following theorem gives a condition that guarantees the differentiability of a function at a point: if \(f_x\) and \(f_y\) exist and are continuous on a disk whose interior contains \((a,b)\), then \(f\) is differentiable at \((a,b)\).

(proof is optional)

The key steps of the proof: write \[f(a+h_1,b+h_2)-f(a,b)=\bigl[f(a+h_1,b+h_2)-f(a,b+h_2)\bigr]+\bigl[f(a,b+h_2)-f(a,b)\bigr].\]

By the Mean Value Theorem for Derivatives, the first bracket equals \(h_1\,f_x(c_1,\,b+h_2)\) for some \(c_1\) between \(a\) and \(a+h_1\),

and the second bracket equals \(h_2\,f_y(a,\,c_2)\) for some \(c_2\) between \(b\) and \(b+h_2\). The continuity of \(f_x\) and \(f_y\) then yields the local linearity of \(f\) at \((a,b)\).

Tangent Plane

For a differentiable function, the tangent plane to the surface \(z=f(x,y)\) at the point \((x_0,y_0,f(x_0,y_0))\) is the graph of the local linear approximation: \[z=f(x_0,y_0)+f_x(x_0,y_0)(x-x_0)+f_y(x_0,y_0)(y-y_0).\]

Rules for Gradients

Just as \(D\) is a linear operator for derivatives, \(\nabla\) is a linear operator for gradients: \(\nabla(f+g)=\nabla f+\nabla g\) and \(\nabla(\alpha f)=\alpha\,\nabla f\) for any constant \(\alpha\); there is also a product rule, \(\nabla(fg)=f\,\nabla g+g\,\nabla f\).

Continuity vs. Differentiability

Recall that for functions of one variable, differentiability implies continuity, but not vice versa. The same is true here.

12.6 Chain Rules

Our goal in this section is to generalize the chain rule for single-variable functions to versions for multivariable functions.

First Version

If \(x=x(t)\) and \(y=y(t)\) are differentiable at \(t\), and \(z=f(x,y)\) is differentiable at \((x(t),y(t))\), then \(z=f(x(t),y(t))\) is differentiable at \(t\) and \[\frac{dz}{dt}=\frac{\partial z}{\partial x}\frac{dx}{dt}+\frac{\partial z}{\partial y}\frac{dy}{dt}.\]

(Proof is optional.)

Key steps of the proof:

Using the definition of local linearity,

Comment:

We could have done this example using direct substitution instead of the chain rule. However, the direct substitution method is often not available or not convenient; see the illustration below.
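As a small illustration of both approaches (not necessarily the example referred to above): let \(z=x^2y\) with \(x=t^2\) and \(y=t^3\). By the Chain Rule, \[\frac{dz}{dt}=\frac{\partial z}{\partial x}\frac{dx}{dt}+\frac{\partial z}{\partial y}\frac{dy}{dt}=(2xy)(2t)+(x^2)(3t^2)=4t^6+3t^6=7t^6,\] which agrees with direct substitution: \(z=(t^2)^2t^3=t^7\), so \(dz/dt=7t^6\).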

Comment: The chain rule in Theorem A can easily extend to a function of three variables.

Second Version

If \(x=x(s,t)\) and \(y=y(s,t)\) have first partial derivatives at \((s,t)\), and \(z=f(x,y)\) is differentiable at \((x(s,t),y(s,t))\), then \[\frac{\partial z}{\partial s}=\frac{\partial z}{\partial x}\frac{\partial x}{\partial s}+\frac{\partial z}{\partial y}\frac{\partial y}{\partial s} \qquad\text{and}\qquad \frac{\partial z}{\partial t}=\frac{\partial z}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial z}{\partial y}\frac{\partial y}{\partial t}.\]

Implicit Functions

Consider the following situation involving an implicit function:

Suppose that \(F(x,y)=0\) defines \(y\) implicitly as a function of \(x\), for example, \(y=g(x)\), but that the function \(g\) is difficult or impossible to determine.

If our goal is to find \(dy/dx\), one method we learned in Chapter 3 is implicit differentiation. Here is another method, using the Chain Rule: differentiating both sides of \(F(x,y(x))=0\) with respect to \(x\) gives \[\frac{\partial F}{\partial x}\frac{dx}{dx}+\frac{\partial F}{\partial y}\frac{dy}{dx}=0, \qquad\text{so}\qquad \frac{dy}{dx}=-\frac{\partial F/\partial x}{\partial F/\partial y}, \quad\text{provided }\partial F/\partial y\neq 0.\]

Furthermore, consider this situation:

Suppose \(z\) is an implicit function of \(x\) and \(y\) defined by the equation \(F(x,y,z)=0\). Then:

  • differentiation of both sides with respect to \(x\), holding \(y\) fixed,

\[\frac{\partial F}{\partial x}\frac{\partial x}{\partial x}+\frac{\partial F}{\partial y}\frac{\partial y}{\partial x}+\frac{\partial F}{\partial z}\frac{\partial z}{\partial x}=0\]

  • differentiation of both sides with respect to \(y\), holding \(x\) fixed,

\[\frac{\partial F}{\partial x}\frac{\partial x}{\partial y}+\frac{\partial F}{\partial y}\frac{\partial y}{\partial y}+\frac{\partial F}{\partial z}\frac{\partial z}{\partial y}=0\]

Note that \(\frac{\partial x}{\partial x}=\frac{\partial y}{\partial y}=1\) while \(\frac{\partial y}{\partial x}=\frac{\partial x}{\partial y}=0\) (because \(x\) and \(y\) are independent variables). With this simplification, we obtain the formulas \[\frac{\partial z}{\partial x}=-\frac{\partial F/\partial x}{\partial F/\partial z} \qquad\text{and}\qquad \frac{\partial z}{\partial y}=-\frac{\partial F/\partial y}{\partial F/\partial z}, \quad\text{provided }\partial F/\partial z\neq 0.\]
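For example (an illustrative case), take the sphere \(F(x,y,z)=x^2+y^2+z^2-1=0\). Then \[\frac{\partial z}{\partial x}=-\frac{F_x}{F_z}=-\frac{2x}{2z}=-\frac{x}{z} \qquad\text{and}\qquad \frac{\partial z}{\partial y}=-\frac{F_y}{F_z}=-\frac{y}{z} \qquad (z\neq 0),\] which matches what implicit differentiation of \(x^2+y^2+z^2=1\) gives.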

12.5 Directional Derivatives and Gradients

Partial Derivative vs. Directional Derivatives:

  • Partial derivatives: measure the rate of change (and the slope of the tangent line) in directions parallel to the \(x\)- and \(y\)-axes.

  • Directional derivatives: measure the rate of change in an arbitrary direction.

In vector notation:

  • Partial derivatives: \[f_x(\mathbf p)=\lim_{h\to 0}\frac{f(\mathbf p+h\mathbf i)-f(\mathbf p)}{h} \qquad\text{and}\qquad f_y(\mathbf p)=\lim_{h\to 0}\frac{f(\mathbf p+h\mathbf j)-f(\mathbf p)}{h},\]

    where \(\mathbf i\) and \(\mathbf j\) are the unit vectors in the positive \(x\)- and \(y\)-directions.

  • Directional derivatives: we replace \(\mathbf i\) and \(\mathbf j\) by an arbitrary unit vector \(\mathbf u\) and give the following definition: \[D_{\mathbf u}f(\mathbf p)=\lim_{h\to 0}\frac{f(\mathbf p+h\mathbf u)-f(\mathbf p)}{h},\] provided the limit exists.

Comments:

  • \(D_{\mathbf i}f(\mathbf p)=f_x(\mathbf p)\) and \(D_{\mathbf j}f(\mathbf p)=f_y(\mathbf p)\).

  • We also use the notation \(D_{\mathbf u}f(x,y)\), since \(\mathbf p=(x,y)\).

Geometric Interpretations of \(D_{\mathbf u}f(x_0,y_0)\):

  1. The vector \(\mathbf u\) determines a line \(L\) in the \(xy\)-plane through \((x_0,y_0)\).

  2. The plane through \(L\) perpendicular to the \(xy\)-plane intersects the surface \(z=f(x,y)\) in a curve \(C\).

  3. \(D_{\mathbf u}f(x_0,y_0)\) is the slope of the tangent line to \(C\) at the point \((x_0,y_0,f(x_0,y_0))\), and it measures the rate of change of \(f\) with respect to distance in the direction \(\mathbf u\).

Connection with the Gradient

Recall the defining formula of the gradient,

\[ \nabla f(\mathbf p)=f_x(\mathbf p)\mathbf i+f_y(\mathbf p)\mathbf j \]

If \(f\) is differentiable at \(\mathbf p\), then \(f\) has a directional derivative at \(\mathbf p\) in the direction of any unit vector \(\mathbf u\), and \[D_{\mathbf u}f(\mathbf p)=\mathbf u\cdot\nabla f(\mathbf p).\] For functions of three or more variables, only obvious modifications are needed.
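An optional R sketch (illustrative only; the function, point, and helper num_grad are made up for this note) that checks the gradient formula numerically with central finite differences:

fun <- function(x, y) x^2 * y + sin(y) # an arbitrary differentiable example

# Central-difference approximation of the gradient at (x0, y0)
num_grad <- function(fun, x0, y0, h = 1e-6) {
  c(fx = (fun(x0 + h, y0) - fun(x0 - h, y0)) / (2 * h),
    fy = (fun(x0, y0 + h) - fun(x0, y0 - h)) / (2 * h))
}

num_grad(fun, x0 = 1, y0 = 2)        # numerical gradient at (1, 2)
c(fx = 2 * 1 * 2, fy = 1^2 + cos(2)) # analytic gradient: f_x = 2xy, f_y = x^2 + cos(y)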

Maximum Rate of Change

Question:

In what direction is the function changing most rapidly; that is, in what direction is \(D_{\mathbf u}f(\mathbf p)\) largest?

A quick answer: in the direction of the gradient \(\nabla f(\mathbf p)\), and the maximum rate of change is \(\|\nabla f(\mathbf p)\|\).

To understand why this theorem is true:

Using the dot product formula to write out \(D_{\mathbf u}f(\mathbf p)\), we obtain \[D_{\mathbf u}f(\mathbf p)=\mathbf u\cdot\nabla f(\mathbf p)=\|\mathbf u\|\,\|\nabla f(\mathbf p)\|\cos\theta=\|\nabla f(\mathbf p)\|\cos\theta,\]

where \(\theta\) is the angle between \(\mathbf u\) and \(\nabla f(\mathbf p)\).

Since \(\cos\theta\) ranges over \([-1,1]\), \(D_{\mathbf u} f(\mathbf p)\) is maximized when \(\theta=0\) (i.e., \(\mathbf u\) points in the direction of \(\nabla f(\mathbf p)\)), with maximum value \(\|\nabla f(\mathbf p)\|\), and minimized when \(\theta=\pi\) (i.e., \(\mathbf u\) points in the opposite direction), with minimum value \(-\|\nabla f(\mathbf p)\|\).
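A concrete illustration (numbers chosen for convenience): for \(f(x,y)=x^2+y^2\) at \(\mathbf p=(1,2)\), \[\nabla f(1,2)=\langle 2,4\rangle, \qquad \|\nabla f(1,2)\|=\sqrt{2^2+4^2}=2\sqrt{5},\] so the maximum rate of change is \(2\sqrt{5}\), attained in the unit direction \(\mathbf u=\langle 1,2\rangle/\sqrt{5}\); in the opposite direction the rate of change is \(-2\sqrt{5}\).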

Level Curves and Gradients

Another important and valuable result connected to gradients is the following theorem: the gradient \(\nabla f(x_0,y_0)\) (when it is nonzero) is perpendicular to the level curve of \(f\) that passes through \((x_0,y_0)\).

Interpretations of Theorem C:

  • Information from the figure:

    • \(L\) is the level curve of \(f(x,y)\).

    • An arbitrary point \(P(x_0, y_0)\) is on \(L\) and in the domain of \(f\).

    • \(\mathbf u\) is the unit vector tangent to \(L\) at \(P\).

  • Implication from the figure:

    • The value of \(f\) is the same at all points on the level curve \(L\). Therefore, the rate of change of \(f(x,y)\) in the direction tangent to \(L\) is zero.

    • When \(\mathbf u\) is tangent to \(L\), the rate of change of \(f(x,y)\) in the direction \(\mathbf u\), i.e., the directional derivative \(D_{\mathbf u}f(x_0, y_0)\), is zero. That is, \[D_{\mathbf u}f(x_0, y_0) = \nabla f(x_0, y_0) \cdot \mathbf u=0\]

N.B. The gradient vectors are perpendicular to the level curves and they point in the direction of greatest increase of \(z\).

Higher Dimensions

The same result extends from level curves to level surfaces: for a function of three variables, the gradient \(\nabla f(x_0,y_0,z_0)\) is normal to the level surface of \(f\) passing through \((x_0,y_0,z_0)\).

12.7 Tangent Planes and Approximations

Tangent Planes

Consider a surface determined by equation \(z=f(x,y)\), and a curve passing through the point \((x_0,y_0,z_0)\) on this surface.

N.B. \(z=f(x,y)\) can be written as \(F(x,y,z)=f(x,y)-z=0\).

Then, using the parametric equations \(x=x(t)\), \(y=y(t)\), and \(z=z(t)\) for this curve and differentiating \(F(x(t),y(t),z(t))=0\) with respect to \(t\) by the Chain Rule, \[\frac{\partial F}{\partial x}\frac{dx}{dt}+\frac{\partial F}{\partial y}\frac{dy}{dt}+\frac{\partial F}{\partial z}\frac{dz}{dt}=0.\]

Now we express this in terms of the gradient of \(F\) and the derivative of the vector form of the curve, \(\mathbf r(t)=x(t)\mathbf i+y(t)\mathbf j+z(t)\mathbf k\):

\[ \nabla F\cdot\frac{d\mathbf r}{dt}=0, \] implying that the gradient of \(F\) at \((x_0,y_0,z_0)\) is perpendicular to the tangent vector \(\frac{d\mathbf r}{dt}\) of every such curve at that point.

The argument above motivates the formal definition of the tangent plane: it is the plane through \((x_0,y_0,z_0)\) with normal vector \(\nabla F(x_0,y_0,z_0)\), that is, the set of points satisfying \[\nabla F(x_0,y_0,z_0)\cdot\langle x-x_0,\,y-y_0,\,z-z_0\rangle=0.\]

Comment:

The definition in this section agrees with the definition of a tangent plane given in Section 12.4.

Differentials and Approximations

Differential

For a function of two variables \(z=f(x,y)\), here are the facts extended from single-variable calculus:

  • \(dx=\Delta x\) and \(dy=\Delta y\)

  • The differential \(dz=df(x,y)=f_x(x,y)\,dx+f_y(x,y)\,dy\) is an approximation to the change in \(z\), i.e., \(\Delta z=f(x+\Delta x,\,y+\Delta y)-f(x,y)\).

Comment:

Although in the illustration above \(dz\) does not appear to be a very good approximation to \(\Delta z\), the approximation gets better and better as \(\Delta x\) and \(\Delta y\) get smaller and smaller.
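A numerical illustration (values chosen for convenience): let \(z=f(x,y)=x^2y\), \((x,y)=(2,3)\), \(dx=\Delta x=0.1\), and \(dy=\Delta y=-0.2\). Then \[dz=2xy\,dx+x^2\,dy=(12)(0.1)+(4)(-0.2)=0.4,\] while the exact change is \(\Delta z=(2.1)^2(2.8)-(2)^2(3)=12.348-12=0.348\), so \(dz\) is already a reasonable approximation.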

Taylor Polynomials for Functions of Two or More Variables

Recall the Taylor polynomials for functions of one variable: \[P_1(x)=f(x_0)+f^\prime(x_0)(x-x_0), \qquad P_2(x)=P_1(x)+\frac{f^{\prime\prime}(x_0)}{2}(x-x_0)^2.\]

The analogous extensions for functions of two variables are:

  • Taylor polynomial of first order: \[P_1(x,y)=f(x_0,y_0)+f_x(x_0,y_0)(x-x_0)+f_y(x_0,y_0)(y-y_0),\]

    whose graph \(z=P_1(x,y)\) is the tangent plane at \((x_0,y_0,f(x_0,y_0))\).

  • Taylor polynomial of second order: \[P_2(x,y)=P_1(x,y)+\frac{1}{2}\Bigl[f_{xx}(x_0,y_0)(x-x_0)^2+2f_{xy}(x_0,y_0)(x-x_0)(y-y_0)+f_{yy}(x_0,y_0)(y-y_0)^2\Bigr].\]

Comment:

These results generalize to \(n\)th-order Taylor polynomials and to functions of more than two variables.

12.8 Maxima and Minima

Comment:

A global maximum (or minimum) is automatically a local maximum (or minimum).

Where Do Extreme Values Occur?

Just as in the single-variable case, the answer is: at critical points.

The critical points of \(f\) on \(S\) are of three types:

  1. Boundary points.

  2. Stationary points: \(\nabla f(\mathbf p_0)=\mathbf 0\). At such a point, the tangent plane is horizontal.

  3. Singular points: We call \(\mathbf p_0\) a singular point if it is an interior point of \(S\) where \(f\) is not differentiable, for example, a point where the graph of \(f\) has a sharp corner.

Comments:

  • For a function of two variables \(f(x,y)\), saying that the gradient at \((x_0,y_0)\) is \(\mathbf 0\), i.e., \(\nabla f(\mathbf p_0)=\mathbf 0\), means that both partial derivatives are \(0\) there.

  • Specifically, the function \(g(x)=f(x,y_0)\) has an extreme value at \(x_0\) and the function \(h(y)=f(x_0,y)\) has an extreme value at \(y_0\). By the Critical Point Theorem for functions of one variable, \[g^\prime(x_0)=f_x(x_0,y_0)=0 \text{ and }h^\prime(y_0)=f_y(x_0,y_0)=0.\]

Saddle Point

N.B. \(\nabla f(x_0,y_0)=\mathbf 0\) does not guarantee that there is a local extremum at \((x_0,y_0)\). We need a criterion for deciding what is happening at a stationary point—our next topic.
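The hyperbolic paraboloid plotted earlier is the standard illustration: for \(f(x,y)=y^2-x^2\), \[\nabla f(x,y)=\langle -2x,\,2y\rangle=\mathbf 0 \iff (x,y)=(0,0),\] yet \(f\) has neither a local maximum nor a local minimum at the origin, since \(f(x,0)=-x^2<0\) and \(f(0,y)=y^2>0\) at points arbitrarily close to \((0,0)\). The origin is a saddle point.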

Sufficient Conditions for Extrema

Analogous to the Second Derivative Test for functions of one variable, we have Theorem C, the Second Partials Test.

(proof is optional, shown on board if time allows)
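As an illustration of how the test is applied (using the standard discriminant \(D=f_{xx}f_{yy}-f_{xy}^{\,2}\) evaluated at a stationary point): for \(f(x,y)=x^2+y^2-2x-4y\), solving \(f_x=2x-2=0\) and \(f_y=2y-4=0\) gives the stationary point \((1,2)\), and \[D=f_{xx}f_{yy}-f_{xy}^{\,2}=(2)(2)-0^2=4>0, \qquad f_{xx}=2>0,\] so \(f\) has a local (indeed global) minimum at \((1,2)\). For \(f(x,y)=y^2-x^2\) the same computation at \((0,0)\) gives \(D=(-2)(2)-0=-4<0\), confirming a saddle point.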

Problems Involving the Boundary

Two Typical Cases with Corresponding Methods:

  1. The entire boundary can be parameterized and then the methods of Chapter 4 can be used to find the maximum and minimum. e.g., Example 5

  2. Pieces of the boundary can be parameterized and then the function can be maximized or minimized on each piece. e.g., Example 6

Comment: We will see another method, Lagrange multipliers, in the next section.

  • Second Partial Test:

  • Check the boundary points:

12.9 Lagrange Multipliers

Free Extremum Problem vs. Constrained Extremum Problem, for example:

  • To find the minimum value of \(x^2+2y^2+z^4+4\) is a free extremum problem.

  • To find the minimum of \(x^2+2y^2+z^4+4\) subject to the condition \(x+3y-z=7\) is a constrained extremum problem.

We have already seen a constrained extremum problem, e.g., Example 5 in the previous section, which was solved by finding a parametrization of the constraint and then maximizing a function of one variable.

However, often the constraint equation cannot easily be solved for one of the variables, or the constraint cannot be parametrized in terms of one variable. Here is another way: the method of Lagrange multipliers.

Geometric Interpretation of the Method

Recall that in Example 5 of section 12.8, we are asked to maximize the objective function \(f(x,y)=2+x^2+y^2\) subject to the constraint \(g(x,y)=0\) where \(g(x,y)=x^2+\frac{1}{4}y^2-1\).

The key idea behind the method of Lagrange multipliers is illustrated in Figure 1:

  • the surface is the objective function, and the elliptical cylinder is the constraint,

  • the maximum and minimum will occur when a level curve of the objective function \(f\) is tangent to the constraint curve.

N.B. The maximum and minimum occur at the points \(\mathbf p_0=(x_0,y_0)\) and \(\mathbf p_1=(x_1,y_1)\), where a level curve of \(f\) is tangent to the constraint curve.

Note that at any point of a level curve the gradient vector \(\nabla f\) is perpendicular to the level curve, and similarly, \(\nabla g\) is perpendicular to the constraint curve.

Therefore, \(\nabla f\) and \(\nabla g\) are parallel at \(\mathbf p_0\) and also at \(\mathbf p_1\), that is

\[ \nabla f(\mathbf p_0)=\lambda_0\nabla g(\mathbf p_0) \] and \[ \nabla f(\mathbf p_1)=\lambda_1\nabla g(\mathbf p_1) \]

for some nonzero numbers \(\lambda_0\) and \(\lambda_1\).
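Carrying this out for the objective and constraint quoted above (a sketch of the computation): maximize \(f(x,y)=2+x^2+y^2\) subject to \(g(x,y)=x^2+\frac{1}{4}y^2-1=0\). The conditions \(\nabla f=\lambda\nabla g\) and \(g=0\) read \[2x=\lambda\,(2x), \qquad 2y=\lambda\,\tfrac{1}{2}y, \qquad x^2+\tfrac{1}{4}y^2=1.\] From the first equation, either \(x=0\) or \(\lambda=1\). If \(\lambda=1\), the second equation forces \(y=0\), so \(x=\pm1\) and \(f=3\); if \(x=0\), the constraint gives \(y=\pm2\) and \(f=6\). Hence the constrained maximum is \(6\) at \((0,\pm2)\) and the constrained minimum is \(3\) at \((\pm1,0)\).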

Applications

Two or More Constraints (Optional)

For two constraints \(g(\mathbf p)=0\) and \(h(\mathbf p)=0\), we solve the equations \[\nabla f(\mathbf p)=\lambda\,\nabla g(\mathbf p)+\mu\,\nabla h(\mathbf p), \qquad g(\mathbf p)=0, \qquad h(\mathbf p)=0\] simultaneously for \(\mathbf p\), \(\lambda\), and \(\mu\).

Optimizing a Function over a Closed and Bounded Set

  • First, use the methods of Section 12.8 to find the maximum or minimum on the interior of \(S\).

  • Second, use Lagrange multipliers to find the points along the boundary that give a local maximum or minimum.

  • Finally, evaluate the function at these points to find the maximum and minimum over \(S\).

We start by finding all critical points in the interior of \(S\):