# Step 1
x <- seq(-10, 10, by = 1) # vector of length = 21
y <- seq(-10, 10, by = 1)
# Step 2
f <- function(x,y){return(y^2 -x^2)} # Hyperbolic Paraboloid
z <- outer(x,y,f) # a matrix: 21 by 21, where z_{ij} is the function value of (x_i, y_j)
# Step 3
persp(x, y, z)
Chapter 12 Derivatives for Multivariable Functions
12.1 Functions of Several Variables
Functions of Two Variables
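A real-valued function \(f\) of two real variables assigns to each ordered pair \((x,y)\) in some set \(D\) of the plane (the domain) a unique real number \(f(x,y)\).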
Graphs
The graph of a function \(f\) of two variables, that is, the graph of the equation \(z=f(x,y)\), is normally a surface. A typical example is in the following figure
Comment: Each \((x, y)\) in the domain corresponds to exactly one value \(z\), hence each line perpendicular to the \(xy\)-plane intersects the surface in at most one point.
Computer Graphs – 3D Surface Plot
A number of software packages, including Maple and Mathematica, can produce complicated three-dimensional graphs with ease.
Static graphs, such as the examples in the textbook.
Interactive graphs, created by software, e.g., R, for which there are tons of learning resources out there.
3D Surface Plot using R (Optional)
Three-Step Procedure:
Define \(x\) and \(y\);
Define function \(z=f(x,y)\);
Plot.
Customize your Plot in R (Optional)
In the function persp, theta and phi are angles defining the viewing direction: theta gives the azimuthal direction and phi the colatitude (the complement of the latitude). That is, changing the argument theta rotates the viewpoint around the vertical axis (a change in “longitude”), while changing the argument phi raises or lowers the viewpoint (a change in “latitude”).
persp(x, y, z, theta=50) # clockwise
persp(x, y, z, theta=-50) # counter-clockwise
persp(x, y, z, phi=30) # "higher-latitude"
persp(x, y, z, phi= -30) # "lower-latitude"
persp(x, y, z, theta=-50, phi = 30, col = "lightblue")
persp(x, y, z, theta=-50, phi = 30, col = "lightblue", box=FALSE) # remove the box
persp(x, y, z, theta=-50, phi = 30, col = "lightblue", ticktype="detailed") # add ticks to the axes
Level Curves and Contour Map
Map makers have given us another and usually simpler way to picture a surface: the contour map.
Each horizontal plane intersects the surface in a curve. The projection of this curve on the \(xy\)-plane is called a level curve, and a collection of such curves is a contour plot or a contour map.
Contour Plot using R (Optional)
contour(x, y, z, col="blue")
Functions of Three Variables
A number of quantities depend on three or more variables. For example, the temperature in a large auditorium may depend on the location \((x, y, z)\); this leads to the function \(T(x, y, z)\).
We can visualize functions of three variables by plotting level surfaces.
For example, the level surfaces of \(f(x, y, z) = 2x^2 + y^2 + z^2\) are concentric ellipsoids centered at the origin (they would be concentric spheres if all three coefficients were equal).
12.2 Partial Derivatives
Partial Derivatives of a Function of Two Variables
Suppose \(f(x,y)\) is a function of two variables \(x\) and \(y\).
If \(y\) is held constant, that is \(y=y_0\), then \(f(x,y_0)\) is a function of the single variable \(x\). Its derivative at \(x=x_0\) is called the partial derivative of \(f\) with respect to \(x\) at \((x_0,y_0)\), denoted by \(f_x(x_0,y_0)\).
If \(x\) is held constant, that is \(x=x_0\), then \(f(x_0,y)\) is a function of the single variable \(y\). Its derivative at \(y=y_0\) is called the partial derivative of \(f\) with respect to \(y\) at \((x_0,y_0)\), denoted by \(f_y(x_0,y_0)\).
N.B. The rules for differentiating a function of one variable in Chapter 3 work for finding partial derivatives, as long as we hold one variable fixed.
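As a quick illustration (the function below is chosen here for illustration and is not taken from the notes), take \(f(x,y)=x^2y+3y^3\). Treating \(y\) as a constant gives \(f_x\), and treating \(x\) as a constant gives \(f_y\):
\[
f_x(x,y)=2xy,\qquad f_y(x,y)=x^2+9y^2
\]
so that, for instance, \(f_x(1,2)=4\) and \(f_y(1,2)=37\).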
Notations for partial derivatives
Be aware that
Geometric and Physical Interpretations
Consider the surface whose equation is \(z=f(x,y)\).
- The plane \(y=y_0\) intersects this surface in the plane curve \(QPR\), and the value of \(f_x(x_0,y_0)\) is the slope of the tangent line to this curve at \(P(x_0,y_0, f(x_0,y_0))\).
- The plane \(x=x_0\) intersects the surface in the plane curve \(LPM\), and \(f_y(x_0,y_0)\) is the slope of the tangent line to this curve at \(P\).
Physical Interpretation
Partial derivatives may also be interpreted as (instantaneous) rates of change.
Example of a Violin String:
Suppose that a violin string is fixed at points \(A\) and \(B\) and vibrates in the \(xz\)-plane.
Let \(z=f(x,t)\) denote the height of the string at the point \(P\) with \(x\)-coordinate \(x\) at time \(t\), then,
\(\partial z/\partial x\) is the slope of the string at \(P\).
\(\partial z/\partial t\) is the time rate of change of height of \(P\) along the indicated vertical line. In other words, \(\partial z/\partial t\) is the vertical velocity of \(P\).
Higher Partial Derivatives
Second partial derivatives of \(f(x,y)\)
N.B. \(f_{xy}=f_{yx}\) is usually the case for the functions of two variables. A criterion for this equality will be given in Section 12.3 (Theorem C).
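A small worked example may help here; the function is an illustrative choice, and \(f_{xy}\) is read as \((f_x)_y\), that is, differentiate first with respect to \(x\) and then with respect to \(y\). For \(f(x,y)=x^3y^2-2xy^4\):
\[
f_x=3x^2y^2-2y^4,\qquad f_y=2x^3y-8xy^3
\]
\[
f_{xx}=6xy^2,\qquad f_{yy}=2x^3-24xy^2,\qquad f_{xy}=6x^2y-8y^3,\qquad f_{yx}=6x^2y-8y^3
\]
In this example the two mixed partials agree, consistent with the N.B. above.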
Partial derivatives of the third and even higher orders are defined analogously, for example
Functions of More Than Two Variables
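If \(f\) is a function of three variables, \(x\), \(y\), and \(z\), then the partial derivatives \(f_x\), \(f_y\), and \(f_z\) are defined analogously, by differentiating with respect to one variable while holding the other two fixed.
Partial derivatives such as \(f_{xy}\) and \(f_{xyz}\), which involve differentiation with respect to more than one variable, are called mixed partial derivatives.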
12.3 Limit and Continuity
The intuitive meaning of the limit statement \[\lim_{(x,y)\to(a,b)}f(x,y)=L\] is the following:
The values of \(f(x,y)\) get closer and closer to the number \(L\) as \((x, y)\) approaches \((a, b)\).
N.B. \((x, y)\) can approach \((a, b)\) in infinitely many ways.
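The following R sketch illustrates why the many possible approach paths matter. It uses an illustrative function (an assumption, not taken from the notes), \(f(x,y)=(x^2-y^2)/(x^2+y^2)\), and evaluates it along three straight-line paths into the origin.

```r
# Illustrative function (an assumption chosen for this sketch); undefined at (0, 0)
f <- function(x, y) (x^2 - y^2) / (x^2 + y^2)

# Parameter values shrinking toward 0
t <- 10^(-(1:6))

along_x_axis   <- f(t, 0)  # approach (0, 0) along the x-axis
along_y_axis   <- f(0, t)  # approach (0, 0) along the y-axis
along_diagonal <- f(t, t)  # approach (0, 0) along the line y = x

# Compare the values of f along the three paths
print(rbind(along_x_axis, along_y_axis, along_diagonal))
```

The three rows settle on three different constants, so no single number \(L\) can serve as the limit at the origin for this function.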
To interpret this definition, we write \[\|(x,y)-(a,b)\|=\sqrt{(x-a)^2+(y-b)^2}\]
and then \(\{(x,y): 0<\|(x,y)-(a,b)\|<\delta\}\) represents the points inside a circle of radius \(\delta\) centered at \((a,b)\), excluding the center \((a,b)\) itself (a punctured disk).
Comments:
Limits by Substitution
Before we state a theorem that justifies evaluating limits by substitution, we give a few definitions:
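- A polynomial in the variables \(x\) and \(y\) is a function of the form \[f(x,y)=\sum_{i=0}^{n}\sum_{j=0}^{m}c_{ij}x^iy^j\]
- A rational function in the variables \(x\) and \(y\) is a function of the form \[f(x,y)=\frac{p(x,y)}{q(x,y)}\]
where \(p\) and \(q\) are polynomials in \(x\) and \(y\), assuming \(q\) is not identically zero.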
A Problem-Solving Trick: Polar Coordinates
It is often easier to analyze limits of functions of two variables, especially limits at the origin, by changing to polar coordinates. Thus, limits for functions of two variables can sometimes be expressed as limits involving just one variable, \(r\).
N.B. The key point is that \((x, y)\to (0,0)\) if and only if \(r=\sqrt{x^2+y^2}\to 0\).
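A short worked example of the trick (the function is an illustrative choice): with \(x=r\cos\theta\) and \(y=r\sin\theta\) we have \(x^2+y^2=r^2\), so
\[
\lim_{(x,y)\to(0,0)}\frac{\sin(x^2+y^2)}{3x^2+3y^2}
=\lim_{r\to 0}\frac{\sin(r^2)}{3r^2}
=\frac{1}{3}\lim_{r\to 0}\frac{\sin(r^2)}{r^2}
=\frac{1}{3}
\]
The expression in \(r\) does not depend on \(\theta\), which is what makes the one-variable limit legitimate here.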
Continuity at a Point
To state that \(f(x,y)\) is continuous at the point \((a,b)\), we need the following:
\(f(x,y)\) has a value at \((a,b)\);
\(f(x,y)\) has a limit at \((a,b)\);
The value of \(f\) at \((a,b)\) is equal to the limit at \((a,b)\).
In summary, that is \[\lim_{(x,y)\to(a,b)}f(x,y)=f(a,b)\]
Continuity of Functions
Polynomial functions are continuous for all \((x, y)\).
Rational functions are continuous everywhere except where the denominator is equal to 0.
Sums, differences, products, and quotients of continuous functions are continuous (provided, in the last case, that we avoid division by \(0\)).
Compositions of continuous functions are continuous.
Continuity on a Set
A neighborhood of radius \(\delta\) of a point \(P\) is the set of all points \(Q\) satisfying \[\|Q-P\|<\delta\]
In two-space, a neighborhood is the “inside” of a circle; in three-space, it is the inside of a sphere.
Definitions
A point \(P\) is an interior point of a set \(S\) if there is a neighborhood of \(P\) contained in \(S\).
- The set of all interior points of \(S\) is the interior of \(S\).
\(P\) is a boundary point of \(S\) if every neighborhood of \(P\) contains points that are in \(S\) and points that are not in \(S\).
- The set of all boundary points of \(S\) is called the boundary of \(S\).
A set is open if all its points are interior points, and it is closed if it contains all its boundary points.
A set \(S\) is bounded if there exists an \(R>0\) such that all ordered pairs in \(S\) are inside a circle of radius \(R\) centered at the origin.
Interpretations of the Definition:
A point \((x_0, y_0)\) in a region (or set) \(R\) in the \(xy\)-plane is an interior point of \(R\) if it is the center of a disk of positive radius that lies entirely in \(R\).
A point \((x_0, y_0)\) is a boundary point of \(R\) if every disk centered at \((x_0, y_0)\) contains points that lie outside of \(R\) as well as points that lie in \(R\). (The boundary point itself need not belong to \(R\).)
A region (or set) in the plane is bounded if it lies inside a disk of finite radius. A region is unbounded if it is not bounded.
If \(S\) is an open set, to say that \(f\) is continuous on \(S\) means that \(f\) is continuous at every point of \(S\).
N.B. If \(S\) contains some or all of its boundary points, we must be careful to give the right interpretation of continuity at such points.
To say that \(f\) is continuous at a boundary point \(P\) of \(S\) means that \(f(Q)\) must approach \(f(P)\) as \(Q\) approaches \(P\) through points of \(S\).
N.B. Under the hypotheses of Theorem C (roughly, when \(f_{xy}\) and \(f_{yx}\) are continuous), the order of differentiation in mixed partial derivatives doesn’t matter.
12.4 Differentiability
Heuristic Questions:
Q: When is a function of two variables differentiable at a given point?
A: Differentiability requires the existence of a tangent plane, which is more than the mere existence of the partial derivatives of \(f\).
For example,
N.B. A tangent plane ought to approximate the graph very well in all directions.
Local Linearity
N.B. Local linearity is another way to look at differentiability.
Differentiability of a single-variable function:
If \(f\) is differentiable at \(a\), then there exists a tangent line through \((a,f(a))\) that approximates the function for values of \(x\) near \(a\). In other words, \(f\) is almost linear near \(a\).
Precisely, we say that a function \(f\) is locally linear at \(a\) if there is a constant \(m\) such that \[f(a+h)=f(a)+hm+h\epsilon(h)\] where \(\epsilon(h)\) is a function satisfying \(\lim_{h\to 0}\epsilon (h)=0\).
Interpretation of the local linearity:
Note that the function \(\epsilon(h)\) is the difference between the slope of the secant line through the points \((a,f(a))\) and \((a+h, f(a+h))\) and the slope of the tangent line through \((a,f(a))\).
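If the function \(f\) is locally linear at \(a\), then solving the defining equation for \(\epsilon(h)\) gives \[\epsilon(h)=\frac{f(a+h)-f(a)}{h}-m\] and letting \(h\to 0\) yields \[\lim_{h\to 0}\frac{f(a+h)-f(a)}{h}=m\]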
We conclude that \(f\) must be differentiable at \(a\) and that \(m\) must equal \(f^\prime(a)\).
The concept of local linearity carries over to the situation of the function of two variables:
If we zoom in far enough, the surface resembles a plane, and the contour plot appears to consist of parallel lines.
Using the vector notation (for the moment we downplay the distinction between a point and a vector, that is, \(\mathbf p=(x,y)=\langle x,y\rangle\)), we define \(\mathbf p_0=(a,b)\), \(\mathbf h=(h_1,h_2)\), and \(\epsilon (\mathbf h)=(\epsilon_1(h_1,h_2),\epsilon_2(h_1,h_2))\).
Here the function \(\epsilon (\mathbf h)\) is a vector-valued function of a vector variable.
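Therefore, the formula in the definition of local linearity can be written using the vector notation as \[f(\mathbf p_0+\mathbf h)=f(\mathbf p_0)+(f_x(\mathbf p_0),f_y(\mathbf p_0))\cdot\mathbf h+\epsilon(\mathbf h)\cdot\mathbf h\]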
Comment: This formulation easily carries over to the case where \(f\) is a function of three (or more) variables.
N.B. Differentiability is synonymous with local linearity.
Differentiability
Comments: The gradient becomes the analog of the derivative.
A condition of the differentiability at a point
The following theorem gives a condition that guarantees the differentiability of a function at a point:
(proof is optional)
The key steps of proof:
By the Mean Value Theorem for Derivatives,
and
Tangent Plane
Rules for Gradients
Just like \(D\) for derivatives, \(\nabla\) for gradients is also a linear operator.
Continuity vs. Differentiability
Recall that for functions of one variable, differentiability implies continuity, but not vice versa. The same is true here.
12.6 Chain Rules
Our goal in this section is to generalize the chain rule for single-variable functions to versions for multivariable functions.
First Version
(Proof is optional.)
Key steps of the proof:
Using the definition of local linearity,
Comment:
We could have done this example using the direct substitution instead of the chain rule. However, the direct substitution method is often not available or not convenient.
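To illustrate the comparison between the two methods, take the illustrative choice \(z=x^3y\) with \(x=2t\) and \(y=t^2\), and assume the first version of the chain rule has the usual form \(\frac{dz}{dt}=\frac{\partial z}{\partial x}\frac{dx}{dt}+\frac{\partial z}{\partial y}\frac{dy}{dt}\). Then
\[
\frac{dz}{dt}=3x^2y\cdot 2+x^3\cdot 2t=6(2t)^2t^2+2t(2t)^3=24t^4+16t^4=40t^4
\]
Direct substitution gives \(z=(2t)^3t^2=8t^5\), and hence \(\frac{dz}{dt}=40t^4\), the same result.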
Comment: The chain rule in Theorem A extends easily to a function of three variables.
Second Version
Implicit Functions
Consider the following situation involving an implicit function:
Suppose that \(F(x,y)=0\) defines \(y\) implicitly as a function of \(x\), for example, \(y=g(x)\), but that the function \(g\) is difficult or impossible to determine.
If our goal is to find \(dy/dx\), one method that we learned in Chapter 3 is implicit differentiation. Here is another method using the Chain Rule:
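A sketch of the Chain Rule approach, assuming \(F\) is differentiable and \(\partial F/\partial y\neq 0\): differentiating both sides of \(F(x,y)=0\) with respect to \(x\), with \(y=g(x)\), gives
\[
\frac{\partial F}{\partial x}\frac{dx}{dx}+\frac{\partial F}{\partial y}\frac{dy}{dx}=0
\quad\Longrightarrow\quad
\frac{dy}{dx}=-\frac{\partial F/\partial x}{\partial F/\partial y}
\]
For instance, with the illustrative equation \(F(x,y)=x^3+x^2y-10y^4=0\),
\[
\frac{dy}{dx}=-\frac{3x^2+2xy}{x^2-40y^3}
\]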
Furthermore, consider this situation:
Suppose \(z\) is an implicit function of \(x\) and \(y\) defined by the equation \(F(x,y,z)=0\). Then,
- differentiation of both sides with respect to \(x\), holding \(y\) fixed,
\[\frac{\partial F}{\partial x}\frac{\partial x}{\partial x}+\frac{\partial F}{\partial y}\frac{\partial y}{\partial x}+\frac{\partial F}{\partial z}\frac{\partial z}{\partial x}=0\]
- differentiation of both sides with respect to \(y\), holding \(x\) fixed,
\[\frac{\partial F}{\partial x}\frac{\partial x}{\partial y}+\frac{\partial F}{\partial y}\frac{\partial y}{\partial y}+\frac{\partial F}{\partial z}\frac{\partial z}{\partial y}=0\]
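Note that \(\frac{\partial y}{\partial x}=\frac{\partial x}{\partial y}=0\) and \(\frac{\partial x}{\partial x}=\frac{\partial y}{\partial y}=1\). With some simplification, we have the following formulas, valid where \(\frac{\partial F}{\partial z}\neq 0\): \[\frac{\partial z}{\partial x}=-\frac{\partial F/\partial x}{\partial F/\partial z},\qquad \frac{\partial z}{\partial y}=-\frac{\partial F/\partial y}{\partial F/\partial z}\]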
12.5 Directional Derivatives and Gradients
Partial Derivative vs. Directional Derivatives:
Partial derivatives: measure the rate of change (and the slope of the tangent line) in directions parallel to the \(x\)- and \(y\)-axes.
Directional derivatives: measure the rate of change in an arbitrary direction.
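In vector notation:
Partial derivatives: \[f_x(\mathbf p)=\lim_{h\to 0}\frac{f(\mathbf p+h\mathbf i)-f(\mathbf p)}{h},\qquad f_y(\mathbf p)=\lim_{h\to 0}\frac{f(\mathbf p+h\mathbf j)-f(\mathbf p)}{h}\]
where \(\mathbf i\) and \(\mathbf j\) are the unit vectors in the directions of the positive \(x\)- and \(y\)-axes.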
Directional derivatives: we replace \(\mathbf i\) and \(\mathbf j\) by an arbitrary unit vector \(\mathbf u\) and give the following definition,
Comments:
\(D_{\mathbf i}f(\mathbf p)=f_x(\mathbf p)\) and \(D_{\mathbf j}f(\mathbf p)=f_y(\mathbf p)\).
We also use the notation \(D_{\mathbf u}f(x,y)\), since \(\mathbf p=(x,y)\).
Geometric Interpretations of \(D_{\mathbf u}f(x_0,y_0)\):
The vector \(\mathbf u\) determines a line \(L\) in the \(xy\)-plane through \((x_0,y_0)\).
The plane through \(L\) perpendicular to the \(xy\)-plane intersects the surface \(z=f(x,y)\) in a curve \(C\).
\(D_{\mathbf u}f(x_0,y_0)\) is the slope of the tangent line to \(C\) at the point \((x_0,y_0,f(x_0,y_0))\), and it measures the rate of change of \(f\) with respect to distance in the direction \(\mathbf u\).
Connection with the Gradient
Recall the definition formula of the gradient
\[ \nabla f(\mathbf p)=f_x(\mathbf p)\mathbf i+f_y(\mathbf p)\mathbf j \]
For functions of three or more variables, the same definitions apply with the obvious modifications.
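The connection between the directional derivative and the gradient, \(D_{\mathbf u}f(\mathbf p)=\nabla f(\mathbf p)\cdot\mathbf u\), can be checked numerically. The R sketch below uses an illustrative function, point, and direction (all assumptions chosen for this example) and compares the dot-product formula with a finite-difference approximation of the limit definition.

```r
# Illustrative function (an assumption, not taken from the notes)
f <- function(x, y) 4 * x^2 - x * y + 3 * y^2

# Its gradient computed by hand: (f_x, f_y) = (8x - y, -x + 6y)
grad_f <- function(x, y) c(8 * x - y, -x + 6 * y)

p <- c(2, -1)            # base point
a <- c(4, 3)             # chosen direction, not yet a unit vector
u <- a / sqrt(sum(a^2))  # unit vector in the direction of a

# Directional derivative via the gradient: grad f(p) . u
d_exact <- sum(grad_f(p[1], p[2]) * u)

# Finite-difference approximation of (f(p + h u) - f(p)) / h for small h
h <- 1e-6
d_numeric <- (f(p[1] + h * u[1], p[2] + h * u[2]) - f(p[1], p[2])) / h

print(c(exact = d_exact, numeric = d_numeric))
```

The two numbers should agree to several decimal places; the small remaining gap comes from the finite step size \(h\).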
Maximum Rate of Change
Question:
In what direction is the function changing most rapidly, that is, in what direction is \(D_{\mathbf u}f(\mathbf p)\) largest?
A quick answer: In the direction of the gradient.
To understand why this theorem is true:
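Using the dot product formula to write out \(D_{\mathbf u}f(\mathbf p)\), and recalling that \(\|\mathbf u\|=1\), we obtain \[D_{\mathbf u}f(\mathbf p)=\mathbf u\cdot\nabla f(\mathbf p)=\|\mathbf u\|\,\|\nabla f(\mathbf p)\|\cos\theta=\|\nabla f(\mathbf p)\|\cos\theta\]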
where \(\theta\) is the angle between \(\mathbf u\) and \(\nabla f(\mathbf p)\).
Note that \(D_{\mathbf u} f(\mathbf p)\) is maximized when \(\theta=0\) and minimized when \(\theta=\pi\).
Level Curves and Gradients
Another important result connecting gradients and level curves is the following theorem:
Interpretation of Theorem C:
Information from the figure:
\(L\) is the level curve of \(f(x,y)\).
An arbitrary point \(P(x_0, y_0)\) is on \(L\) and in the domain of \(f\).
\(\mathbf u\) is the unit vector tangent to \(L\) at \(P\).
Implication from the figure:
The value of \(f\) is the same at all points on the level curve \(L\). Therefore, the rate of change of \(f(x,y)\) in the direction tangent to \(L\) is zero.
When \(\mathbf u\) is tangent to \(L\), the rate of change of \(f(x,y)\) in the direction \(\mathbf u\), i.e., the directional derivative \(D_{\mathbf u}f(x_0, y_0)\), is zero. That is, \[D_{\mathbf u}f(x_0, y_0) = \nabla f(x_0, y_0) \cdot \mathbf u=0\]
N.B. The gradient vectors are perpendicular to the level curves and they point in the direction of greatest increase of \(z\).
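As a numerical sanity check of this perpendicularity, the R sketch below reuses the hyperbolic paraboloid \(f(x,y)=y^2-x^2\) from the 3D plot example. The point \((1,2)\) and the explicit description of the level curve through it, \(y=\sqrt{3+x^2}\), are choices made for this illustration.

```r
# Hyperbolic paraboloid from the plotting example
f <- function(x, y) y^2 - x^2

# Gradient computed by hand: (f_x, f_y) = (-2x, 2y)
grad_f <- function(x, y) c(-2 * x, 2 * y)

# Level curve f(x, y) = 3 through (1, 2), written as y = sqrt(3 + x^2)
level_y <- function(x) sqrt(3 + x^2)

# Slope of the level curve at x = 1, by a central difference
h <- 1e-6
slope <- (level_y(1 + h) - level_y(1 - h)) / (2 * h)

# Unit tangent vector to the level curve at (1, 2)
tangent <- c(1, slope)
u <- tangent / sqrt(sum(tangent^2))

# Directional derivative along the tangent: should be (numerically) zero
d_u <- sum(grad_f(1, level_y(1)) * u)
print(d_u)
```

Because the tangent is obtained from the curve itself rather than from the gradient, a value of `d_u` near zero is a genuine check that \(\nabla f\) is perpendicular to the level curve at this point.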
Higher Dimensions
From level curves to level surfaces.
12.7 Tangent planes and Approximations
Tangent Planes
Consider a surface determined by equation \(z=f(x,y)\), and a curve passing through the point \((x_0,y_0,z_0)\) on this surface.
N.B. \(z=f(x,y)\) can be written as \(F(x,y,z)=f(x,y)-z=0\).
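Then, using the parametric equations \(x=x(t)\), \(y=y(t)\), and \(z=z(t)\) for this curve, we have \(F(x(t),y(t),z(t))=0\) for all \(t\), and the Chain Rule gives \[\frac{dF}{dt}=\frac{\partial F}{\partial x}\frac{dx}{dt}+\frac{\partial F}{\partial y}\frac{dy}{dt}+\frac{\partial F}{\partial z}\frac{dz}{dt}=0\]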
Now we express this in terms of the gradient of \(F\) and the derivative of the vector expression for the curve \(\mathbf r(t)=x(t)\mathbf i+y(t)\mathbf j+z(t)\mathbf k\) as
\[ \nabla F\cdot\frac{d\mathbf r}{dt}=0 \] implying that the gradient at \((x_0,y_0,z_0)\) is perpendicular to the tangent line to the curve at this point, since \(\frac{d\mathbf r}{dt}\) is tangent to the curve.
The argument above introduces the formal definition of the tangent plane.
Comment:
The definition in this section agrees with the definition of a tangent plane given in Section 12.4.
Differentials and Approximations
Differential
For a function of two variables \(z=f(x,y)\), here are the facts extended from single-variable calculus:
\(dx=\Delta x\) and \(dy=\Delta y\)
The differential \(dz=df(x,y)\) is an approximation to the change in \(z\), a.k.a. \(\Delta z\).
Comment:
While in the above illustration, \(dz\) does not appear to be a very good approximation to \(\Delta z\), you can expect that it will get better and better as \(\Delta x\) and \(\Delta y\) get smaller and smaller.
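The quality of the approximation can be seen numerically. The R sketch below assumes the usual definition \(dz=f_x(x,y)\,dx+f_y(x,y)\,dy\) and uses an illustrative function and a small displacement (both chosen for this example) to compare \(dz\) with the actual change \(\Delta z\).

```r
# Illustrative function (an assumption, not taken from the notes)
f <- function(x, y) 2 * x^3 + x * y - y^3

# Partial derivatives computed by hand
f_x <- function(x, y) 6 * x^2 + y
f_y <- function(x, y) x - 3 * y^2

p0 <- c(2, 1)        # starting point
p1 <- c(2.03, 0.98)  # nearby point

dx <- p1[1] - p0[1]
dy <- p1[2] - p0[2]

# Differential (linear approximation of the change in z)
dz <- f_x(p0[1], p0[2]) * dx + f_y(p0[1], p0[2]) * dy

# Actual change in z
delta_z <- f(p1[1], p1[2]) - f(p0[1], p0[2])

print(c(dz = dz, delta_z = delta_z, error = delta_z - dz))
```

Shrinking the displacement, for example by moving `p1` closer to `p0`, should shrink the error faster than the displacement itself.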
Taylor Polynomials for Functions of Two or More Variables
Recall the Taylor polynomials for functions of one variable:
The analogous extensions for functions of two variables are
Taylor Polynomial of first order
which is the tangent plane at \((x_0,y_0,f(x_0,y_0))\).
Taylor Polynomial of second order
Comment:
These results generalize to nth-order Taylor polynomials and to functions of more than two variables.
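As a worked illustration, assume the second-order Taylor polynomial has the standard form
\[
P_2(x,y)=f(x_0,y_0)+f_x(x_0,y_0)(x-x_0)+f_y(x_0,y_0)(y-y_0)+\frac{1}{2}\left[f_{xx}(x_0,y_0)(x-x_0)^2+2f_{xy}(x_0,y_0)(x-x_0)(y-y_0)+f_{yy}(x_0,y_0)(y-y_0)^2\right]
\]
For the illustrative function \(f(x,y)=e^{x+2y}\) at \((x_0,y_0)=(0,0)\), we have \(f=1\), \(f_x=1\), \(f_y=2\), \(f_{xx}=1\), \(f_{xy}=2\), and \(f_{yy}=4\) at the origin, so
\[
P_2(x,y)=1+x+2y+\frac{1}{2}\left(x^2+4xy+4y^2\right)
\]
This agrees with substituting \(u=x+2y\) into the one-variable polynomial \(1+u+\frac{1}{2}u^2\) for \(e^u\).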
12.8 Maxima and Minima
Comment:
A global maximum (or minimum) is automatically a local maximum (or minimum).
Where Do Extreme Values Occur?
Just as in the single-variable case, the answer is: at the critical points.
The critical points of \(f\) on \(S\) are of three types:
Boundary points.
Stationary points: \(\nabla f(\mathbf p_0)=\mathbf 0\). At such a point, the tangent plane is horizontal.
Singular points: We call \(\mathbf p_0\) a singular point if it is an interior point of \(S\) where \(f\) is not differentiable, for example, a point where the graph of \(f\) has a sharp corner.
Comments:
For a function of two variables \(f(x,y)\), the condition \(\nabla f(\mathbf p_0)=\mathbf 0\) at \(\mathbf p_0=(x_0,y_0)\) means that both partial derivatives are \(0\) there.
Specifically, if \(f\) has an extreme value at an interior point \((x_0,y_0)\) where \(f\) is differentiable, then the function \(g(x)=f(x,y_0)\) has an extreme value at \(x_0\) and the function \(h(y)=f(x_0,y)\) has an extreme value at \(y_0\). By the Critical Point Theorem for functions of one variable, \[g^\prime(x_0)=f_x(x_0,y_0)=0 \text{ and }h^\prime(y_0)=f_y(x_0,y_0)=0\]
Saddle Point
N.B. \(\nabla f(x_0,y_0)=\mathbf 0\) does not guarantee that there is a local extremum at \((x_0,y_0)\). We need a criterion for deciding what is happening at a stationary point—our next topic.
Sufficient Conditions for Extrema
Analogous to the Second Derivative Test for functions of one variable, here comes Theorem C – Second Partial Test.
(proof is optional, shown on board if time allows)
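The R sketch below applies the test to an illustrative function. It assumes Theorem C is the standard Second Partials Test based on \(D=f_{xx}f_{yy}-f_{xy}^2\) evaluated at a stationary point; the function, its hand-computed derivatives, and the helper name `classify` are choices made for this example.

```r
# Illustrative function (an assumption, not taken from the notes)
f <- function(x, y) 3 * x^3 + y^2 - 9 * x + 4 * y

# Second partial derivatives computed by hand
f_xx <- function(x, y) 18 * x
f_yy <- function(x, y) 2
f_xy <- function(x, y) 0

# Hypothetical helper: classify a stationary point with D = f_xx * f_yy - f_xy^2
classify <- function(x, y) {
  D <- f_xx(x, y) * f_yy(x, y) - f_xy(x, y)^2
  if (D > 0 && f_xx(x, y) > 0) {
    "local minimum"
  } else if (D > 0 && f_xx(x, y) < 0) {
    "local maximum"
  } else if (D < 0) {
    "saddle point"
  } else {
    "inconclusive"
  }
}

# Stationary points: f_x = 9x^2 - 9 = 0 and f_y = 2y + 4 = 0
print(classify(1, -2))
print(classify(-1, -2))
print(f(1, -2))
```

The stationary points themselves are found by hand from \(f_x=0\) and \(f_y=0\); the code only carries out the classification step.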
Problems Involving the Boundary
Two Typical Cases with Corresponding Methods:
The entire boundary can be parameterized and then the methods of Chapter 4 can be used to find the maximum and minimum. e.g., Example 5
Pieces of the boundary can be parameterized and then the function can be maximized or minimized on each piece. e.g., Example 6
Comment: We will see another method, Lagrange multipliers, in the next section.
- Second Partial Test:
- Check the boundary points:
12.9 Lagrange Multipliers
Free Extremum Problem vs. Constrained Extremum Problem, for example:
To find the minimum value of \(x^2+2y^2+z^4+4\) is a free extremum problem.
To find the minimum of \(x^2+2y^2+z^4+4\) subject to the condition \(x+3y-z=7\) is a constrained extremum problem.
We have seen the constrained extremum problem, e.g., Example 5 in the previous section. This problem was solved by finding a parametrization for the constraint and then maximizing a function of one variable.
However, chances are that the constraint equation is not easily solved for one of the variables or that the constraint cannot be parametrized in terms of one variable. Here is another way – the method of Lagrange multipliers.
Geometric Interpretation of the Method
Recall that in Example 5 of Section 12.8, we are asked to maximize the objective function \(f(x,y)=2+x^2+y^2\) subject to the constraint \(g(x,y)=0\) where \(g(x,y)=x^2+\frac{1}{4}y^2-1\).
The key idea behind the method of Lagrange multipliers is illustrated in Figure 1:
the surface is the graph of the objective function, and the elliptical cylinder is the constraint,
the maximum and minimum will occur when a level curve of the objective function \(f\) is tangent to the constraint curve.
N.B. The maximum and minimum occur at points \(\mathbf p_0=(x_0,y_0)\) and \(\mathbf p_1=(x_1,y_1)\), where a level curve is tangent to the constraint curve.
Note that at any point of a level curve the gradient vector \(\nabla f\) is perpendicular to the level curve, and similarly, \(\nabla g\) is perpendicular to the constraint curve.
Therefore, \(\nabla f\) and \(\nabla g\) are parallel at \(\mathbf p_0\) and also at \(\mathbf p_1\), that is
\[ \nabla f(\mathbf p_0)=\lambda_0\nabla g(\mathbf p_0) \] and \[ \nabla f(\mathbf p_1)=\lambda_1\nabla g(\mathbf p_1) \]
for some nonzero numbers \(\lambda_0\) and \(\lambda_1\).
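The parallel-gradient condition can be checked numerically for the problem above, \(f(x,y)=2+x^2+y^2\) with \(g(x,y)=x^2+\frac{1}{4}y^2-1=0\). In the R sketch below, the candidate points come from solving \(\nabla f=\lambda\nabla g\) together with \(g=0\) by hand, and the parametrization \(x=\cos t\), \(y=2\sin t\) of the ellipse is used only as an independent brute-force comparison; the helper name `cross2` is hypothetical.

```r
# Objective function and the two gradients, computed by hand
f      <- function(x, y) 2 + x^2 + y^2
grad_f <- function(x, y) c(2 * x, 2 * y)
grad_g <- function(x, y) c(2 * x, y / 2)  # g(x, y) = x^2 + y^2 / 4 - 1

# Candidate points from solving grad f = lambda * grad g with g = 0 by hand
candidates <- rbind(c(0, 2), c(0, -2), c(1, 0), c(-1, 0))

# 2D cross product: zero exactly when the two vectors are parallel
cross2 <- function(a, b) a[1] * b[2] - a[2] * b[1]

parallel_check <- apply(candidates, 1, function(p) {
  cross2(grad_f(p[1], p[2]), grad_g(p[1], p[2]))
})
values <- apply(candidates, 1, function(p) f(p[1], p[2]))

# Brute-force comparison: evaluate f along the parametrized ellipse
t <- seq(0, 2 * pi, length.out = 2001)
on_ellipse <- f(cos(t), 2 * sin(t))

print(cbind(candidates, parallel_check, values))
print(range(on_ellipse))
```

The largest and smallest entries of `values` should match the range of `f` along the ellipse, which is the point of the method: only the points where the gradients are parallel need to be compared.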
Applications
Two or More Constraints (Optional)
We solve the equations
Optimizing a Function over a Closed and Bounded Set
First, use the methods of Section 12.8 to find the maximum or minimum on the interior of \(S\).
Second, use Lagrange multipliers to find the points along the boundary that give a local maximum or minimum.
Finally, evaluate the function at these points to find the maximum and minimum over \(S\).
We start with finding all critical points on the interior of \(S\):