Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to obtain numerical results; they can be used to solve problems that have a probabilistic interpretation. Simulation lets us sample from a known distribution and compare the simulated results with theory: if we fix a sample size, dimension, and distribution beforehand, we can always check whether what we expect actually holds in the simulated data.
Advantages of using MC:
Monte Carlo simulation can help with the following:
Probabilistic Results – shows not only what could happen in each scenario, but how likely each outcome is.
Graphical Results – The outcomes and their chances of occurring are easily converted to graphs, making it easy to communicate findings to an audience.
Sensitivity Analysis – It is easy to see which variables impact the outcome the most, i.e. which variables had the biggest effect on bottom-line results.
Scenario Analysis – Using Monte Carlo simulation, we can see exactly which inputs had which values together when certain outcomes occurred.
Ordinary Monte Carlo
The “Monte Carlo method” refers to the theory and practice of learning about probability distributions by simulation rather than calculus. In ordinary Monte Carlo (OMC) we use IID simulations from the distribution of interest. Suppose \(X_1, X_2, \dots\) are IID simulations from some distribution, and suppose we want to know an expectation \[
\theta = E\left[ Y_1 \right] = E\left[ g(X_1) \right].
\] The law of large numbers (LLN) then says \[
\bar{y}_n = \frac{1}{n} \sum_{i=1}^n Y_i = \frac{1}{n} \sum_{i=1}^n g(X_i)
\] converges to \(\theta\) with probability \(1\) (SLLN).
Hence we can approximate \(\theta\) by generating a large number of random numbers \(X_i\) and taking as our approximation the average value of \(g(X_i)\). This approach to approximating integrals is called the Monte Carlo approach. —Sheldon Ross, Simulation
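As a quick illustrative sketch of this recipe (our own toy example, not from the quote above): take \(g(x) = x^2\) with \(X_i \sim \mbox{N}(0,1)\), so the true value is \(\theta = E[X_1^2] = 1\), and compare it with the Monte Carlo average.
# OMC sketch: estimate theta = E[g(X)] with g(x) = x^2 and X ~ N(0, 1).
# The true value is 1, so the estimate can be checked against it.
set.seed(1)
n <- 100000
x <- rnorm(n)   # IID simulations X_1, ..., X_n
y <- x^2        # Y_i = g(X_i)
mean(y)         # ybar_n, the OMC estimate of theta (should be near 1)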
How does it work?
Monte Carlo simulation works by selecting a random value for each uncertain input, computing the model's output from those values, and then repeating the process many times with different random values. The end result is a distribution of outcomes rather than a single number.
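In code, the generic pattern looks something like the following sketch; the two uncertain inputs and the model are purely illustrative assumptions, not taken from the examples below.
# Generic Monte Carlo pattern: draw a random value for each uncertain input,
# compute the model output, and repeat many times.
set.seed(1)
simulate.once <- function() {
  price    <- rnorm(1, mean = 10, sd = 1)  # illustrative uncertain input
  quantity <- rpois(1, lambda = 100)       # illustrative uncertain input
  price * quantity                         # the model output for this draw
}
outcomes <- replicate(10000, simulate.once())
hist(outcomes, main = "Distribution of outcomes")  # the output is a distribution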
Below are two common examples, the CLT and the LLN, demonstrated with this Monte Carlo simulation method.
Demonstrate CLT and LLN via MC
Ordinary Monte Carlo Theory - CLT
The central limit theorem (CLT) says \[
\frac{\sqrt{n} (\bar{y}_n - \theta) }{\sigma} \stackrel{d}{\rightarrow} \mbox{N}(0,1) .
\] That is, for sufficiently large \(n\), \[
\bar{y}_n \sim \mbox{N}( \theta , \sigma^2 / n).
\] The CLT gives a way to approximate the probability that the sample average is close to the mean. When a random sample of size \(n\) is taken from any distribution with mean \(\theta\) and variance \(\sigma^2\), the sample mean has a distribution that is approximately Normal with mean \(\theta\) and variance \(\sigma^2/n\). It doesn't matter what the shape of the underlying distribution is; all that is necessary is a finite variance and a sufficiently large sample size \(n\). Drawing many, many repeated samples of size \(n\) from the population makes this sampling distribution visible: the average (or sum) of each sample is approximately Normally distributed. Equivalently, the sum is approximately Normal with mean \(n\theta\) and variance \(n\sigma^2\).
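For a concrete (illustrative) numerical example: if the \(X_i\) are Exponential with rate \(1\), then \(\theta = 1\) and \(\sigma^2 = 1\), so with \(n = 100\) the CLT gives \(\bar{y}_{100} \approx \mbox{N}(1, 0.01)\); the chance that the sample mean misses \(\theta\) by more than \(0.2\) (two standard errors) is therefore about \(2\Phi(-2) \approx 0.05\).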
Comments:
The theory of OMC is just the theory of frequentist statistical inference. The only differences are that
the “data” \(X_1, \dots, X_n\) are computer simulations rather than measurements on objects in the real world
the “sample size” \(n\) is the number of computer simulations rather than the size of some real world data
the unknown parameter \(\theta\) is in principle completely known, given by some integral, which we are unable to compute analytically.
Everything works just the same when the data \(X_1, X_2, \dots\) (the computer simulations) are vectors, so long as the functions of interest \(g(X_1), g(X_2), \dots\) are scalars.
OMC works great, but it can be very difficult to produce IID simulations of random variables or random vectors whose distribution is not a brand-name distribution.
Demonstration of CLT
The CLT can be demonstrated through simulation. The first example below uses the Poisson distribution with samples of size 1,000; the second uses the Exponential distribution, also with samples of size 1,000 (each sampling distribution is built from 10,000 replications).
# Central Limit Theorem Simulation
# n: sample size of each sample
# dist: underlying distribution where the sample is drawn
simulation <- function(n, dist=NULL, param1=NULL, param2=NULL) {
  r <- 10000  # Number of replications/samples - (DO NOT ADJUST)
  # This produces a matrix of observations with
  # n columns and r rows. Each row is one sample:
  my.samples <- switch(dist,
    "E" = matrix(rexp(n*r, param1), r),
    "N" = matrix(rnorm(n*r, param1, param2), r),
    "U" = matrix(runif(n*r, param1, param2), r),
    "P" = matrix(rpois(n*r, param1), r),
    "C" = matrix(rcauchy(n*r, param1, param2), r),
    "B" = matrix(rbinom(n*r, param1, param2), r),
    "G" = matrix(rgamma(n*r, param1, param2), r),
    "X" = matrix(rchisq(n*r, param1), r),
    "T" = matrix(rt(n*r, param1), r))
  all.sample.sums  <- apply(my.samples, 1, sum)
  all.sample.means <- apply(my.samples, 1, mean)
  all.sample.vars  <- apply(my.samples, 1, var)
  par(mfrow=c(2,2))
  hist(my.samples[1,], col="gray", main="Distribution of One Sample", xlab="")
  hist(all.sample.sums, col="gray", main="Sampling Distribution of the Sum", xlab="")
  hist(all.sample.means, col="gray", main="Sampling Distribution of the Mean", xlab="")
  hist(all.sample.vars, col="gray", main="Sampling Distribution of the Variance", xlab="")
}
simulation(n=1000, dist="P", param1=1)
simulation(n=1000, dist="E", param1=1)
As these example plots show, even with different underlying distributions, the sample mean still has approximately a Normal distribution.
Demonstration of LLN
The Law of Large Numbers (LLN) says that the average of a large sample of independent and identically distributed (i.i.d.) random variables will be close to their common mean, and gets closer as the sample size grows.
An example of a simulation is below: we flip a fair coin n = 50,000 times and plot the running proportion of heads, which settles around the true probability of 0.5 (the red line).
set.seed(1212)
n <- 50000
x <- sample(0:1, n, replace = TRUE)
s <- cumsum(x)
r <- s/(1:n)
plot(r, ylim = c(0.4, 0.6), type = "l")
lines(c(0,n), c(0.5, 0.5), col="red")
We'll start with basic integration. Suppose we have a Normal distribution with a mean of 1 and a standard deviation of 10, and we want to find the probability between 3 and 6, i.e. the integral \(\int_{3}^{6}\mathcal N(x;1,10)\,dx\).
We can simply write a simulation that samples from this distribution 100,000 times and counts what fraction of the values falls between 3 and 6.
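Here is a minimal sketch of that simulation; the pnorm line is just a check against the exact answer.
# Estimate P(3 <= X <= 6) for X ~ N(mean = 1, sd = 10) by sampling and counting
set.seed(1)
n <- 100000
x <- rnorm(n, mean = 1, sd = 10)
mean(x >= 3 & x <= 6)               # Monte Carlo estimate of the integral
pnorm(6, 1, 10) - pnorm(3, 1, 10)   # exact value for comparison, about 0.11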
The expectation (or integral) \(\theta\) is sometimes intractable, meaning we don't know how to compute it analytically; Monte Carlo integration still gives an estimate. In the example below, \(X \sim \mbox{Gamma}(3/2, \mbox{scale} = 1)\) and we estimate \(\theta = E\left[ \frac{1}{(X+1)\log(X+3)} \right]\), together with a Monte Carlo standard error and a 95% confidence interval.
n <- 1000
x <- rgamma(n, 3/2, scale=1)
mean(x)
[1] 1.450182
y <- 1/((x+1)*log(x+3))
est <- mean(y)
est
[1] 0.3651932
mcse <- sd(y) / sqrt(length(y))
interval <- est + c(-1,1)*1.96*mcse
interval
[1] 0.3531551 0.3772312
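As an optional sanity check (not part of the original example), the same expectation can be approximated by numerical quadrature with integrate(); the result should fall inside the Monte Carlo interval above.
# Quadrature check of theta = E[1/((X+1)*log(X+3))] with X ~ Gamma(shape = 3/2, scale = 1)
integrand <- function(x) 1/((x+1)*log(x+3)) * dgamma(x, shape = 3/2, scale = 1)
integrate(integrand, lower = 0, upper = Inf)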
Example: Approximating the Binomial
Suppose we flip a coin 10 times and we want to know the probability of getting more than 3 heads. This is trivial to compute with the Binomial distribution, but suppose we have forgotten that or never learned it in the first place. We can easily solve this problem with a Monte Carlo simulation. We'll use the common trick of representing tails with 0 and heads with 1, then simulate 10 coin tosses 100,000 times and see how often we get more than 3 heads.
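A minimal sketch of that simulation follows; the helper one.trial and the pbinom check are ours, added only for illustration.
# Estimate P(more than 3 heads in 10 fair flips) by simulation
set.seed(1)
runs <- 100000
one.trial <- function() {
  tosses <- sample(0:1, 10, replace = TRUE)   # 0 = tails, 1 = heads
  sum(tosses) > 3
}
mean(replicate(runs, one.trial()))                     # Monte Carlo estimate
pbinom(3, size = 10, prob = 0.5, lower.tail = FALSE)   # exact answer, about 0.828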
A circle with radius \(r\) has area \(\pi r^2\). If we draw a square containing that circle, its area will be \(4r^2\) (the side length is \(2r\)).
So the ratio of the area of the circle to the area of the square is
\[\frac{\pi r^2}{4r^2} = \frac{\pi}{4}\] Given this fact, if we can empirically determine the ratio of the area of the circle to the area of the square, we can simply multiply that number by 4 and we'll get our approximation of \(\pi\). How?
Approximating \(\pi\)
Randomly sample \(x\) and \(y\) values on the unit square centered at 0
If \(x^2 + y^2 \le .5^2\) then the point is in the circle
The proportion of points falling in the circle, multiplied by 4, is our estimate of \(\pi\)
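A minimal sketch of those three steps is below; the variable names (xs, ys, in.circle, mc.pi) are chosen to match the plotting call that follows, and the number of points is an arbitrary choice.
# Sample points uniformly on the unit square centered at 0 and flag those that
# fall inside the inscribed circle of radius 0.5
set.seed(1)
n.pts <- 10000
xs <- runif(n.pts, min = -0.5, max = 0.5)
ys <- runif(n.pts, min = -0.5, max = 0.5)
in.circle <- xs^2 + ys^2 <= 0.5^2
mc.pi <- 4 * mean(in.circle)   # proportion of points inside the circle, times 4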
plot(xs, ys, pch='.', col=ifelse(in.circle, "blue", "grey"),
     xlab='', ylab='', asp=1,
     main=paste("MC Approximation of Pi =", mc.pi))
Predicting the Stock Market
Company B has IPO'd! It trades under the ticker symbol B. On average its price is multiplied by \(1.001\) over the course of a trading day, but that daily multiplier varies with a standard deviation of \(0.005\) (this is its volatility).
We can simulate a single sample path for B by taking the cumulative product of daily multipliers drawn from a Normal distribution with a mean of \(1.001\) and a standard deviation of \(0.005\). Assuming B opens at \(\$20\) per share, here is a sample path for 200 days of B trading.
days <- 200
changes <- rnorm(200, mean=1.001, sd=0.005)
plot(cumprod(c(20, changes)), type='l', ylab="Price", xlab="day",
     main="Company B closing price (sample path)")
But this is just one possible future! If you are thinking of investing in B, you want to know the range of possible closing prices of the stock at the end of 200 days. To assess the risk in this stock we need reasonable upper and lower bounds on the future price.
To solve this we’ll just simulate 100,000 different possible paths the stock could take and then look at the distribution of closing prices.
runs <- 100000
# simulates future movements and returns the closing price on day 200
generate.path <- function(){
  days <- 200
  changes <- rnorm(200, mean=1.001, sd=0.005)
  sample.path <- cumprod(c(20, changes))
  closing.price <- sample.path[days+1]  # +1 because we add the opening price
  return(closing.price)
}
mc.closing <- replicate(runs, generate.path())
median(mc.closing)
[1] 24.37187
The median price of B at the end of 200 days is simply median(mc.closing), but we can also look at the 95th and 5th percentiles to get upper and lower bounds.
quantile(mc.closing,0.95)
95%
27.37767
quantile(mc.closing,0.05)
5%
21.71864
Comments: This is a toy model of stock market movements, but real-world quantitative finance makes heavy use of Monte Carlo simulations.