MCMC for DLM

Author

Jiaye Xu

Published

April 18, 2023

DLM and Kalman Methods

State Space Models and Dynamic Linear Models

Dynamic linear models (DLMs), often referred to as Gaussian linear state-space models (SSMs) in the time-series literature, offer a versatile framework for fitting a wide range of time-varying models. Let \(Y_t\) be an \(m \times 1\) vector of observables at time \(t\). \(Y_t\) is related to a \(p \times 1\) vector \(\boldsymbol{\theta}_t\), called the state vector, through a measurement equation. The general dynamic linear modeling framework is described as \[ \begin{aligned} \mathbf{Y}_t &= \mathbf{F}_t\boldsymbol{\theta}_t+\mathbf{v}_t \qquad\qquad\qquad\qquad\qquad \mathbf{v}_t\sim\mathcal{N}_m(\mathbf{0},\mathbf{V}_t) \\ \boldsymbol{\theta}_t&=\mathbf{G}_t\boldsymbol{\theta}_{t-1} +\mathbf{w}_t \qquad\qquad\qquad\qquad\quad \mathbf{w}_t\sim\mathcal{N}_p(\mathbf{0},\mathbf{W}_t)\\ \boldsymbol{\theta}_0 &\sim \mathcal{N}_p(\mathbf{m}_0, \mathbf{C}_0) \end{aligned} \]
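As a simple concrete instance, the local level (random walk plus noise) model has \(m = p = 1\) and \(F_t = G_t = 1\). A minimal sketch of its specification in R with the dlm package (the variances dV, dW below are illustrative choices, not estimates):

library(dlm)
mod <- dlmModPoly(order = 1, dV = 0.8, dW = 0.1)  # y_t = theta_t + v_t; theta_t = theta_{t-1} + w_t
mod$FF  # F = 1
mod$GG  # G = 1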

Recall the Results: Kalman Filter and Smoother for DLMs

Kalman Filter (Petris et al., 2009)
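For reference, and in the notation used throughout (Petris et al., 2009): given \(\theta_{t-1} \mid y_{1:t-1} \sim \mathcal N(m_{t-1}, C_{t-1})\), the filtering recursions compute, for \(t = 1, \ldots, T\), \[ \begin{aligned} a_t &= G_t m_{t-1}, & R_t &= G_t C_{t-1} G_t^\prime + W_t, \\ f_t &= F_t a_t, & Q_t &= F_t R_t F_t^\prime + V_t, \\ m_t &= a_t + R_t F_t^\prime Q_t^{-1}(y_t - f_t), & C_t &= R_t - R_t F_t^\prime Q_t^{-1} F_t R_t, \end{aligned} \] so that \(\theta_t \mid y_{1:t-1} \sim \mathcal N(a_t, R_t)\) and \(\theta_t \mid y_{1:t} \sim \mathcal N(m_t, C_t)\).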

Kalman Smoother (Petris et al., 2009)
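The smoothing recursions run backwards in time: with \(s_T = m_T\) and \(S_T = C_T\), for \(t = T-1, \ldots, 0\), \[ \begin{aligned} s_t &= m_t + C_t G_{t+1}^\prime R_{t+1}^{-1}\left(s_{t+1} - a_{t+1}\right), \\ S_t &= C_t - C_t G_{t+1}^\prime R_{t+1}^{-1}\left(R_{t+1} - S_{t+1}\right) R_{t+1}^{-1} G_{t+1} C_t, \end{aligned} \] giving \(\theta_t \mid y_{1:T} \sim \mathcal N(s_t, S_t)\).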

Simulation-Based Bayesian Inference

For a DLM including a possibly multidimensional unknown parameter \(\psi\) in its specification, and observations \(y_{1:T}\), the posterior distribution of the parameter and unobservable states is \[\pi(\theta_{0:T}, \psi \mid y_{1:T}),\] which in general cannot be computed in closed form. Numerical methods are therefore needed; in particular, a sample from the posterior distribution can be obtained from a Gibbs sampler alternating draws from \(\pi( \psi\mid\theta_{0:T} , y_{1:T})\) and \(\pi(\theta_{0:T} \mid \psi, y_{1:T})\).

Drawing the states given \(y_{1:T}\): Forward Filtering Backward Sampling (FFBS)

In a Gibbs sampler targeting \(\pi(\theta_{0:T} , \psi\mid y_{1:T})\), one needs to simulate from the full conditional densities \(\pi( \psi\mid\theta_{0:T} , y_{1:T})\) and \(\pi(\theta_{0:T} \mid \psi, y_{1:T})\). While the first density is problem specific, the general expression of the latter and efficient algorithms for sampling from it are available: the FFBS algorithm, which is essentially a simulation version of the smoothing recursions.

We can write the joint distribution of \(\theta_{0:T}\) given \(y_{1:T}\) as

\[ \pi\left(\theta_{0: T} \mid y_{1: T}\right)=\prod_{t=0}^T \pi\left(\theta_t \mid \theta_{t+1: T}, y_{1: T}\right), \] where the last factor in the product (the one for \(t = T\)) is simply \(\pi\left(\theta_{T} \mid y_{1: T}\right)\), i.e., the filtering distribution of \(\theta_{T}\), which is \(\mathcal N(m_T, C_T)\). To obtain a draw from the distribution on the left-hand side, one can start by drawing \(\theta_T\) from \(\mathcal N(m_T, C_T)\) and then, for \(t = T-1, T-2, \ldots, 0\), recursively draw \(\theta_t\) from \(\pi\left(\theta_t \mid \theta_{t+1: T}, y_{1: T}\right)\). In the proof of the Kalman smoother proposition we have seen that \(\pi\left(\theta_t \mid \theta_{t+1: T}, y_{1: T}\right)=\pi\left(\theta_t \mid \theta_{t+1}, y_{1: t}\right)\), and that this distribution is \(\mathcal N(h_t, H_t)\) with \[h_t = m_t + C_tG^\prime_{t+1}R^{-1}_{t+1}(\theta_{t+1} - a_{t+1}),\\ H_t = C_t - C_tG^\prime_{t+1}R^{-1}_{t+1}G_{t+1}C_t.\]

Therefore, after a few additional steps of derivation, the FFBS algorithm can be summarized as follows:

  1. Run the Kalman filter to obtain \(m_t\), \(C_t\), \(a_t\), \(R_t\) for \(t = 1, \ldots, T\).

  2. Draw \(\theta_{T}\) from \(\mathcal N(m_T ,C_T )\).

  3. For \(t = T −1, T −2, \ldots , 0\), recursively draw \(\theta_t\) from \(\mathcal N(h_t ,H_t )\).
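To make the algorithm concrete, here is a minimal hand-rolled FFBS sketch for the local level model (scalar state, \(F_t = G_t = 1\)), assuming the variances \(V\), \(W\) and the prior moments \(m_0\), \(C_0\) are known; the dlm functions dlmFilter() and dlmBSample(), used in the example further below, implement the same algorithm for general DLMs.

ffbs_local_level <- function(y, V, W, m0 = 0, C0 = 1e7) {
  TT <- length(y)
  m <- C <- numeric(TT + 1)      # m[t + 1], C[t + 1] hold m_t, C_t (t = 0, ..., TT)
  m[1] <- m0; C[1] <- C0
  ## 1. forward filtering (Kalman filter)
  for (t in 1:TT) {
    R <- C[t] + W                # R_t = C_{t-1} + W
    K <- R / (R + V)             # Kalman gain R_t / Q_t
    m[t + 1] <- m[t] + K * (y[t] - m[t])
    C[t + 1] <- (1 - K) * R
  }
  ## 2.-3. backward sampling
  theta <- numeric(TT + 1)
  theta[TT + 1] <- rnorm(1, m[TT + 1], sqrt(C[TT + 1]))
  for (t in TT:1) {              # draws theta_{t-1} given theta_t
    B <- C[t] / (C[t] + W)       # C_{t-1} R_t^{-1}
    theta[t] <- rnorm(1, m[t] + B * (theta[t + 1] - m[t]), sqrt((1 - B) * C[t]))
  }
  theta                          # one draw of theta_{0:TT}
}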

For a completely specified DLM, i.e., one not containing unknown parameters, draws from the posterior distribution of the states, and possibly from the forecast distribution of states and observations, can be obtained using the FFBS algorithm described above. In the more realistic situation of a DLM containing an unknown parameter vector \(\psi\), with prior distribution \(\pi(\psi)\), in the observation, system, or variance matrices, one typically uses MCMC to obtain posterior summaries of the distributions of interest.

Keep in mind that once a sample from the posterior distribution of the parameter is available, a sample from the joint posterior of states and parameter is easily obtained, in view of the decomposition \(\pi(\theta_{0:T}, \psi \mid y_{1:T}) = \pi(\theta_{0:T} \mid \psi, y_{1:T})\,\pi(\psi \mid y_{1:T})\). The Gibbs sampling approach, which consists of drawing in turn the states from their conditional distribution given the parameter and observations, and the parameter from its conditional distribution given the states and observations, is summarized in the algorithm below (FFBS in a Gibbs sampler).

FFBS in Gibbs Sampler

\[ \begin{aligned} &\begin{aligned} & \text { 0. Initialize the starting point: set } \psi^{(0)}=\psi_0 \text {; } \\ & \text { 1. for } j=1, \ldots, N \text { : } \\ & \text { 1.1) generate } \theta_{0:T}^{(j)} \text { from } \pi(\theta_{0:T} \mid y_{1:T}, \psi=\psi^{(j-1)}) \text{ using FFBS}; \\ & \text { 1.2) generate } \psi^{(j)} \text { from } \pi(\psi\mid y_{1:T}, \theta_{0:T}= \theta_{0:T}^{(j)} ) \text {. } \\ \end{aligned}\\ &\text { Algorithm: FFBS in a Gibbs sampler } \end{aligned} \]

Example: FFBS in Gibbs for a Local-Level Model

This example gives a simple illustration of how a Gibbs sampler can be implemented in practice: the local level model with unknown observation and evolution variances.

A convenient, yet flexible, family of priors for each variance is the inverse-gamma family. More specifically, let \(\psi_1 = V^{−1}\) and \(\psi_2 = W^{−1}\), and assume that \(\psi_1\) and \(\psi_2\) are a priori independent, with \(\psi_i \sim \mathcal G(a_i, b_i)\) for \(i=1,2.\)

A Gibbs sampler for this model will iterate the following three steps: draw \(\theta_{0:T}\) , draw \(\psi_1\), and draw \(\psi_2\).

The random quantities generated in each of the three steps should be drawn from their full conditional distribution, i.e., the distribution of that quantity given all the other random variables in the model, including the observations. (We derive these full conditionals first and then implement the sampler in R.)

The standard approach is based on the fact that the full conditional distribution is proportional to the joint distribution of all the random variables considered. For \(\psi_1\), this is

\[ \begin{aligned} & \pi\left(\psi_1 \mid \psi_2, \theta_{0: T}, y_{1: T}\right) \propto \pi\left(\psi_1, \psi_2, \theta_{0: T}, y_{1: T}\right) \\ & \quad \propto \pi\left(y_{1: T} \mid \theta_{0: T}, \psi_1, \psi_2\right) \pi\left(\theta_{0: T} \mid \psi_1, \psi_2\right) \pi\left(\psi_1, \psi_2\right) \\ & \quad \propto \prod_{t=1}^T \pi\left(y_t \mid \theta_t, \psi_1\right) \prod_{t=1}^T \pi\left(\theta_t \mid \theta_{t-1}, \psi_2\right) \pi\left(\psi_1\right) \pi\left(\psi_2\right) \\ & \quad \propto \pi\left(\psi_1\right) \psi_1^{T / 2} \exp \left(-\frac{\psi_1}{2} \sum_{t=1}^T\left(y_t-\theta_t\right)^2\right) \\ & \quad \propto \psi_1^{a_1+T / 2-1} \exp \left(-\psi_1 \cdot\left[b_1+\frac{1}{2} \sum_{t=1}^T\left(y_t-\theta_t\right)^2\right]\right) \end{aligned} \] From the previous equations we deduce that \(\psi_1\) and \(\psi_2\) are conditionally independent, given \(\theta_{0:T}\) and \(y_{1:T}\), and that \[\psi_1 \mid \theta_{0: T}, y_{1: T} \sim \mathcal{G}\left(a_1+\frac{T}{2},\ b_1+\frac{1}{2} \sum_{t=1}^T\left(y_t-\theta_t\right)^2\right).\]

A similar argument shows that \[\psi_2 \mid \theta_{0: T}, y_{1: T} \sim \mathcal{G}\left(a_2+\frac{T}{2},\ b_2+\frac{1}{2} \sum_{t=1}^T\left(\theta_t-\theta_{t-1}\right)^2\right).\] The following code implements the Gibbs sampler discussed above for the Nile River data, using the dlm package.

library(dlm)  # provides dlmModPoly, dlmFilter, dlmBSample
## prior hyperparameters
a1 <- 2 
b1 <- 0.0001 
a2 <- 2 
b2 <- 0.0001 
## starting values 
psi1 <- 1
psi2 <- 1 
mod_level <- dlmModPoly(1, dV = 1 / psi1, dW = 1 / psi2) 
mc <- 1500 
psi1_save <- numeric(mc) 
psi2_save <- numeric(mc) 
n <- length(Nile) 
sh1 <- a1 + n / 2 
sh2 <- a2 + n / 2 
set.seed(10) 
for (it in 1 : mc) { 
  ## draw the states: FFBS 
  filt <- dlmFilter(Nile, mod_level) 
  level <- dlmBSample(filt) 
  ## draw observation precision psi1 
  rate <- b1 + crossprod(Nile - level[-1]) / 2 
  psi1 <- rgamma(1, shape = sh1, rate = rate) 
  ## draw system precision psi2 
  rate <- b2 + crossprod(level[-1] - level[-n]) / 2 
  psi2 <- rgamma(1, shape = sh2, rate = rate) 
  ## update and save 
  V(mod_level) <- 1 / psi1 
  W(mod_level) <- 1 / psi2 
  psi1_save[it] <- psi1 
  psi2_save[it] <- psi2 
}
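After running the sampler, one would typically discard an initial burn-in and summarize the draws; for instance (the burn-in length here is an illustrative choice, to be checked against trace plots):

burn <- 500
mean(1 / psi1_save[-(1:burn)])  # posterior mean of V = 1/psi1
mean(1 / psi2_save[-(1:burn)])  # posterior mean of W = 1/psi2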

An Application of DLM – CAPM: from Static to Dynamic

Static CAPM

Let \(r_t, r^{(M)}_t\) and \(r^{(f)}_t\) be the returns at time \(t\) of the asset (or portfolio) under study, of the market and of a risk-free asset, respectively. Define the excess returns of the asset as \(y_t = r_t - r^{(f)}_t\) and the market’s excess returns as \(x_t = r^{(M)}_t - r^{(f)}_t\). Recall that the standard univariate static CAPM assumes \[ y_{t}=\alpha+\beta x_{t}+v_{t}, \quad v_{t} \stackrel{iid}{\sim} N\left(0, \sigma^{2}\right) \]

The parameter \(\beta\) measures the sensitivity of the asset’s excess returns to changes in the market. A value of \(\beta\) greater than one suggests that the asset tends to amplify changes in the market returns, and it is therefore considered an aggressive investment, while assets having \(\beta\) smaller than one are thought of as conservative investments.
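Since the static CAPM is an ordinary linear regression, \(\alpha\) and \(\beta\) can be estimated by least squares; a one-line sketch, assuming y and x are numeric vectors of asset and market excess returns:

fit <- lm(y ~ x)
coef(fit)  # (Intercept) = alpha-hat, x = beta-hat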

Dynamic CAPM

Another choice, which is in fact quite interesting in many problems, is to allow the relationship between \(y\) and \(x\) to evolve over time. That is, we consider a dynamic linear regression model, described by a DLM with \(F_t =[1\ x_t]\), \(\theta_t=(\beta_{1,t},\ \beta_{2,t})^\prime\) (time-varying intercept and slope), \(G_t = I_2\), and \(V=\sigma^2\), so that the regression coefficients follow random walks.
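In R, this dynamic regression can be set up with dlmModReg; a sketch, assuming x holds the market excess returns and y the (univariate) asset excess returns, with illustrative (not estimated) variances:

library(dlm)
dynCAPM <- dlmModReg(x, dV = 1, dW = c(0.01, 0.01))  # F_t = [1, x_t], G = I_2
s <- dlmSmooth(y, dynCAPM)$s  # smoothed paths of the intercept and the slope beta_t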

Multivariate Extension: SUTSE Model and CAPM

SUTSE Model

Seemingly unrelated time series equations (SUTSE) models are a class of models which specify the dependence structure among the state vectors \(\theta^{(1)}_t, \ldots, \theta^{(m)}_t\) as follows. The qualitative assumption is that all series follow the same type of dynamics and that the components of the state vectors have similar interpretations across the different DLMs.

For example, each series might be modeled using a linear growth model, so that for each of them the state vector has a level and a slope component and, although not strictly required, it is commonly assumed for simplicity that the variance matrix of the system errors is diagonal. (To keep the model simple, we retain the assumption that levels and slopes evolve in an uncorrelated way.) Therefore, the overall state vector \(\boldsymbol{\theta}_t = (\mu_{1,t}, \ldots, \mu_{m, t}, \beta_{1, t}, \ldots, \beta_{m, t})\). To be specific, let \(m =2\), then the system/evolution/state equation is \[ \left[\begin{array}{l} \mu_{1, t} \\ \mu_{2, t} \\ \beta_{1, t} \\ \beta_{2, t} \end{array}\right]=\left[\begin{array}{llll} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right]\left[\begin{array}{l} \mu_{1, t-1} \\ \mu_{2, t-1} \\ \beta_{1, t-1} \\ \beta_{2, t-1} \end{array}\right]+\left[\begin{array}{l} w_{1, t} \\ w_{2, t} \\ w_{3, t} \\ w_{4, t} \end{array}\right], \] where \((w_{1, t}, w_{2, t}, w_{3, t}, w_{4, t})^\prime \sim \mathcal{N}(\mathbf{0}, \mathbf{W})\) and \[ \mathbf{W}=\left[\begin{array}{c|c} \mathbf{W_{\mu}} & \mathbf{0} \\ \hline \mathbf{0} & \mathbf{W_{\beta}} \\ \end{array}\right] \] The observation equation for the bivariate time series \(\{(Y_{1,t}, Y_{2,t}): t = 1, \ldots, T\}\) is \[ \left[\begin{array}{l} Y_{1, t} \\ Y_{2, t} \end{array}\right]=\left[\begin{array}{llll} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{array}\right] \boldsymbol{\theta}_{t}+\left[\begin{array}{l} v_{1, t} \\ v_{2, t} \end{array}\right] \] with \((v_{1, t}, v_{2, t})^\prime \sim \mathcal{N}(\mathbf{0}, \mathbf{V}_t)\). In order to introduce a further correlation between the series, the observation error variance V can be taken non-diagonal.
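A sketch of how this bivariate SUTSE model could be assembled in dlm, using the same Kronecker-product idiom as the CAPM code below; the values in W_mu, W_beta and V are illustrative, not estimates:

library(dlm)
m <- 2
mod <- dlmModPoly(2)              # univariate linear growth building block
mod$FF <- mod$FF %x% diag(m)      # [1 0] -> the 2 x 4 observation matrix above
mod$GG <- mod$GG %x% diag(m)      # the 4 x 4 system matrix above
W_mu   <- diag(0.1, m)                           # uncorrelated levels
W_beta <- matrix(c(0.05, 0.02, 0.02, 0.05), m)   # correlated slopes
mod$W <- matrix(0, 2 * m, 2 * m)
mod$W[1:m, 1:m] <- W_mu
mod$W[m + 1:m, m + 1:m] <- W_beta                # W = blockdiag(W_mu, W_beta)
mod$V <- matrix(c(1, 0.4, 0.4, 1), m)            # non-diagonal V links the series
mod$m0 <- rep(0, 2 * m)
mod$C0 <- diag(1e7, 2 * m)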

Multivariate Dynamic CAPM

As an example of the SUTSE framework applied to more general DLMs than the basic structural model, we present a seemingly unrelated regression (SUR) model: a multivariate version of the dynamic capital asset pricing model (CAPM).

Let \(y_{i,t} = r_{i,t} - r^{(f)}_{t}\) be the excess return of asset \(i\), for \(i = 1, \ldots, m\). For each asset, we define a regression DLM \[ \begin{aligned} y_{i, t} &=\alpha_{i, t}+\beta_{i, t} x_{t}+v_{i, t} \\ \alpha_{i, t} &=\alpha_{i, t-1}+w_{1 i, t} \\ \beta_{i, t} &=\beta_{i, t-1}+w_{2 i, t}, \end{aligned} \] and it is sensible to assume that the intercepts and the slopes are correlated across the \(m\) stocks. A seemingly unrelated regression (SUR) model is defined as \[ \begin{array}{ll} y_{t}=\left(F_{t} \otimes I_{m}\right) \theta_{t}+v_{t}, & v_{t} \stackrel{iid}{\sim} \mathcal{N}(0, V) \\ \theta_{t}=\left(G \otimes I_{m}\right) \theta_{t-1}+w_{t}, & w_{t} \stackrel{iid}{\sim} \mathcal{N}(0, W) \end{array} \] with \[ y_{t}=\left[\begin{array}{c} y_{1, t} \\ \vdots \\ y_{m, t} \end{array}\right], \quad \theta_{t}=\left[\begin{array}{c} \alpha_{1, t} \\ \vdots \\ \alpha_{m, t} \\ \beta_{1, t} \\ \vdots \\ \beta_{m, t} \end{array}\right], \quad v_{t}=\left[\begin{array}{c} v_{1, t} \\ \vdots \\ v_{m, t} \end{array}\right], \quad w_{t}=\left[\begin{array}{c} w_{1, t} \\ \vdots \\ w_{2 m, t} \end{array}\right], \] \[ F_{t}=\left[\begin{array}{ll} 1 & x_{t} \end{array}\right], \quad G=I_{2}, \quad \text{and } W=\operatorname{blockdiag}\left(W_{\alpha}, W_{\beta}\right). \] We assume for simplicity that the \(\alpha_{i,t}\) are time-invariant, which amounts to assuming that \(\mathbf{W}_\alpha = \mathbf{0}\). The correlation between the different excess returns is explained in terms of the non-diagonal variance matrices \(V\) and \(W_\beta\), assumed known or learnt from the data.

Data set: monthly returns from January 1978 to December 1987 of four stocks (Mobil, IBM, Weyer, and Citicorp), of the 30-day Treasury Bill as a proxy for the risk-free asset, and of the value-weighted average returns for all the stocks listed at the New York and American Stock Exchanges, representing the overall market returns.

# ==== Example: SUR and Dynamic CAPM ====
# link out of date
# tmp <- ts(read.table("http://shazam.econ.ubc.ca/intro/P.txt",header = TRUE), start = c(1978, 1), frequency = 12) * 100

# download from:
# https://github.com/rstudio/ai-blog/tree/master/_posts/2019-06-25-dynamic_linear_models_tfprobability/data/capm.txt
# df <- read_table(
#   "capm.txt",
#   col_types = list(X1 = col_date(format = "%Y.%m"))) %>%
#   rename(month = X1)
# df %>% glimpse()

tmp <- ts(read.table("capm.txt", header = TRUE), start = c(1978, 1), frequency = 12) * 100

y <- tmp[, 1 : 4] - tmp[, "RKFREE"] # excess returns of asset
colnames(y) <- colnames(tmp)[1 : 4]
market <- tmp[, "MARKET"] - tmp[, "RKFREE"] # excess return of market
rm("tmp") 
m <- NCOL(y) # 4 assets

### Set up the model
CAPM <- dlmModReg(market)             # univariate dynamic regression: F_t = [1, x_t]
CAPM$FF <- CAPM$FF %x% diag(m)        # expand to m assets (SUR)
CAPM$GG <- CAPM$GG %x% diag(m)
CAPM$JFF <- CAPM$JFF %x% diag(m)      # keep track of the time-varying entries of FF
CAPM$W <- CAPM$W %x% matrix(0, m, m)  # 2m x 2m zero matrix; W_alpha = 0
## W_beta, the slope-innovation covariance (assumed known):
CAPM$W[-(1 : m), -(1 : m)] <- c(8.153e-07, -3.172e-05, -4.267e-05, -6.649e-05,
         -3.172e-05, 0.001377, 0.001852, 0.002884,
         -4.267e-05, 0.001852, 0.002498, 0.003884,
         -6.649e-05, 0.002884, 0.003884, 0.006057)
CAPM$V <- CAPM$V %x% matrix(0, m, m)
CAPM$V[] <- c(41.06, 0.01571, -0.9504, -2.328,
              0.01571, 24.23, 5.783, 3.376,
              -0.9504, 5.783, 39.2, 8.145,
              -2.328, 3.376, 8.145, 39.29) # assumed known
CAPM$m0 <- rep(0, 2 * m)
CAPM$C0 <- diag(1e7, nr = 2 * m)
### Smooth
CAPMsmooth <- dlmSmooth(y, CAPM)
### Plot: smoothed betas, stored in columns (m+1):(2*m) of the state matrix
plot(dropFirst(CAPMsmooth$s[, m + 1 : m]),
     lty = c("13", "6413", "431313", "B4"), plot.type = "s", xlab = "", ylab = "Beta")
abline(h = 1, col = "darkgrey")
legend("bottomright", legend = colnames(y), bty = "n", lty = c("13", "6413", "431313", "B4"), inset = 0.05)

Interpretation: while Mobil’s beta remained essentially constant during the period under consideration, starting around 1980 the remaining three stocks became less and less conservative, with Weyer and Citicorp reaching the status of aggressive investments around 1984.