## Introduction

Suppose you conduct an experiment of the following form. You start out with a value \(X_0\), then run a binary experiment \(Y_1\) with conditional probability \(P(Y_1=1\mid X_0)=f_\theta(X_0)\). Then you choose your next value of \(X\) deterministically, based solely on the values of \(X_0\) and \(Y_1\), using a known function \(g(x,y)\), i.e., \(X_1 = g(X_0,Y_1)\). Repeating this \(n\) times yields a sequence of values \(X_0,X_1,X_2,\ldots,X_n\) and \(Y_1,Y_2,\ldots,Y_n\).

Our goal is to do inference on \(\theta\). We make the following observations.
1. If \(g\) is invertible in \(y\) for each \(x\), the sequence \(\{Y_i\}\) can be recovered from \(\{X_i\}\).
2. The sequence \(\{X_i\}\) is a Markov chain with transition probabilities
\[
P(X_{i}=x\mid X_{i-1})=\begin{cases}
f_{\theta}(X_{i-1}), & x=g(X_{i-1},1),\\
1-f_{\theta}(X_{i-1}), & x=g(X_{i-1},0),\\
0, & \text{otherwise.}
\end{cases}
\]
Now let’s assume that \(\{X_i\}\) is stationary and ergodic. In this case, using the results of e.g. Billingsley’s *Statistical Methods in Markov Chains*, we can show that the maximum likelihood estimator of \(\theta\) has the *same* asymptotic distribution as it would have had if the observations \(X_i\) were independent.
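
Heuristically, this is because the log-likelihood conditional on \(X_0\) has exactly the form of a Bernoulli regression with covariates \(X_{i-1}\):
\[
\ell_n(\theta)=\sum_{i=1}^{n}\Bigl[Y_i\log f_\theta(X_{i-1})+(1-Y_i)\log\bigl(1-f_\theta(X_{i-1})\bigr)\Bigr],
\]
so the score is a sum of martingale differences, and under stationarity and ergodicity the usual information calculations go through, with expectations taken under the stationary distribution.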

## Example

Suppose that
\[
g(x,y)=\begin{cases}
\frac{3}{2}x, & \text{if }y=1,\\
\frac{1}{2}x, & \text{otherwise.}
\end{cases}
\]
And that \(f_\theta(x) = \Phi(\alpha + \beta x)\). This chain will *probably* be stationary and ergodic provided \(\beta < 0\): when \(\beta > 0\) the chain has an upward trend, and when \(\beta = 0\) the log of the chain is a random walk, so neither case is stationary.
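
Observation 1 applies here: since \(g(x,1) > x > g(x,0)\) for \(x > 0\), an upward move can only come from \(y = 1\), so the \(Y\) sequence can be read off the path. A minimal sketch (the example path is made up for illustration):

```
# Recover the binary outcomes from the path: with up = 3/2 and
# down = 1/2, an upward move can only mean y = 1.
recover_y <- \(x) as.integer(diff(x) > 0)

recover_y(c(1, 3/2, 9/4, 9/8, 27/16))  # returns 1 1 0 1
```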

## Simulations

Let’s simulate \(n\) values from the process. Notice that the first value is fixed at \(x_0\) rather than simulated.

```
simulator <- \(n, alpha, beta, up, down, x0) {
  # Probability of an upward move given the current value.
  f = \(x) pnorm(alpha + beta * x)
  x = rep(NA, n)
  x[1] = x0
  for (i in seq(2, n)) {
    z = rbinom(1, 1, f(x[i - 1]))                 # y_i
    x[i] = x[i - 1] * (up * z + down * (1 - z))   # g(x_{i-1}, y_i)
  }
  x
}
```

We can estimate the parameters using `nlm`.

```
# Negative log-likelihood of the chain, conditional on x[1].
f <- \(p, x) {
  ups = diff(x) > 0    # y_i = 1 iff the chain moved up
  xs = x[-length(x)]   # the conditioning values x_{i-1}
  alpha = p[1]
  beta = p[2]
  -sum(pnorm(alpha + beta * xs[ups], log.p = TRUE)) -
    sum(pnorm(alpha + beta * xs[!ups], lower.tail = FALSE, log.p = TRUE))
}
estimator <- \(x) nlm(f, p = c(1, 1), x = x)$estimate
```

Let’s estimate the scaled variance \(n\operatorname{Var}(\hat\beta)\) of the \(\beta\) estimate by simulation.

```
set.seed(313)
alpha = 2
beta = -1
up = 3/2
down = 1/2
x0 = 1
results <- replicate(1000, {
  x = simulator(1000, alpha, beta, up, down, x0)
  estimator(x)
})
var(results[2, ]) * 1000
```

`## [1] 4.159315`

And approximate the asymptotic variance using a large sample size.

```
x = simulator(100000, alpha, beta, up, down, x0)
solve(nlm(f, p = c(1, 1), x = x, hessian = TRUE)$hessian)[2, 2] * 100000
```

`## [1] 4.274457`

I count this as a match, consistent with our hypothesis. Let’s try a smaller \(n\) in the simulation.

```
n = 100
results_small <- replicate(10000, {
  x = simulator(n, alpha, beta, up, down, x0)
  estimator(x)
})
var(results_small[2, ]) * n
```

`## [1] 5.635581`

The variance is, unsurprisingly, larger than the asymptotic variance. There might be methods in the literature to correct for this, using e.g. Edgeworth expansions, but I haven’t checked.

## Improving confidence sets?

The convergence to the limiting distribution is not very fast. To improve on it, it might be worth using the parametric bootstrap, which is second-order accurate (under iid conditions, at least). But I don’t know how well it works in the case of Markov chains.
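
As a sketch of what that could look like here, reusing `simulator` and `estimator` from above (the replicate count `B` and the percentile interval are arbitrary illustration choices, not tuned recommendations):

```
# Hypothetical parametric bootstrap percentile interval for beta:
# fit the model, simulate B fresh paths from the fitted chain,
# refit each path, and take quantiles of the bootstrap estimates.
boot_ci <- \(x, B = 1000, up = 3/2, down = 1/2, level = 0.95) {
  est = estimator(x)
  boot = replicate(B, {
    x_star = simulator(length(x), est[1], est[2], up, down, x0 = x[1])
    estimator(x_star)[2]
  })
  quantile(boot, c((1 - level) / 2, (1 + level) / 2))
}
```

Whether the resulting intervals are actually second-order accurate for this chain is exactly the open question above.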