3 min read

A Markov experiment

Introduction

Suppose you conduct an experiment of the following form. You start out with an X0, then do a binary experiment Y1 with conditional probability P(Y1X0)=fθ(X0). Then you choose your next value of X deterministically, based solely on the value of X0 and Y1 using a known g(x,y), i.e., X1Y0,X1=g(X1,Y0). Doing this n times, you will get a sequence of values X0,X1,X2,,Xn and Y1,Y2,,Yn.

Our goal is to do inference on θ. We make the following observations. 1. If g is invertible in y for each x, the sequence Yi is derivable from Xi. 2. The sequence Xi is a Markov chain with transition probabilities P(Xi=xXi1)={fθ(Xi1)x=g(Xi1,1),1fθ(Xi1)x=g(Xi1,0),0,otherwise. Now let’s assume that Xi is ergodic and stationary. In this case, using the results of e.g. Billingsley’s Statistical Methods in Markov Chains, we can show that the maximum likelihood estimator of θ has the same asymptotic distribution as it would have had if the observations Xi where independent.

Example

Suppose that g(x,y)={32,if y=1,12,otherwise. And that fθ(x)=Φ(α+βx). This chain will probably be stationary and ergodic unless β0, as it would have a trend when β>0 and be too variable when β=0.

Simulations

Let’s simulate n values from the process. Notice that we don’t make use of the first value.

simulator <- \(n, alpha, beta, up, down, x0) {
  f = \(x) pnorm(alpha + beta * x)
  x = rep(NA, n)
  x[1] = x0
  for(i in seq(2, n)) {
    z = rbinom(1, 1, f(x[i-1]))
    x[i] = x[i-1] * (up * z + down*(1-z)) 
  }  
  x
}

We can estimate the parameters using nlm.

f <- \(p, x) {
  ups = diff(x) > 0
  alpha = p[1]
  beta = p[2]
  -sum(pnorm(alpha + beta * x[ups], log.p=TRUE)) -
    sum(pnorm(alpha + beta * x[!ups], lower.tail = FALSE, log.p=TRUE))
}

estimator <- \(x) fit = nlm(f, p = c(1, 1), x = x)$estimate

Let’s simulate the variance of the β parameter.

set.seed(313)
alpha = 2
beta = -1
up = 3/2
down = 1/2
x0 = 1
results <- replicate(1000, {
  x = simulator(1000, alpha, beta, up, down, x0)
  estimator(x)
})

var(results[2, ]) * 1000
## [1] 4.159315

And approximate the asymptotic variance using a large sample size.

x = simulator(100000, alpha, beta, up, down, x0)
solve(nlm(f, p = c(1, 1), x = x, hessian = TRUE)$hessian)[2, 2] * 100000
## [1] 4.274457

I count this a match, almost confirming our hypothesis. Let’s try a smaller n in the simulation.

n = 100
results_small <- replicate(10000, {
  x = simulator(n, alpha, beta, up, down, x0)
  estimator(x)
})

var(results_small[2, ]) * n
## [1] 5.635581

The variance is, unsurprisingly, larger than the true variance. There might be methods to fix this in the literature, using e.g. Edgeworth expansions or something, but I haven’t checked.

Improving confidence sets?

The convergence to the limiting distribution is not very fast. To improve it, it might be worth using the parametric bootstrap, which is second-order accurate (under iid conditions, at least). But I don’t know how well it works in the case of Markov chains.