A Markov experiment

Introduction

Suppose you conduct an experiment of the following form. You start out with a value $$X_0$$, then run a binary experiment $$Y_1$$ with conditional probability $$P(Y_1=1\mid X_0)=f_\theta(X_0)$$. You then choose your next value of $$X$$ deterministically, based solely on the values of $$X_0$$ and $$Y_1$$, using a known function $$g(x,y)$$, i.e., $$X_1 = g(X_0,Y_1)$$. Repeating this $$n$$ times, you will get a sequence of values $$X_0,X_1,X_2,\ldots,X_n$$ and $$Y_1,Y_2,\ldots,Y_n$$.

Our goal is to do inference on $$\theta$$. We make the following observations.

1. If $$g$$ is invertible in $$y$$ for each $$x$$, the sequence $$\{Y_i\}$$ is derivable from $$\{X_i\}$$.
2. The sequence $$\{X_i\}$$ is a Markov chain with transition probabilities $P(X_{i}=x\mid X_{i-1})=\begin{cases} f_{\theta}(X_{i-1}), & x=g(X_{i-1},1),\\ 1-f_{\theta}(X_{i-1}), & x=g(X_{i-1},0),\\ 0, & \text{otherwise.} \end{cases}$

Now let's assume that $$\{X_i\}$$ is ergodic and stationary. In this case, using the results of e.g. Billingsley's *Statistical Methods in Markov Chains*, we can show that the maximum likelihood estimator of $$\theta$$ has the same asymptotic distribution as it would have had if the observations $$X_i$$ were independent.
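Spelling this out, conditional on $$X_0$$ the likelihood factorizes over the transitions, so (writing $$Y_i = 1$$ when $$X_i = g(X_{i-1},1)$$) the log-likelihood is just that of a Bernoulli regression with covariate $$X_{i-1}$$: $\ell(\theta)=\sum_{i=1}^{n}\left[Y_i \log f_{\theta}(X_{i-1})+(1-Y_i)\log\{1-f_{\theta}(X_{i-1})\}\right].$ This is the objective we maximize numerically below.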

Example

Suppose that $g(x,y)=\begin{cases} \frac{3}{2}x, & \text{if }y=1,\\ \frac{1}{2}x, & \text{otherwise,} \end{cases}$ and that $$f_\theta(x) = \Phi(\alpha + \beta x)$$. This chain will probably be stationary and ergodic when $$\beta < 0$$, but not otherwise: it has a trend when $$\beta > 0$$ and reduces to a multiplicative random walk when $$\beta = 0$$.
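To see the mean reversion concretely, here is a quick look at the up-move probability at a few states, using the parameter values from the simulations below ($$\alpha = 2$$, $$\beta = -1$$); these values are chosen for illustration only.

```r
# Up-move probability f(x) = pnorm(2 - x), i.e., alpha = 2, beta = -1.
# The probability of multiplying by 3/2 drops below 1/2 once x exceeds 2,
# so large values of the chain tend to be pushed back down.
f <- \(x) pnorm(2 - 1 * x)
round(f(c(0.5, 1, 2, 4, 8)), 3)
# 0.933 0.841 0.500 0.023 0.000
```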

Simulations

Let's simulate $$n$$ values from the process. Notice that we don't make use of the first value in the likelihood; we condition on it.

simulator <- \(n, alpha, beta, up, down, x0) {
  f = \(x) pnorm(alpha + beta * x)
  x = rep(NA, n)
  x[1] = x0
  for (i in seq(2, n)) {
    z = rbinom(1, 1, f(x[i - 1]))
    x[i] = x[i - 1] * (up * z + down * (1 - z))
  }
  x
}

We can estimate the parameters using nlm. Each up/down move must be paired with the previous value of the chain, hence the head(x, -1).

f <- \(p, x) {
  ups = diff(x) > 0
  lag_x = head(x, -1)
  alpha = p[1]
  beta = p[2]
  -sum(pnorm(alpha + beta * lag_x[ups], log.p = TRUE)) -
    sum(pnorm(alpha + beta * lag_x[!ups], lower.tail = FALSE, log.p = TRUE))
}

estimator <- \(x) nlm(f, p = c(1, 1), x = x)$estimate

Let's simulate the variance of the $$\beta$$ parameter.

set.seed(313)
alpha = 2
beta = -1
up = 3/2
down = 1/2
x0 = 1
results <- replicate(1000, {
x = simulator(1000, alpha, beta, up, down, x0)
estimator(x)
})

var(results[2, ]) * 1000
## [1] 4.159315

And approximate the asymptotic variance using a large sample size.

x = simulator(100000, alpha, beta, up, down, x0)
solve(nlm(f, p = c(1, 1), x = x, hessian = TRUE)$hessian)[2, 2] * 100000
## [1] 4.274457

I count this as a match, which almost confirms our hypothesis. Let's try a smaller $$n$$ in the simulation.

n = 100
results_small <- replicate(10000, {
x = simulator(n, alpha, beta, up, down, x0)
estimator(x)
})

var(results_small[2, ]) * n
## [1] 5.635581

The variance is, unsurprisingly, larger than the asymptotic variance. There may be methods in the literature to correct for this, using e.g. Edgeworth expansions, but I haven't checked.

Improving confidence sets?

The convergence to the limiting distribution is not very fast. To improve it, it might be worth using the parametric bootstrap, which is second-order accurate (under iid conditions, at least). But I don’t know how well it works in the case of Markov chains.
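As a sketch of how that could look here (the helper definitions are repeated from above so the snippet runs on its own; the percentile-interval construction and the choice of B are illustrative, not tuned):

```r
# Parametric bootstrap percentile interval for beta: fit the model once,
# resimulate B chains from the fitted parameters, refit each, and take
# quantiles of the refitted betas.
simulator <- \(n, alpha, beta, up, down, x0) {
  f = \(x) pnorm(alpha + beta * x)
  x = rep(NA, n)
  x[1] = x0
  for (i in seq(2, n)) {
    z = rbinom(1, 1, f(x[i - 1]))
    x[i] = x[i - 1] * (up * z + down * (1 - z))
  }
  x
}

f <- \(p, x) {
  ups = diff(x) > 0
  lag_x = head(x, -1)  # pair each transition with the previous state
  -sum(pnorm(p[1] + p[2] * lag_x[ups], log.p = TRUE)) -
    sum(pnorm(p[1] + p[2] * lag_x[!ups], lower.tail = FALSE, log.p = TRUE))
}

estimator <- \(x) nlm(f, p = c(1, 1), x = x)$estimate

bootstrap_ci <- \(x, B = 1000, level = 0.95, up = 3/2, down = 1/2) {
  hat <- estimator(x)
  boot <- replicate(
    B,
    estimator(simulator(length(x), hat[1], hat[2], up, down, x[1]))[2]
  )
  quantile(boot, c((1 - level) / 2, (1 + level) / 2))
}

set.seed(313)
x <- simulator(500, 2, -1, 3/2, 1/2, 1)
bootstrap_ci(x, B = 200)
```

Whether this interval actually achieves second-order accuracy for this chain is exactly the open question; the sketch only shows the mechanics.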