Introduction
Suppose you conduct an experiment of the following form. You start out with an $x_0$, then do a binary experiment $Z_1 \in \{0, 1\}$ with conditional probability $P(Z_1 = 1 \mid x_0) = f_\theta(x_0)$. Then you choose your next value $x_1$ deterministically, based solely on the value of $z_1$ and $x_0$, using a known function $g$, i.e., $x_1 = g(x_0, z_1)$. Doing this $n$ times, you will get a sequence of values $x_0, x_1, \ldots, x_n$ and $z_1, z_2, \ldots, z_n$.
Our goal is to do inference on $\theta$. We make the following observations.

1. If $g(x, z)$ is invertible in $z$ for each $x$, the sequence $z_1, \ldots, z_n$ is derivable from $x_0, x_1, \ldots, x_n$.
2. The sequence $x_0, x_1, \ldots, x_n$ is a Markov chain with transition probabilities $P(x_{i+1} = g(x_i, z) \mid x_i) = f_\theta(x_i)^z \{1 - f_\theta(x_i)\}^{1-z}$ for $z \in \{0, 1\}$.

Now let's assume that the chain is ergodic and stationary. In this case, using the results of e.g. Billingsley's Statistical Methods in Markov Chains, we can show that the maximum likelihood estimator of $\theta$ has the same asymptotic distribution as it would have had if the observations were independent.
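To spell out one step: since the $z_i$ are recoverable from the $x_i$, the conditional likelihood given $x_0$ is that of $n$ Bernoulli experiments. Writing $z_i$ for the indicator that $x_i = g(x_{i-1}, 1)$, the conditional log-likelihood is

$$\ell_n(\theta) = \sum_{i=1}^{n} \left[ z_i \log f_\theta(x_{i-1}) + (1 - z_i) \log\{1 - f_\theta(x_{i-1})\} \right].$$

This is exactly the log-likelihood we would maximize if the $z_i$ were independent Bernoulli trials with covariates $x_{i-1}$, which is why the iid asymptotics are so tempting here.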
Example
Suppose that $g(x, z) = x(uz + d(1 - z))$ for known constants $u > 1 > d > 0$, so the chain moves up by the factor $u$ when $z = 1$ and down by the factor $d$ when $z = 0$. And suppose that $f_{\alpha, \beta}(x) = \Phi(\alpha + \beta x)$, where $\Phi$ is the standard normal distribution function. This chain will probably be stationary and ergodic unless $\beta \geq 0$, as it would have a trend when $\beta > 0$ and be too variable when $\beta = 0$.
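As a heuristic for this (my own back-of-the-envelope reasoning, not a proof of ergodicity), write $u$ and $d$ for the up and down factors (`up` and `down` in the code below). The expected increment of the log-chain given $x_{i-1} = x$ is

$$E[\log x_i - \log x_{i-1} \mid x_{i-1} = x] = \Phi(\alpha + \beta x)\log u + \{1 - \Phi(\alpha + \beta x)\}\log d.$$

Since $\log u > 0 > \log d$, this drift is decreasing in $x$ when $\beta < 0$: large values of the chain are pushed down and, provided $\alpha$ is large enough, small values are pushed up, which is the kind of mean reversion stationarity requires.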
Simulations
Let’s simulate $n$ values from the process. Notice that we don’t simulate the first value; it is just the fixed starting point $x_0$.
simulator <- \(n, alpha, beta, up, down, x0) {
  f = \(x) pnorm(alpha + beta * x)
  x = rep(NA, n)
  x[1] = x0
  for (i in seq(2, n)) {
    z = rbinom(1, 1, f(x[i - 1]))
    x[i] = x[i - 1] * (up * z + down * (1 - z))
  }
  x
}
We can estimate the parameters using nlm.
f <- \(p, x) {
  # Negative conditional log-likelihood of the transitions.
  ups = diff(x) > 0
  xprev = head(x, -1) # transition i is conditioned on the previous value x[i]
  alpha = p[1]
  beta = p[2]
  -sum(pnorm(alpha + beta * xprev[ups], log.p = TRUE)) -
    sum(pnorm(alpha + beta * xprev[!ups], lower.tail = FALSE, log.p = TRUE))
}
estimator <- \(x) nlm(f, p = c(1, 1), x = x)$estimate
Let’s simulate the variance of $\hat{\beta}$, scaled by $n$.
set.seed(313)
alpha = 2
beta = -1
up = 3/2
down = 1/2
x0 = 1
results <- replicate(1000, {
  x = simulator(1000, alpha, beta, up, down, x0)
  estimator(x)
})
var(results[2, ]) * 1000
## [1] 4.159315
And approximate the asymptotic variance using a large sample size.
x = simulator(100000, alpha, beta, up, down, x0)
solve(nlm(f, p = c(1, 1), x = x, hessian = TRUE)$hessian)[2, 2] * 100000
## [1] 4.274457
I count this as a match, which goes some way toward confirming our hypothesis. Let’s try a smaller $n$ in the simulation.
n = 100
results_small <- replicate(10000, {
  x = simulator(n, alpha, beta, up, down, x0)
  estimator(x)
})
var(results_small[2, ]) * n
## [1] 5.635581
The variance is, unsurprisingly, larger than the asymptotic variance. There might be methods in the literature to correct for this, using e.g. Edgeworth expansions, but I haven’t checked.
Improving confidence sets?
The convergence to the limiting distribution is not very fast. To improve it, it might be worth using the parametric bootstrap, which is second-order accurate (under iid conditions, at least). But I don’t know how well it works in the case of Markov chains.
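One way to try this out is a model-based (parametric) bootstrap: fit the model, simulate new chains from the fitted parameters, and refit on each replicate. The sketch below is my own illustration, not a validated procedure; the function `bootstrap_ci`, its percentile intervals, and the replication counts are choices I made for the example. It reuses the `simulator` and `estimator` logic from above, redefined compactly so the snippet runs on its own.

```r
# Simulate the up/down chain, as above.
simulator <- \(n, alpha, beta, up, down, x0) {
  f = \(x) pnorm(alpha + beta * x)
  x = rep(NA, n)
  x[1] = x0
  for (i in seq(2, n)) {
    z = rbinom(1, 1, f(x[i - 1]))
    x[i] = x[i - 1] * (up * z + down * (1 - z))
  }
  x
}

# Negative conditional log-likelihood and maximum likelihood estimator.
f <- \(p, x) {
  ups = diff(x) > 0
  xprev = head(x, -1)
  -sum(pnorm(p[1] + p[2] * xprev[ups], log.p = TRUE)) -
    sum(pnorm(p[1] + p[2] * xprev[!ups], lower.tail = FALSE, log.p = TRUE))
}
estimator <- \(x) nlm(f, p = c(1, 1), x = x)$estimate

# Parametric bootstrap percentile intervals (hypothetical helper):
# simulate n_boot chains from the fitted parameters, refit each, and
# take empirical quantiles of the refitted estimates.
bootstrap_ci <- \(x, up, down, n_boot = 999, level = 0.95) {
  est <- estimator(x)
  boot <- replicate(n_boot, {
    x_star <- simulator(length(x), est[1], est[2], up, down, x[1])
    estimator(x_star)
  })
  # Rows: lower/upper limits; columns: (alpha, beta).
  apply(boot, 1, quantile, probs = c((1 - level) / 2, (1 + level) / 2))
}

set.seed(313)
x <- simulator(1000, alpha = 2, beta = -1, up = 3/2, down = 1/2, x0 = 1)
bootstrap_ci(x, up = 3/2, down = 1/2, n_boot = 99)
```

Whether the percentile interval inherits second-order accuracy in this Markov setting is exactly the open question above; the sketch only shows that the mechanics are cheap to set up.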