## Introduction

Suppose you conduct an experiment of the following form. You start out with a value \(X_0\), then run a binary experiment \(Y_1\) with conditional probability \(P(Y_1=1\mid X_0)=f_\theta(X_0)\). Then you choose your next value of \(X\) deterministically, based solely on the values of \(X_0\) and \(Y_1\), using a known function \(g(x,y)\), i.e., \(X_1 = g(X_0,Y_1)\). Repeating this \(n\) times yields a sequence of values \(X_0,X_1,X_2,\ldots,X_n\) and \(Y_1,Y_2,\ldots,Y_n\).

Our goal is to do inference on \(\theta\). We make the following observations.
1. If \(g\) is invertible in \(y\) for each \(x\), the sequence \(\{Y_i\}\) can be recovered from \(\{X_i\}\).
2. The sequence \(\{X_i\}\) is a Markov chain with transition probabilities
\[
P(X_{i}=x\mid X_{i-1})=\begin{cases}
f_{\theta}(X_{i-1}), & x=g(X_{i-1},1),\\
1-f_{\theta}(X_{i-1}), & x=g(X_{i-1},0),\\
0, & \text{otherwise.}
\end{cases}
\]
Now let’s assume that \(\{X_i\}\) is stationary and ergodic. In this case, using the results of e.g. Billingsley’s *Statistical Methods in Markov Chains*, we can show that the maximum likelihood estimator of \(\theta\) has the *same* asymptotic distribution as it would have had if the observations \(X_i\) were independent.
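
Heuristically, this is because the log-likelihood conditional on \(X_0\) has exactly the form of a Bernoulli regression with covariates \(X_{i-1}\):
\[
\ell_n(\theta)=\sum_{i=1}^{n}\Bigl[Y_i\log f_\theta(X_{i-1})+(1-Y_i)\log\bigl(1-f_\theta(X_{i-1})\bigr)\Bigr],
\]
so the score is a sum of martingale differences, and under stationarity and ergodicity the usual information calculations go through, with expectations taken under the stationary distribution.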

## Example

Suppose that
\[
g(x,y)=\begin{cases}
\frac{3}{2}x, & \text{if }y=1,\\
\frac{1}{2}x, & \text{otherwise.}
\end{cases}
\]
And that \(f_\theta(x) = \Phi(\alpha + \beta x)\). This chain will *probably* be stationary and ergodic provided \(\beta < 0\): when \(\beta > 0\) the chain has an upward trend, and when \(\beta = 0\) the log of the chain is a random walk, so neither case is stationary.
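
Observation 1 applies here: since \(g(x,1) > x > g(x,0)\) for \(x > 0\), an upward move can only come from \(y = 1\), so the \(Y\) sequence can be read off the path. A minimal sketch (the example path is made up for illustration):

```
# Recover the binary outcomes from the path: with up = 3/2 and
# down = 1/2, an upward move can only mean y = 1.
recover_y <- \(x) as.integer(diff(x) > 0)

recover_y(c(1, 3/2, 9/4, 9/8, 27/16))  # returns 1 1 0 1
```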

## Simulations

Let’s simulate \(n\) values from the process. Notice that the first value is fixed at \(x_0\) rather than simulated.

```
simulator <- \(n, alpha, beta, up, down, x0) {
  # Probability of an upward move given the current value.
  f = \(x) pnorm(alpha + beta * x)
  x = rep(NA, n)
  x[1] = x0
  for (i in seq(2, n)) {
    z = rbinom(1, 1, f(x[i - 1]))                 # y_i
    x[i] = x[i - 1] * (up * z + down * (1 - z))   # g(x_{i-1}, y_i)
  }
  x
}
```

We can estimate the parameters using `nlm`.

```
# Negative log-likelihood of the chain, conditional on x[1].
f <- \(p, x) {
  ups = diff(x) > 0    # y_i = 1 iff the chain moved up
  xs = x[-length(x)]   # the conditioning values x_{i-1}
  alpha = p[1]
  beta = p[2]
  -sum(pnorm(alpha + beta * xs[ups], log.p = TRUE)) -
    sum(pnorm(alpha + beta * xs[!ups], lower.tail = FALSE, log.p = TRUE))
}
estimator <- \(x) nlm(f, p = c(1, 1), x = x)$estimate
```

Let’s estimate the scaled variance \(n\operatorname{Var}(\hat\beta)\) of the \(\beta\) estimate by simulation.

```
set.seed(313)
alpha = 2
beta = -1
up = 3/2
down = 1/2
x0 = 1
results <- replicate(1000, {
  x = simulator(1000, alpha, beta, up, down, x0)
  estimator(x)
})
var(results[2, ]) * 1000
```

`## [1] 4.159315`

And approximate the asymptotic variance using a large sample size.

```
x = simulator(100000, alpha, beta, up, down, x0)
solve(nlm(f, p = c(1, 1), x = x, hessian = TRUE)$hessian)[2, 2] * 100000
```

`## [1] 4.274457`

I count this as a match, consistent with our hypothesis. Let’s try a smaller \(n\) in the simulation.

```
n = 100
results_small <- replicate(10000, {
  x = simulator(n, alpha, beta, up, down, x0)
  estimator(x)
})
var(results_small[2, ]) * n
```

`## [1] 5.635581`

The variance is, unsurprisingly, larger than the asymptotic variance. There might be methods in the literature to correct for this, using e.g. Edgeworth expansions, but I haven’t checked.

## Improving confidence sets?

The convergence to the limiting distribution is not very fast. To improve on it, it might be worth using the parametric bootstrap, which is second-order accurate (under iid conditions, at least). But I don’t know how well it works in the case of Markov chains.
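
As a sketch of what that could look like here, reusing `simulator` and `estimator` from above (the replicate count `B` and the percentile interval are arbitrary illustration choices, not tuned recommendations):

```
# Hypothetical parametric bootstrap percentile interval for beta:
# fit the model, simulate B fresh paths from the fitted chain,
# refit each path, and take quantiles of the bootstrap estimates.
boot_ci <- \(x, B = 1000, up = 3/2, down = 1/2, level = 0.95) {
  est = estimator(x)
  boot = replicate(B, {
    x_star = simulator(length(x), est[1], est[2], up, down, x0 = x[1])
    estimator(x_star)[2]
  })
  quantile(boot, c((1 - level) / 2, (1 + level) / 2))
}
```

Whether the resulting intervals are actually second-order accurate for this chain is exactly the open question above.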