
Counterfactuals in the Probit Model

This post is about counterfactuals in the probit model. I wrote it while reading Pearl et al.’s 2016 book Causal Inference in Statistics: A Primer.

The probit model with one normally distributed covariate can be written like this:

$$\epsilon \sim N(0, 1), \qquad X \sim N(0, 1), \qquad Y \mid X, \epsilon = 1_{\{X + \epsilon \geq 0\}}.$$
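As a quick sanity check of the model (a minimal simulation sketch of my own, not from the book), we can draw from it and confirm that $Y = 1$ about half the time, since $X + \epsilon \sim N(0, 2)$ is symmetric about 0:

set.seed(313)
n <- 10^5
epsilon <- rnorm(n)
xs <- rnorm(n)
y <- as.numeric(xs + epsilon >= 0)
mean(y)  # close to 0.5, since X + epsilon is symmetric about 0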

Now we wish to find the density of the counterfactual $Y_{x'}$; see Pearl et al. (2016) for definitions. The density of $\epsilon$ given $X = x$ and $Y = 1$ is

$$p(\epsilon \mid X = x, Y = 1) = \frac{p(X = x, Y = 1 \mid \epsilon)\, p(\epsilon)}{p(X = x, Y = 1)} = \frac{p(Y = 1 \mid \epsilon, X = x)\, p(X = x)\, p(\epsilon)}{p(X = x, Y = 1)} \propto 1_{\{x + \epsilon \geq 0\}}\, p(X = x)\, p(\epsilon),$$

or the standard normal density truncated to $[-x, \infty)$, written $\phi_{[-x, \infty)}(\epsilon)$.
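To see this truncation in action, here is a rough simulation check (my own sketch; the conditioning on $X = x$ is approximated by a window of arbitrary width 0.05): keep draws with $X$ near $x$ and $Y = 1$, and compare the retained $\epsilon$ values to $\phi_{[-x, \infty)}$.

set.seed(313)
n   <- 10^6
x0  <- 0.5                                    # observed value of X
eps <- rnorm(n)
xs  <- rnorm(n)
keep <- abs(xs - x0) < 0.05 & xs + eps >= 0   # X near x0 and Y = 1

# Standard normal density truncated to [-x0, Inf)
phi_trunc <- function(e) ifelse(e >= -x0, dnorm(e) / pnorm(x0), 0)

hist(eps[keep], breaks = 50, freq = FALSE,
     main = "epsilon | X = x, Y = 1", xlab = "epsilon")
curve(phi_trunc(x), add = TRUE, col = "red")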

Probability of Necessity

Now we’ll take a look at the probability of necessity. The most obvious way to generalize the probability of necessity to continuous distributions is to allow the counterfactual $x'$ to be a parameter of the counterfactual $Y_{x'}$, like this:

$$P(Y_{x'} = 0 \mid X = x, Y = 1) = 1 - \int 1_{\{x' + \epsilon \geq 0\}}\, \phi_{[-x, \infty)}(\epsilon)\, d\epsilon = 1 - \int_{\max\{-x,\, -x'\}}^{\infty} \phi_{[-x, \infty)}(\epsilon)\, d\epsilon = 1 - \frac{\min(\Phi(x), \Phi(x'))}{\Phi(x)}.$$

Let’s plot this function.

x <- seq(-3, 3, by = 0.01)  # grid of counterfactual values x'
x_hat <- 0                  # observed value of X

# P(Y_{x'} = 0 | X = x_hat, Y = 1) as a function of x'
ps <- function(x, x_hat) 1 - pmin(pnorm(x_hat), pnorm(x)) / pnorm(x_hat)

plot(x = x, y = ps(x, x_hat), 
     type = "l", 
     xlab = "Counterfactual x'",
     ylab = "Probability",
     main = "Counterfactual Probabilities, x = 0")
grid()
lines(x = x, y = ps(x, x_hat), type = "l")  # redraw the curve on top of the grid

There is nothing too strange here. The probability of necessity goes to 1 as $x' \to -\infty$, which is what you would expect, for the probability of $Y = 1$ becomes really slim as $x' \to -\infty$. When $x' \geq x$ the probability of necessity is 0, as you will always observe $Y = 1$ counterfactually when $X = x'$ if you observed $Y = 1$ with $X = x$ (if $x + \epsilon \geq 0$ and $x' \geq x$, then $x' + \epsilon \geq 0$)!
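As a sanity check on the closed form (my own sketch, reusing the ps function above), we can carry out the counterfactual computation by simulation: abduct $\epsilon$ given $X = x$ and $Y = 1$ by rejection, intervene to set $X = x'$, and count how often $Y_{x'} = 0$.

set.seed(313)
x_hat <- 0
x_cf  <- -1                  # a counterfactual value x'
eps <- rnorm(10^6)
eps <- eps[eps >= -x_hat]    # abduction: epsilon | X = x_hat, Y = 1
mean(x_cf + eps < 0)         # Monte Carlo estimate of P(Y_{x'} = 0 | X = x_hat, Y = 1)
ps(x_cf, x_hat)              # closed form, approximately 0.683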

Integrated Probability of Necessity

A more complicated question is: What rôle did the fact that $X = x$ have in $Y = 1$? Or, if $X$ wasn’t $x$, what would $Y$ have been? With some abuse of notation, $P(Y_{X'} = 0 \mid X = x, Y = 1)$ answers this question, where $X'$ is an independent copy of $X$.

$$\begin{aligned}
P(Y_{X'} = 0 \mid X = x, Y = 1) &= 1 - \int \int_{\max\{-x,\, -x'\}}^{\infty} \phi_{[-x, \infty)}(\epsilon)\, \phi(x')\, d\epsilon\, dx' \\
&= 1 - \int \frac{\min(\Phi(x), \Phi(x'))}{\Phi(x)}\, \phi(x')\, dx' \\
&= 1 - \int_{x}^{\infty} \phi(x')\, dx' - \frac{1}{\Phi(x)} \int_{-\infty}^{x} \Phi(x')\, \phi(x')\, dx' \\
&= 1 - (1 - \Phi(x)) - \frac{1}{2}\, \frac{1}{\Phi(x)}\, \Phi(x)^2 \\
&= \frac{1}{2}\, \Phi(x),
\end{aligned}$$

where the last integral uses $\int_{-\infty}^{x} \Phi(x')\, \phi(x')\, dx' = \tfrac{1}{2}\Phi(x)^2$.
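A quick simulation check of this formula (again a sketch of my own): abduct $\epsilon$ given $X = x$ and $Y = 1$ by rejection, redraw $X'$ independently, and count how often $Y = 0$.

set.seed(313)
x0  <- 1
eps <- rnorm(10^6)
eps <- eps[eps >= -x0]        # abduction: epsilon | X = x0, Y = 1
x_new <- rnorm(length(eps))   # independent redraw X'
mean(x_new + eps < 0)         # Monte Carlo estimate
0.5 * pnorm(x0)               # closed form, approximately 0.421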

Now let’s plot this.

# P(Y_{X'} = 0 | X = x, Y = 1) = Phi(x)/2 as a function of the observed x
counter <- function(x) 0.5 * pnorm(x)

plot(x = x, 
     y = counter(x), 
     type = "l", 
     xlab = "Actual x",
     ylab = "Probability",
     main = "Counterfactual Probabilities")
grid()
lines(x = x, 
      y = counter(x), 
      type = "l")  # redraw the curve on top of the grid

Notice the asymptote at 0.5: no matter how large an $X = x$ we observe together with $Y = 1$, we can never be more than 50% certain that $Y = 0$ if we were to draw an $x$ once again. The asymptote at 0 says that if we had observed a very small value $X = x$ together with $Y = 1$ and were to draw again, we would be quite certain that $Y = 1$ would happen once again.