Regression discontinuity design (RDD) is a great tool for going beyond descriptive statistics and moving into causal inference. I’m assuming the reader knows the theory, assumptions, advantages and weaknesses of the method. Below I’ll just give a few basic tips on how to present results and share code that graphs them. The interesting thing about RDD is that it’s a very visual estimation technique: you can see the results and, therefore, present them to your audience in a very intuitive way. Here I propose presenting your RDD estimation using three types of graphs. I present the first, and simplest, one in this post. The next ones are slightly more elaborate.
In short, RDD is about evaluating whether an outcome variable changes significantly when an independent variable (the treatment) changes its value. There is a third element: the running variable, which determines the value of the independent variable (0 or 1 in the sharp RDD case, which is the only one I’m dealing with here). If the running variable is below a certain value, the cutoff point, the independent variable takes the value of zero (the observations in this ‘area’ are the control group). If it is above, the independent variable is one (the treated group).
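As a minimal sketch of the sharp assignment rule just described, the snippet below derives the treatment indicator from the running variable. The cutoff of 0 and the example values are hypothetical, and treating observations exactly at the cutoff as treated is an assumption made here for illustration, not something the text above specifies.

```r
# Hypothetical cutoff and running-variable values, for illustration only
cutoff <- 0
running.var <- c(-0.8, -0.1, 0, 0.3, 0.9)

# Sharp RDD: treatment is a deterministic function of the running variable.
# Assumption: observations exactly at the cutoff count as treated.
treat.ind <- as.integer(running.var >= cutoff)

# Show each running-variable value next to its assigned group
print(data.frame(running.var, treat.ind))
```

In a fuzzy RDD this one-to-one mapping would not hold, which is why the sketch applies only to the sharp case.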
We can show the behavior of all three variables in a simple two-dimensional scatter plot in which the y-axis is the outcome variable, the x-axis is the running variable, and a vertical line marks the cutoff point. That is, the vertical line divides the scatter plot in two: the control group is plotted on the left-hand side, the treated group on the right-hand side.
Therefore, this graph, very standard in RDD presentations, should look like the one below. Instead of a linear regression line, fit a loess line to show the central tendency of the observations in each group. This way you can better see the functional form of the data and avoid imposing a linear trend when the data could actually follow, for example, a quadratic form.
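The point about functional form can be checked numerically with a small sketch. It simulates one side of the cutoff with a quadratic relationship and compares how closely a straight line and a loess curve track the data. All values here (the seed, the coefficients, the noise level) are hypothetical choices for illustration; only `lm()` and `loess()` from base R are used.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical fake data: the outcome is quadratic in the running variable
# (control side only, to keep the example short)
x <- runif(500, min = -1, max = 0)
y <- 50 + 30 * x^2 + rnorm(500, sd = 2)

linear.fit <- lm(y ~ x)     # imposes a straight line
loess.fit  <- loess(y ~ x)  # lets the data choose the shape

# Root mean squared residual of each fit (lower means a closer fit)
rmse <- function(fit) sqrt(mean(residuals(fit)^2))
rmse.linear <- rmse(linear.fit)
rmse.loess  <- rmse(loess.fit)

print(c(linear = rmse.linear, loess = rmse.loess))
```

With curvature like this, the straight line leaves systematic residuals that the loess curve absorbs, which is the visual difference the plot is meant to reveal.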
Below is some very simple code that produces the plot above. We first create some fake data for easier replication.
## Creating a fake dataset (N = 1000: 500 in the treated group, 500 in the control group)

# Outcome variable
outcome <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 70, sd = 10))

# Running variable
running.var <- seq(0, 1, by = .0001)
running.var <- sample(running.var, size = 1000, replace = TRUE)

## Put negative values for the running variable in the control group
running.var[1:500] <- -running.var[1:500]

# Treatment indicator (just a binary variable indicating treated and control groups)
treat.ind <- c(rep(0, 500), rep(1, 500))

data <- data.frame(cbind(outcome, running.var, treat.ind))
Now the plot itself, using ggplot2. Note that we tweak the legend labels in this graph with scale_colour_discrete().
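library(ggplot2)

# treat.ind is wrapped in factor() so that colour is mapped to a discrete scale;
# a numeric 0/1 variable would be treated as continuous and clash with scale_colour_discrete()
ggplot(data, aes(running.var, outcome, color = factor(treat.ind))) +
  geom_point() +
  # loess fit per group; in ggplot2 >= 3.4.0, use linewidth = 1.5 instead of size
  stat_smooth(method = "loess", size = 1.5) +
  geom_vline(xintercept = 0, linetype = "longdash") +
  xlab("Running variable") +
  ylab("Outcome variable") +
  scale_colour_discrete(name = "Experimental\nCondition",
                        breaks = c("0", "1"),
                        labels = c("Control", "Treatment"))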
Yes, that’s all. Now let us move to something slightly more complex in the next post.