The Problem with Bivariate Regression

by David

In the debate over guns and gun control, many types of evidence are put forward to support arguments on either side of the aisle. People post studies, scatterplots, bar charts, etc. to argue that guns are good or bad for society. In this post, I'll go over the "bivariate regression," which usually comes in the form of someone posting a scatterplot of gun ownership or the number of gun control laws against some outcome variable.

The bivariate regression tells us little about the causal relationship between variable X and variable Y. Firstly, I know it doesn't need to be repeated, but correlation is not causation. Two variables can be positively correlated with each other (usually summarized by a correlation coefficient r or an R^2 value), but that does not mean the independent variable causes the dependent variable to change. You can find all sorts of "spurious correlations" online that any reasonable person would correctly conclude have no causal relationship.

That brings us to the topic at hand. In the gun control debate, there are many organizations and even academic journals that publish statistics using bivariate regressions of states' gun ownership or gun laws (there are serious problems with this) against outcomes such as gun homicides. The Giffords Law Center calls this correlation "undeniable," and they are correct. There is an undeniable correlation, but the question should be whether that correlation is causal. Therefore, flatly posting a scatterplot to make the point that more guns = more crime is incorrect, and no serious researcher would say it tells us anything about causation.

The reason is that you need to control for confounding variables and, in gun research, you also need to address two-way causation (the outcome may itself influence the independent variable). Controlling for confounders is crucial for causal inference. One needs to hold constant variables that correlate with the independent variable of interest and affect the dependent variable. Here, I attempt to do that.
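To make the idea of "holding a variable constant" concrete, here is a minimal sketch using purely synthetic data (not the state-level data discussed in this post). The names `confounder`, `x`, and `y` are illustrative assumptions: `x` and `y` have no effect on each other, but both depend on `confounder`. The sketch compares the raw correlation with the correlation that remains after the confounder's linear effect is removed from both variables.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def residuals(y, x):
    """Residuals from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

# Synthetic data (an assumption for illustration only):
# x and y are each driven by the confounder, never by each other.
rng = random.Random(0)
n = 5000
confounder = [rng.gauss(0, 1) for _ in range(n)]
x = [z + rng.gauss(0, 1) for z in confounder]
y = [z + rng.gauss(0, 1) for z in confounder]

# Raw correlation: clearly positive even though x does not cause y.
raw_r = pearson(x, y)

# Partial correlation: correlate what is left of x and y after
# regressing each on the confounder.
partial_r = pearson(residuals(x, confounder), residuals(y, confounder))

print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

In this setup the raw correlation should come out around 0.5, while the partial correlation should sit near zero. A scatterplot of `x` against `y` would look convincing, yet the whole pattern comes from the third variable.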

As you can see in the graph below, gun control law strength, as measured by Giffords, has only a modest correlation with firearm homicides at the state level (r = 0.27).



However, we also know through past research that poverty and homicide rates are related. One study that looked at data from the 190 largest cities in America in 1990 found that "...inequality and poverty have significant and independent positive effects on rates of homicide in U.S. cities..." Also, the correlation between poverty and firearm homicide rates at the state level is very strong (r = 0.76).


There is also a moderate correlation between strength of gun laws and poverty rates (r = 0.45), as shown in the correlation matrix below. This means that states that have weaker gun laws tend to have higher poverty rates. Therefore, it might be the case that poverty rates in loose gun control states are driving high firearm homicide rates.
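As a rough check, the three pairwise correlations reported in this post (0.27, 0.76, and 0.45) can be plugged into the standard first-order partial correlation formula. This is only a sketch: the inputs are rounded, it assumes all three were computed on the same set of states, and the helper name `partial_corr` is my own.

```python
import math

# Pairwise state-level correlations as reported in this post (rounded).
r_law_hom = 0.27  # lawstrength vs. firearm homicide rate
r_pov_hom = 0.76  # povrate vs. firearm homicide rate
r_law_pov = 0.45  # lawstrength vs. povrate

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Correlation between law strength and firearm homicide, holding poverty fixed.
r_law_hom_given_pov = partial_corr(r_law_hom, r_law_pov, r_pov_hom)
print(f"partial r = {r_law_hom_given_pov:.2f}")  # prints "partial r = -0.12"
```

Under these assumptions, the positive 0.27 association shrinks to roughly -0.12 once poverty is held fixed, which is in line with the idea that poverty accounts for much of the raw correlation.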


To answer that question, we can use multiple linear regression and test the effect of gun laws while controlling for poverty rates. When we do this (as seen below), the variable "lawstrength" is rendered statistically insignificant, while the model shows that poverty rates ("povrate") have a strong positive association with firearm homicide rates (p < 0.01). The estimates imply that a 1 percentage point increase in a state's poverty rate is associated with an increase in the firearm homicide rate of 1.34 per 100,000. Also, take note of how the R^2 for the model jumps.
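The same pattern can be reproduced on synthetic data. The sketch below is not the model from this post: the variable names `lawstrength` and `povrate` are borrowed for readability, but every value is simulated, and the coefficients 0.5 and 1.3 are arbitrary assumptions. By construction, `homicide` depends only on `povrate`, and `lawstrength` is merely correlated with `povrate`.

```python
import random

def centered_sum(a, b):
    """Sum of cross-products of a and b around their means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

rng = random.Random(1)
n = 2000  # synthetic observations, NOT the 50 states

# Simulated data: poverty drives homicide; the law variable only tracks poverty.
povrate = [rng.gauss(0, 1) for _ in range(n)]
lawstrength = [0.5 * p + rng.gauss(0, 1) for p in povrate]
homicide = [1.3 * p + rng.gauss(0, 1) for p in povrate]

s_ll = centered_sum(lawstrength, lawstrength)
s_pp = centered_sum(povrate, povrate)
s_lp = centered_sum(lawstrength, povrate)
s_lh = centered_sum(lawstrength, homicide)
s_ph = centered_sum(povrate, homicide)

# Bivariate regression: homicide on lawstrength alone.
bivariate_slope = s_lh / s_ll

# Multiple regression: homicide on lawstrength and povrate,
# solved from the 2x2 normal equations on mean-centered data.
det = s_ll * s_pp - s_lp ** 2
b_law = (s_pp * s_lh - s_lp * s_ph) / det
b_pov = (s_ll * s_ph - s_lp * s_lh) / det

print(f"bivariate slope on lawstrength: {bivariate_slope:.2f}")
print(f"multiple regression, lawstrength: {b_law:.2f}, povrate: {b_pov:.2f}")
```

In the bivariate fit the law variable appears to matter, but once `povrate` enters the model its coefficient should fall to roughly zero while the poverty coefficient lands near the simulated value of 1.3. In practice one would use a statistics package that also reports standard errors and p-values.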


I hope this post offers a short and quick lesson on why graphs of bivariate regressions are unreliable and tell us very little about the causal relationship between the independent variable and the dependent variable.
