Jacob Kaplan

PhD candidate in Criminology at the University of Pennsylvania


The data comes from either NACJD or the FBI (if you email them at cjis_comm@leo.gov, they'll mail you a DVD with the data). I get the data as fixed-width text files and read them into R with my package asciiSetupReader, using an SPSS setup file (NACJD provides one with the data; for the FBI data I make it myself from their Record Description PDFs). I then lightly clean and combine the files in R. All the setup files I made, the FBI Record Descriptions, and my R code are available in my GitHub repo here. For any specific questions, please email me.

Negative values reflect adjustments of previous returns. For example, if an agency reports a burglary in January and then discovers in February that it wasn't actually a burglary, it will record both 1 unfounded burglary and -1 actual burglary in February. This "deletes" the erroneously recorded burglary from January. See more on pages 82-83 of the FBI's Manual for UCR data.
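
As a rough illustration of how the adjustment nets out, here is a minimal Python sketch. The monthly figures are made up to match the burglary example above, and the dictionary keys are hypothetical names, not the actual column names in the data.

```python
# Hypothetical monthly returns for one agency (made-up numbers, made-up keys).
monthly_returns = [
    {"month": "January", "actual_burglary": 1, "unfounded_burglary": 0},
    {"month": "February", "actual_burglary": -1, "unfounded_burglary": 1},
]

# Summing across months cancels the burglary that was recorded in error.
net_actual = sum(row["actual_burglary"] for row in monthly_returns)
total_unfounded = sum(row["unfounded_burglary"] for row in monthly_returns)

print(net_actual)       # 0
print(total_unfounded)  # 1
```

The negative value only makes sense once months are added together; a single month viewed on its own can show a negative count.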

Crime - Offenses Known and Clearances by Arrest

This data doesn't differentiate between a "real zero" and a "not reported zero". If an agency doesn't report any crimes (even if crimes did occur), the data will say zero crimes occurred. Even though the data indicates how many months of the year that agency reported, that doesn't necessarily mean that they reported fully. An agency that reports all 12 months of the year may still report only incomplete data. Agencies can report partial data each month and still be considered to have reported that month. Chicago, for example, reports every month but until the last few years didn't report any rapes.

No, this data only includes the most serious crime in an incident (except for motor vehicle theft, which is always included). For incidents where more than one crime happens (for example, a robbery and a murder), only the most serious crime (murder in this case) will be counted. This is called the Hierarchy Rule. See more on pages 10-12 of the FBI's Manual for UCR data, which details the Hierarchy Rule.
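
The counting logic described above could be sketched roughly as follows in Python. This is a simplified illustration, not the FBI's actual procedure: the ranking dictionary, the `ALWAYS_COUNTED` set, and the function name are all assumptions made for the example, and the treatment of motor vehicle theft simply follows the description above. The FBI manual covers the full rules and exceptions.

```python
# Simplified seriousness ranking (1 = most serious). Illustrative only;
# see the FBI manual for the complete hierarchy and its exceptions.
HIERARCHY = {
    "murder": 1,
    "rape": 2,
    "robbery": 3,
    "aggravated assault": 4,
    "burglary": 5,
    "theft": 6,
}

# Assumption taken from the text above: motor vehicle theft is always counted.
ALWAYS_COUNTED = {"motor vehicle theft"}


def counted_offenses(incident):
    """Return the offenses in one incident that would appear in the data."""
    ranked = [offense for offense in incident if offense in HIERARCHY]
    counted = [offense for offense in incident if offense in ALWAYS_COUNTED]
    if ranked:
        # Keep only the single most serious ranked offense.
        counted.append(min(ranked, key=HIERARCHY.get))
    return sorted(counted)


print(counted_offenses(["robbery", "murder"]))  # ['murder']
print(counted_offenses(["burglary", "motor vehicle theft"]))
# ['burglary', 'motor vehicle theft']
```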

Though the Hierarchy Rule does mean this data is an undercount, data from other sources indicate it isn't much of an undercount. The FBI's other data set, the National Incident-Based Reporting System (NIBRS), contains every crime that occurs in an incident (i.e. it doesn't use the Hierarchy Rule). Using this we can measure how many crimes the Hierarchy Rule excludes, though most major cities do not report to NIBRS, so what we find in NIBRS may not apply to them. In over 90% of incidents, only one crime is committed. Additionally, when people talk about "crime" they usually mean murder which, while a very bad way to discuss crime, means the UCR data here is accurate on that measure.

A major limitation (in my opinion the most important limitation) of the data here is that it doesn't include crimes not reported to police. Based on victimization surveys that ask people both if they were victimized and if they reported that crime, we know that the majority of crimes are not reported. This probably won't matter when looking at a single city for a short period of time - the population won't change too much, so even underreporting of crime will be consistent underreporting. The issue becomes serious when looking at a city with major population changes or when comparing multiple cities, as their populations may have very different reporting practices. There's no easy solution here, but it is an important aspect of understanding crime data that you should keep in mind. For a full breakdown of reporting rates by crime and by a number of characteristics of the crime and victim (and reasons for not reporting), see Tables 91-105 (pages 98-114) in this report on the National Crime Victimization Survey from 2008.

Using the rate helps deal with population changes that could lead to changes in crime merely because of that change, but it isn't without its drawbacks. The main drawback with using a rate is that it assumes equal risk of victimization, which we know isn't correct. For example, rape is a crime that affects 6 times as many women as men (according to the 2016 National Crime Victimization Survey, Table 6, page 9), yet the rate is based on the total population in that city (the UCR does not differentiate victims by gender, but other data sets, such as NIBRS, do, allowing for better rates). Other crimes require even more granular rates. Murder victims are predominantly young men, but this differs by type of murder - domestic violence victims are mostly women. So while rates are probably better than counts, as they let you control for population, consider exactly who that population is and how risk changes within that population.
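
To make the point about at-risk populations concrete, here is a small Python sketch comparing a rate based on total population with one based on the population most at risk. Every number is made up for a hypothetical city, the per-100,000 base is an assumption (a common convention), and the victim-level split would have to come from a source such as NIBRS, since the UCR does not provide it.

```python
def rate_per_100k(count, population):
    """Crimes per 100,000 people in the given population."""
    return count * 100_000 / population


# Hypothetical city (all figures invented for illustration).
total_population = 200_000
female_population = 100_000
reported_rapes = 70
rapes_with_female_victim = 60  # 6 times the 10 with male victims

overall_rate = rate_per_100k(reported_rapes, total_population)
female_rate = rate_per_100k(rapes_with_female_victim, female_population)

print(overall_rate)  # 35.0
print(female_rate)   # 60.0
```

In this made-up case the total-population rate is well below the rate for the group that bears most of the risk, which is the distortion described above.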