Paper Synopsis: In Pursuit of Interpretable, Fair and Accurate Machine Learning for Criminal Recidivism Prediction
Objectives: We study interpretable recidivism prediction using machine learning (ML) models and analyze performance in terms of prediction ability, sparsity, and fairness. Unlike previous work, this study trains interpretable models that output probabilities rather than binary predictions and uses quantitative fairness definitions to assess the models. It also examines whether models can generalize across geographic locations.
Methods: We generated black-box and interpretable ML models on two different criminal recidivism datasets, from Florida and Kentucky. We compared the predictive performance and fairness of these models against two methods currently used in the justice system to predict pretrial recidivism: the Arnold PSA and COMPAS. We evaluated the predictive performance of all models on predicting six different types of crime over two time spans (6 months and 2 years).
Results: Several interpretable ML models can predict recidivism as well as black-box ML models and are more accurate than COMPAS or the Arnold PSA. These models are potentially useful in practice. Similar to the Arnold PSA, some of these interpretable models can be written down as a simple table. Others can be displayed using a set of visualizations. Our geographic analysis indicates that ML models should be trained separately for different locations and updated over time. We also present a fairness analysis for the interpretable models.
Conclusions: Interpretable ML models can perform just as well as non-interpretable methods and currently-used risk assessment scales, in terms of both prediction accuracy and fairness. ML models might be more accurate when trained separately for distinct locations and kept up-to-date.
Abstract of Main Results for Criminal Justice Practitioner
Our goal is to study the predictive performance, interpretability, and fairness of machine learning (ML) models for pretrial recidivism prediction. ML methods are known for their ability to automatically generate, from data alone, high-performance models that sometimes even surpass human performance. However, many of the most common ML approaches produce “black-box” models: models that perform well but are too complicated for humans to understand. “Interpretable” ML techniques seek to produce the best of both worlds: models that perform as well as black-box approaches but are also understandable to humans. In this study, we generate multiple black-box and interpretable ML models. We compare the predictive performance and fairness of the ML models we generate against two risk assessment tools currently used in the justice system to predict pretrial recidivism: the Risk of General Recidivism and Risk of Violent Recidivism scores from the COMPAS suite, and the New Criminal Activity and New Violent Criminal Activity scores from the Arnold Public Safety Assessment.
We first evaluate the predictive performance of all models, based on their ability to predict recidivism for six different types of crime: general, violent, drug, property, felony, and misdemeanor. Recidivism is defined as a new charge for which an individual is convicted within a given time frame, which we set to 6 months or 2 years. We consider each type of recidivism over these two time periods to control for time, rather than making predictions over an arbitrarily long or short pretrial period. Next, we examine whether a model constructed using data from one region suffers in predictive performance when applied to predict recidivism in another region. Finally, we consider recent fairness definitions proposed by the ML community. Using these definitions, we examine the behavior of the interpretable models, COMPAS, and the Arnold Public Safety Assessment on race and gender subgroups.
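As a concrete, hypothetical illustration of this setup, the sketch below shows one way to build a binary recidivism label from charge records and score a model's predicted probabilities with AUC. The table layout (person_id, screening_date, charge_date, convicted, charge_type) and the 730-day horizon are assumptions for illustration, not the study's actual data schema or pipeline.

```python
# Hypothetical sketch: building a binary recidivism label (a conviction for a
# new charge within a fixed horizon after screening) and scoring any model's
# predicted probabilities with AUC. Column names and DataFrames are
# illustrative assumptions, not the study's actual data schema.
import pandas as pd
from sklearn.metrics import roc_auc_score

def label_recidivism(screening, charges, charge_type, horizon_days):
    """Return a 0/1 label per person_id: 1 if a conviction of `charge_type`
    occurs within `horizon_days` of the screening date (dates are datetimes)."""
    merged = screening.merge(charges, on="person_id", how="left")
    days_out = (merged["charge_date"] - merged["screening_date"]).dt.days
    hit = (
        days_out.between(0, horizon_days)
        & (merged["convicted"] == 1)
        & (merged["charge_type"] == charge_type)
    )
    return hit.groupby(merged["person_id"]).any().astype(int)

# e.g., a two-year drug-recidivism outcome, evaluated by AUC:
# y_true = label_recidivism(screening, charges, charge_type="drug", horizon_days=730)
# roc_auc_score(y_true, model.predict_proba(X)[:, 1])
```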
Our findings and contributions can be summarized as follows:
We contribute a set of interpretable ML models that can predict recidivism as well as black-box ML methods, and better than COMPAS or the Arnold Public Safety Assessment, for the location for which they were designed. These models are potentially useful in practice. Similar to the Arnold Public Safety Assessment, some of these interpretable models can be written down as a simple table that fits on one page of paper. Others can be displayed using a set of visualizations.
We find that recidivism prediction models constructed using data from one location tend not to perform as well when used to predict recidivism in another location, leading us to conclude that models should be constructed using data from the location where they are meant to be used and updated periodically over time.
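A minimal sketch of this kind of cross-location comparison, assuming pre-encoded feature matrices for two regions and using a generic probabilistic classifier as a stand-in for the paper's specific models:

```python
# Hypothetical sketch of the cross-location comparison: fit a model on one
# region's training data, then score it on that region's held-out set and on
# another region's held-out set. LogisticRegression stands in for any of the
# paper's models; feature matrices are assumed to be encoded identically
# across regions (placeholder variable names, not the study's pipeline).
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def within_vs_across_auc(X_train_a, y_train_a, X_test_a, y_test_a, X_test_b, y_test_b):
    """Train on region A; compare AUC on region A's test set vs. region B's."""
    model = LogisticRegression(max_iter=1000).fit(X_train_a, y_train_a)
    return {
        "within_region_auc": roc_auc_score(y_test_a, model.predict_proba(X_test_a)[:, 1]),
        "across_region_auc": roc_auc_score(y_test_b, model.predict_proba(X_test_b)[:, 1]),
    }

# e.g., train on Kentucky and evaluate on both Kentucky and Florida test sets:
# within_vs_across_auc(X_ky_train, y_ky_train, X_ky_test, y_ky_test, X_fl_test, y_fl_test)
```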
We reviewed the recent literature on algorithmic fairness and found that most fairness criteria do not pertain to risk scores; they pertain only to yes/no classification decisions. Since we are interested in criminal justice risk scores in this work, the vast majority of algorithmic fairness criteria are not relevant. We therefore focus on the evaluation criteria that are relevant, namely calibration and balanced group AUC (BG-AUC). We present an analysis of these fairness measures for two of the interpretable models (RiskSLIM and the Explainable Boosting Machine) and the Arnold Public Safety Assessment's New Criminal Activity score on the two-year general recidivism outcome in Kentucky. We found that the fairness criteria were approximately met by both interpretable models for the Black/white and male/female subgroups; that is, the models were fair according to these criteria. The Arnold Public Safety Assessment's New Criminal Activity score failed to satisfy calibration at higher values of the score. The fairness results were less consistent for the “Other” race category and are difficult to interpret, due to the low resolution of the race data.
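As a rough illustration of how these two checks can be computed, the sketch below bins predicted scores to compare observed recidivism rates across subgroups (calibration) and computes the AUC within each subgroup (the quantity that BG-AUC asks to be balanced). The arrays and the ten-bin discretization are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of the two fairness checks: per-group calibration and
# within-group AUC. `scores`, `y`, and `group` are assumed arrays of predicted
# risk, observed two-year recidivism (0/1), and subgroup labels (race or gender).
import pandas as pd
from sklearn.metrics import roc_auc_score

def calibration_by_group(scores, y, group, n_bins=10):
    """Observed recidivism rate per score bin, split by subgroup.
    Calibration approximately holds when the subgroups' columns agree."""
    df = pd.DataFrame({"score": scores, "y": y, "group": group})
    df["bin"] = pd.cut(df["score"], bins=n_bins)
    return df.groupby(["group", "bin"], observed=True)["y"].mean().unstack("group")

def auc_by_group(scores, y, group):
    """AUC computed within each subgroup; BG-AUC asks these to be similar."""
    df = pd.DataFrame({"score": scores, "y": y, "group": group})
    return df.groupby("group").apply(lambda g: roc_auc_score(g["y"], g["score"]))
```

In this framing, a calibration failure like the one observed for the New Criminal Activity score at higher values would show up as the subgroup columns diverging in the upper score bins.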