Case Analysis: Risk Needs Assessment Tests
With more than 2.3 million inmates, the United States has the largest prison population in the world: Americans account for about 5% of the world's population but more than 25% of its prisoners. The prison population grew for three decades (though it has been declining since 2013), affecting both the economy and the social fabric of the country. Owing to the cost of maintaining the federal prison system and the increasing reliance on incarceration, the nation's criminal justice system is looking for ways to improve, including rehabilitation programs and the use of algorithms that predict the risk of recidivism to assist in deciding the length of a sentence. In this case study, I will delve into the ethical, legal, and technical aspects of using such an algorithm in sentencing.
Risk Needs Assessment (RNA) tests classify offenders into risk levels (e.g., low, medium, and high) for recidivism, the tendency to re-offend. The test does not indicate whether a particular offender will re-offend; rather, it estimates the 'risk', or probability, that the offender will re-offend. Like other data-driven algorithms, the assessment is based on the extent to which an offender shares characteristics with other offenders who have re-offended. For example, a high-risk classification means that the offender resembles other offenders who have re-offended. The algorithm derives the risk score from a few static factors, such as age at first arrest, gender, and race, and some dynamic factors (factors that can change over time), such as current age, education level, anti-social behavior, number of arrests, and employment status.
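To make this concrete, a score of this kind can be sketched as a weighted combination of static and dynamic factors pushed through a logistic function and then bucketed into risk levels. The weights and factor names below are invented for illustration; real RNA tools such as COMPAS use proprietary models whose internals are not public.

```python
import math

# Hypothetical weights for illustration only -- these numbers are
# invented and do NOT reflect any real risk-assessment instrument.
WEIGHTS = {
    "age_at_first_arrest": -0.05,  # later first arrest -> lower risk
    "prior_arrests": 0.30,         # more arrests -> higher risk
    "employed": -0.60,             # employment -> lower risk
    "education_years": -0.10,      # more schooling -> lower risk
}
BIAS = 0.5

def risk_probability(features):
    """Logistic combination of static and dynamic factors."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def risk_level(p):
    """Bucket the probability into low/medium/high, as RNA tests do."""
    if p < 0.33:
        return "low"
    if p < 0.66:
        return "medium"
    return "high"

offender = {"age_at_first_arrest": 17, "prior_arrests": 4,
            "employed": 0, "education_years": 10}
p = risk_probability(offender)
print(risk_level(p))  # prints "medium" (p is about 0.46)
```

Note that nothing in such a model predicts what an individual will do; it only reports how closely the individual's feature vector resembles past re-offenders under the chosen weights.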
Early versions of RNA tests have been around since the 1970s; at that time, the tools helped law enforcement agencies identify inmates with a low risk of recidivism so that they could be made eligible for early parole. Eventually, administrators became confident in the applications of RNAs and not only modernized them with the latest technology, i.e., big data and machine learning, but also extended the results to pre-trial decisions such as bail amounts and sentence length. Did the administrators ensure that the algorithms are unbiased? Did they ensure that there are no false positives? Are all humans, regardless of their race, age, or place of origin, treated equally? The historical data used for these algorithms is inherently biased: statistically, black men are the most prevalent target, and 1 in 3 black men in America have served time in prison. So did we do enough to make sure that this practice of institutionalized racism has improved with the new and evolved technology, or did we just get lured in by the appeal and simplicity of numbers, as one judge put it?
. . . I was instantly interested because I realized that I’d been on the bench, at that point, for 20 years sentencing people, and I had no clue what I was doing. I didn’t know what worked. I didn’t know what didn’t. I didn’t know whether what I was doing was working. So, I thought, yeah, there’s all this information out there. We need to learn it, we need to use it.
Let’s first look at the ethical viewpoint for this use case by considering the aspects put forward by the Royal Society (Oxford). The first, the ‘ethics of data’, focuses on the collection and analysis of datasets, privacy, transparency, and the consent of the people involved. In this case the data is collected from convicted offenders; it is only fair to society as a whole to keep a record of a person’s offenses, so collecting this data is ethical. The second aspect, the ‘ethics of algorithms’, focuses on the problems of algorithms built from biased human data; such algorithms have the potential to increase discrimination against a group. Let us first accept that we are a prejudiced society; using data from such a society means the data is biased as well. For the RNA test, an external analysis was performed by ProPublica. It concluded that the algorithm was more ‘likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost 2 times the rate as white defendants. White defendants were mislabelled as low risk more often than black defendants. Black defendants were still 77% more likely to be pegged as at higher risk of committing a future violent crime and 45% more likely to be predicted to commit a future crime of any kind.’ These results are significant and highly unethical: they disproportionately target and damage the lives of real, living people. The third aspect, the ‘ethics of practice’, focuses on the responsibilities of the companies and organizations that collect or consume the data. The application of RNAs has serious consequences; knowing that the algorithms produce many false positives and false negatives, it is important to understand the algorithm’s decision, i.e., why it gives the score it gives, and to critically evaluate a person’s sentence even after receiving a score from the algorithm.
However, even the data scientists who create these algorithms find it hard to explain their decisions. On top of that, the government outsources such algorithms to private companies that claim their algorithm is a ‘black box’ in order to protect their proprietary rights, making it even harder for defendants and lawyers to understand the decisions made by RNAs.
The assessment of offender risk was originally a matter of professional judgment: prison staff, based on their own experience and training, would typically determine at intake which offenders were more or less likely to be a safety or security risk. These assessments were then used to assign inmates to the appropriate institution or unit. Needless to say, this process was biased and highly subjective, and the RNA was a tool meant to remove that subjectivity; however, in my opinion, it poses two issues: lack of transparency and lack of equality.
Over time, the RNA was improved by incorporating important advances suggested by research, including adding more relevant factors and adding changeable (i.e., dynamic) factors. Because the algorithms are made by third parties, they are proprietary, or ‘black boxed’, meaning only the owners, and to a limited degree the purchaser, can see how the software makes decisions. Consider a scenario in which a defense attorney calls the developer of a neural-network-based risk-assessment tool to the witness stand to challenge the ‘high risk’ score that could affect her client’s sentence. On the stand, the engineer could tell the court how the neural network was designed, what inputs were entered, and what outputs were produced in a specific case; however, the engineer could not explain the software’s decision-making process. Given these facts, or the lack thereof, how does a judge weigh the validity of a risk-assessment tool if (s)he cannot understand its decision-making process? How could an appeals court know whether the tool decided that socioeconomic factors, a constitutionally dubious input, determined a defendant’s risk to society? This lack of transparency has real consequences. In Wisconsin v. Loomis, defendant Eric Loomis was found guilty for his role in a drive-by shooting. During intake, Loomis answered a series of questions that were then entered into Compas, a risk-assessment tool developed by a privately held company and used by the Wisconsin Department of Corrections. The trial judge gave Loomis a long sentence partially because of the ‘high risk’ score he received from this black-box tool. Loomis challenged his sentence on the grounds that he was not allowed to assess the algorithm. Last summer, the state supreme court ruled against Loomis, reasoning that knowledge of the algorithm’s output was a sufficient level of transparency.
What baffles me is that currently, there is no federal law that sets standards or requires the inspection of these tools. If we contemplate the words of Dan Brown in Digital Fortress – ‘Who will guard the guards? If we’re the guards of society, then who will watch us and make sure that we’re not dangerous?’ – If algorithms are guarding us, defining our lives, giving us sentences, who is grading the algorithms?
Another legal aspect that bewilders me is the example of Brisha Borden, an 18-year-old black woman from Fort Lauderdale, Florida, who received an 8 on her RNA test after stealing an $80 bike. Vernon Prater, a 41-year-old white man, stole goods worth a comparable amount from a Home Depot. Prater had several previous charges and had served time in the past, yet he received a risk score of 3. Ultimately, these scores were used to determine bail amounts for both defendants and disproportionately burdened Borden. Two years after the scores were administered, Prater was back in jail for another crime, whereas Borden, who had received the higher risk score, had not been charged with any new crimes. This inequality, I believe, violates the basic right to equality of the citizens of the USA.
The Equality Act is a bill passed by the United States House of Representatives on May 17, 2019, that would amend the Civil Rights Act to ‘prohibit discrimination based on the sex, sexual orientation, gender identity, or pregnancy, childbirth, or a related medical condition of an individual.’
Lastly, can we accept the results of RNA algorithms; how accurate are they? An external evaluation by ProPublica of Northpointe’s algorithm, COMPAS, found that it accurately predicts recidivism for only 61% of individuals. Further, as stated earlier, it is more likely to falsely flag black defendants than white defendants. These results are significant, both in their error rate and in the way they irrevocably and disproportionately target and damage the lives of real, living people. Can we accept this level of inaccuracy? Since people’s futures depend on it, I do not think so.
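The kind of group-level error analysis ProPublica performed can be sketched with a confusion matrix per group. The counts below are invented toy data, not the real Broward County records; the point is only to show that two groups can have the same overall accuracy while one suffers a much higher false positive rate, which is exactly the disparity at issue.

```python
# Each record: (predicted_high_risk, actually_reoffended), 1 or 0.
def error_rates(records):
    """Accuracy, false positive rate, false negative rate."""
    tp = sum(1 for pred, actual in records if pred and actual)
    fp = sum(1 for pred, actual in records if pred and not actual)
    tn = sum(1 for pred, actual in records if not pred and not actual)
    fn = sum(1 for pred, actual in records if not pred and actual)
    accuracy = (tp + tn) / len(records)
    fpr = fp / (fp + tn)  # non-reoffenders wrongly flagged high risk
    fnr = fn / (fn + tp)  # reoffenders wrongly scored low risk
    return accuracy, fpr, fnr

# Invented counts (100 people per group) chosen to make the point:
group_a = [(1, 1)] * 30 + [(1, 0)] * 20 + [(0, 0)] * 30 + [(0, 1)] * 20
group_b = [(1, 1)] * 25 + [(1, 0)] * 10 + [(0, 0)] * 35 + [(0, 1)] * 30

acc_a, fpr_a, fnr_a = error_rates(group_a)
acc_b, fpr_b, fnr_b = error_rates(group_b)
print(acc_a, acc_b)  # same overall accuracy: 0.6 and 0.6
print(fpr_a, fpr_b)  # but group A is falsely flagged twice as often
```

This is why a single accuracy number (such as the 61% figure) is not enough to judge a tool: the errors must also be broken down by group and by direction.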
I would like to conclude by saying that, although the intention behind implementing RNAs was good, there are serious shortcomings. From data collection onward, there is algorithmic bias; there is an imbalance in the amount of data representing each group involved in the analysis, which misrepresents or over-represents some groups; and there is a lack of interpretability, since most machine learning models are treated as black boxes and we do not know why the results are what they are. One should remember that no algorithm is 100% right, and when it misclassifies someone, it decides his or her future.