Robolawyer: A Case Against AI in Law
Introduction
Artificial intelligence has rapidly permeated many aspects of our lives, including the cars we drive, the search engines we use, and even the legal institutions under which we live. The temptation to make extensive use of the technology is understandable. After all, machine learning models can process vast amounts of information far more quickly than a human ever could, opening the door to uses such as easing the burden of clerical work and performing legal research, and predictive algorithms can even be used to estimate the outcomes of trials. In machine learning, models are commonly classified into two types according to their interpretability. The first are black box models, which tend to be more complex at the expense of our ability to ascertain how the model arrived at its output. The second are white box models, which tend to be simpler and therefore easier to interpret in their decision-making process [1]. Although black box models are more widely used because they can capture more complicated trends in data, their prevalence underscores a sinister problem in applying machine learning to law: bias. From the data sets on which machine learning models are trained to the people who use and interpret the models, bias is prevalent at every step of the process, influencing an algorithm's decisions. To make matters worse, the difficulty of interpreting black box models makes this bias hard to detect when it does pervade a model. Thus, it is important to greatly restrict the use of AI in its current state in the criminal justice system, especially in applications pertaining to people, such as sentencing decisions and predicting crime risk, because of its inherent susceptibility to bias.
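To make the black box/white box distinction concrete, the following sketch, which assumes scikit-learn and uses synthetic data, contrasts a small decision tree, whose every split can be printed and audited, with a random forest ensemble whose individual predictions come with no single readable rule path. It is an illustration of the interpretability gap only, not a model drawn from any of the systems discussed in this essay.

    # White box: the fitted tree's rules can be printed and inspected line by line.
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier, export_text
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic data standing in for, e.g., incident or defendant features.
    X, y = make_classification(n_samples=500, n_features=4, random_state=0)

    white_box = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(white_box))       # every decision rule is visible and auditable

    # Black box: often more accurate, but no single human-readable explanation
    # accompanies any individual prediction.
    black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    print(black_box.predict(X[:1]))     # an answer, without a legible rationale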
Bias and Machine Learning Models
In granting machine learning algorithms the authority to make consequential decisions in a legal setting, including sentencing recommendations and predictions of the outcomes of court cases, we inherently expect that AI will make fair, unbiased decisions based on the given evidence. However, the process of training a machine learning model can allow bias to be baked into the model. Take the example of PredPol, a predictive policing algorithm used by the Los Angeles Police Department. The LAPD intended the model to predict where property crimes might occur throughout the city [2]. However, an independent study demonstrated that the algorithm, when applied to drug-crime data from Oakland, CA, concentrated its predicted hotspots in largely non-White and low-income neighborhoods, even though an empirical, survey-based estimate found drug usage to be distributed roughly evenly across the city [3]; the LAPD has since discontinued the program. These results are not best understood as a model gone rogue but rather as a function of the data used to train it. Statistical studies suggest that police officers, whether implicitly or explicitly, consider race as a factor in determining which neighborhoods to patrol and which people to detain, targeting Black and Hispanic people at a higher rate than White people [4]. Thus, the police data used to train PredPol likely overrepresented Black and Hispanic people as criminals, causing the algorithm to target predominantly Black and Hispanic areas and amplifying existing racial disparities in arrest data. The PredPol example illustrates the pitfalls of machine learning models. First, human-generated data, including police data, is inherently prone to bias introduced by the collection process. Humans are in charge of procuring the data used to train models, and existing biases can manifest themselves in several ways: the data may capture underlying trends that reflect societal inequities, or it may lack the diversity needed to make accurate predictions for every group it represents. At the other end of the process, humans are also in charge of determining how the model's results are applied. In the PredPol example, this resulted in heavier policing of predominantly Black and Hispanic communities without regard to the systemic injustices that might have influenced the model's decisions; in general, careless use of such algorithms only worsens the problem of bias.
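The feedback loop described above can be made concrete with a toy simulation. The sketch below is my own illustration under stated assumptions, not the PredPol method: two areas share the same true crime rate, but one begins with more recorded incidents because it was historically patrolled more heavily, and retraining on recorded incidents keeps routing patrols back to it.

    # Toy feedback-loop simulation (illustrative assumptions, not a real system).
    import random

    random.seed(0)
    TRUE_RATE = 0.10                  # identical underlying crime rate in both areas
    recorded = {"A": 30, "B": 10}     # biased historical records: "A" was patrolled more

    for day in range(1000):
        total = recorded["A"] + recorded["B"]
        # "Retrain" daily: send patrols in proportion to recorded incidents so far.
        patrol_share = {area: recorded[area] / total for area in recorded}
        for area in ("A", "B"):
            # A crime only enters the data set if a patrol is present to record it.
            if random.random() < TRUE_RATE * patrol_share[area]:
                recorded[area] += 1

    print(recorded)   # the initial 3:1 skew persists despite equal true crime rates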
The authority granted to AI's judgments exacerbates the issue of bias when it is used in the criminal justice system. For instance, COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), an algorithm meant to predict the risk of recidivism in convicted criminals, was used by several states in making parole decisions [5]. However, ProPublica demonstrated the algorithm to be deeply biased: it was 77% more likely to label Black defendants as being at high risk of committing a violent crime and 45% more likely to predict recidivism in Black defendants [6]. In actuality, the model was found to overpredict recidivism in Black defendants and underpredict it in White defendants: Black defendants who did not reoffend were disproportionately labeled high risk, while White defendants who did reoffend were disproportionately labeled low risk [7]. In this case, the issue stems not just from the bias itself but also from the credibility given to COMPAS. In a court of law, the COMPAS output weighed heavily on the future of the defendant, despite ultimately being a deeply flawed metric. This demonstrates another significant hazard of using AI in a legal setting: delegating substantial authority to it is deeply problematic because AI can be biased without our express knowledge. Unfairly handing down longer sentences along arbitrary lines, whether of race, sex, or income, undermines the principle of fairness upon which the legal system is supposed to be built. Thus, AI should be treated as a tool rather than as an independently acting component of the system.
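The disparity at issue here is a difference in error rates across groups, which can be surfaced with a simple audit. The sketch below uses invented toy numbers, not ProPublica's data, to show the kind of computation involved: comparing the false positive rate, that is, the share of people who did not reoffend but were still labeled high risk, between two groups.

    # Hypothetical group-wise audit of a risk model's errors (toy data only).
    def false_positive_rate(labels, preds):
        """Share of actual non-reoffenders (label 0) who were predicted high risk (pred 1)."""
        false_pos = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
        negatives = sum(1 for y in labels if y == 0)
        return false_pos / negatives if negatives else 0.0

    # 1 = reoffended / labeled high risk, 0 = did not reoffend / labeled low risk.
    group_a = {"labels": [0, 0, 0, 1, 1, 0], "preds": [1, 1, 0, 1, 0, 1]}
    group_b = {"labels": [0, 0, 0, 1, 1, 0], "preds": [0, 0, 0, 1, 1, 1]}

    for name, group in (("group A", group_a), ("group B", group_b)):
        print(name, false_positive_rate(group["labels"], group["preds"]))
    # Same underlying reoffense rates, yet group A's false positive rate (0.75) is
    # three times group B's (0.25): the shape of the disparity ProPublica reported.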
A Useful Tool, Not a Panacea
While AI can amplify existing biases in the legal system, there are still quite a few areas, such as contract review and legal research, in which it could be a beneficial addition. For instance, contract review can be undertaken by AI much more rapidly than by a human, with a high degree of accuracy [8]; this efficiency allows a lawyer to take on a greater number of cases. Another promising area is legal research: machine learning algorithms can rapidly scan legal archives to find similar cases, saving legal professionals considerable time [9]. The cases such a system finds can be used to establish precedent, providing a compelling argument for achieving a similar ruling in the case at hand. These developments enhance the legal process by reducing the time lawyers need to spend on each client, which in turn can reduce any backlog of defendants in the justice system. Furthermore, the potential for algorithmic bias is mitigated in both applications: the machine learning model serves only to fix existing technical mistakes, such as grammatical errors and incorrect terminology, or to surface existing cases for a lawyer to evaluate, making no original judgments of its own about other people. An important distinction can therefore be drawn between these applications and those of PredPol and COMPAS: the latter were used to make judgments about individual people, allowing bias to harm many defendants, whereas the former are largely clerical and only seek to improve existing work produced by skilled professionals, leaving far less room for bias to do harm.
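As a rough illustration of the research use case, the sketch below retrieves the most similar documents from a tiny invented "archive" using TF-IDF vectors and cosine similarity, assuming scikit-learn is available. Production legal research tools are far more sophisticated; this is only a sketch of the underlying idea of similarity search over case texts.

    # Toy similar-case retrieval: rank an invented archive against a query.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    archive = [
        "Landlord withheld security deposit without itemized damages.",
        "Employer terminated employee in retaliation for filing a safety complaint.",
        "Tenant sued for return of deposit after landlord claimed carpet damage.",
    ]
    query = ["Dispute over a landlord refusing to return a tenant's deposit."]

    vectorizer = TfidfVectorizer(stop_words="english")
    archive_vecs = vectorizer.fit_transform(archive)
    query_vec = vectorizer.transform(query)

    scores = cosine_similarity(query_vec, archive_vecs)[0]
    best = max(range(len(archive)), key=lambda i: scores[i])
    print(scores, archive[best])   # the deposit cases rank highest for this query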
Beyond Bias: Unexplainable and Unaccountable Models
Both the PredPol [10] and COMPAS [11] algorithms described above were black box models; that is, their decision-making process was not interpretable, making their bias difficult to ascertain. In both cases, even exposing the bias took detailed data analysis conducted only after the algorithms had made a large number of skewed judgments. A shift toward easily interpretable white box models would make such bias far easier to detect and correct, expanding the range of acceptable use cases for AI in law, but in the current state of affairs, the more popular black box models are too problematic to justify adopting on a mass scale in a legal setting.
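To illustrate what a white box model makes possible, the sketch below fits a logistic regression on synthetic data and simply reads off the fitted coefficients, so a reviewer can see at a glance how heavily the model leans on any given input, including one that might proxy for a protected attribute. The feature names and data are assumptions made for illustration, not features from any real risk tool.

    # White box audit: the model's reliance on each input is directly readable.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    features = ["prior_arrests", "age", "neighborhood_index"]   # hypothetical inputs
    X = rng.normal(size=(300, 3))
    # Synthetic outcome that secretly leans on the third feature, a potential proxy.
    y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=300) > 0).astype(int)

    model = LogisticRegression().fit(X, y)
    for name, coef in zip(features, model.coef_[0]):
        print(f"{name}: {coef:+.2f}")   # a heavy weight on the proxy is visible at a glance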
The issue of accountability is also pertinent to the adoption of AI in a legal setting. A judge who makes partial rulings is subject to professional and legal repercussions, which incentivizes them to act fairly [12]. However, an algorithm that makes biased judgments is difficult to hold properly accountable: there is no comprehensive law that dictates whom to hold liable for AI's mistakes. In the PredPol and COMPAS cases, the worst consequence was merely that some agencies stopped using the algorithms. Thus, given this ambiguity, adopting machine learning models heavily in legal matters makes it hard for those wronged by bias to obtain justice, as ascertaining liability in the first place is a glaring legal gray area.
Conclusion
In its current state, AI is not well suited to applications involving people in the criminal justice system, such as informing sentencing decisions or predicting crime risk, because of its inherent susceptibility to bias. That said, AI's future in this setting is promising. For instance, the aforementioned white box models have the potential to mitigate the worst of the bias produced by machine learning models, as being able to follow the steps behind an algorithm's decisions can reveal whether the model is impartial toward various groups. As AI grows more sophisticated, it will doubtless find more niches in the legal field, but we must be careful not to implement it too hastily, for the dangers of bias and misjudgment can have drastic effects.
Bibliography
Rudin, Cynthia, and Joanna Radin. “Why Are We Using Black Box Models in AI When We Don’t Need To? A Lesson From an Explainable AI Competition.” Harvard Data Science Review 1, no. 2 (November 1, 2019). https://doi.org/10.1162/99608f92.5a8a3a3d.
Los Angeles Times, April 21, 2020. https://www.latimes.com/california/story/2020-04-21/lapd-ends-predictive-policing-program.
Lum, Kristian, and William Isaac. “To Predict and Serve?” Significance 13, no. 5 (2016): 14–19. https://doi.org/10.1111/j.1740-9713.2016.00960.x.
Gelman, Andrew, Jeffrey Fagan, and Alex Kiss. “An Analysis of the New York City Police Department's ‘Stop-and-Frisk’ Policy in the Context of Claims of Racial Bias.” Journal of the American Statistical Association 102, no. 479 (2007): 813–23. https://doi.org/10.1198/016214506000001040.
Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. “Machine Bias.” ProPublica. Accessed March 6, 2023. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
Ibid.
Ibid.
Harvard Journal of Law & Technology. “A Primer on Using Artificial Intelligence in the Legal Profession,” January 3, 2018. http://jolt.law.harvard.edu/digest/a-primer-on-using-artificial-intelligence-in-the-legal-profession.
Ibid.
Lum, Kristian, and William Isaac. “To Predict and Serve?” Significance 13, no. 5 (2016): 14–19. https://doi.org/10.1111/j.1740-9713.2016.00960.x.
Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. “Machine Bias.” ProPublica. Accessed March 6, 2023. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
“Code of Conduct for United States Judges | United States Courts.” Accessed March 14, 2023. https://www.uscourts.gov/judges-judgeships/code-conduct-united-states-judges.