Algorithmic Bias: A Risk Management Perspective

9/16/2021

Stephen Lee, Amarita Natt, and Alex Rinaudo

Risk of Bias

Decisions are increasingly being augmented by machine learning systems (e.g., job applicant screening, price setting, fraud detection); as these decisions become more automated, the potential for systematic bias is an ever-increasing risk that must be managed. While there are legitimate reasons to champion machine learning and “big data” as offering a way to reduce human biases, unintended biases can still appear and lead to embarrassing or costly mistakes.

Two recent examples stand out:

  1. In February 2021, the New York Civil Liberties Union and Bronx Defenders filed a federal class-action lawsuit against the New York City ICE field office, claiming unfair bias in its computer program (the Risk Classification Assessment Tool), which determines whether people awaiting their immigration hearings can be released on bond or must be detained until the hearing. The lawsuit alleges that, from 2017 to 2020, the algorithm recommended detention without release for nearly every person arrested, at rates far higher than historical norms even for "low risk" individuals.
  2. In 2019, Apple and Goldman Sachs launched a credit card, called Apple Card, which used a machine learning algorithm to automate the process of setting credit limits. Shortly after the launch, stories surfaced of the algorithm awarding higher credit limits to the husbands in married couples than the limits set for their wives. In March 2021, an investigation by the New York State Department of Financial Services found no evidence of disparate impact,[1] but it is not hard to imagine future cases being decided the other way.

While discussions of algorithmic bias often focus on the use of biased historical data,[2] in this article we additionally consider how modeling decisions can affect bias. We highlight two distinct issues practitioners should keep in mind when selecting a model, and end by offering questions business leaders can use to manage risk effectively.

Algorithmic Bias

An algorithm is a sequence of steps that, when followed, solves (or attempts to solve) a specific problem. In practice, an algorithm receives some input, performs a sequence of steps, and returns some output. For example, we can think of a traditional regression analysis as an algorithm that seeks to uncover relationships between an outcome and variables that may impact that outcome. Similarly, machine learning algorithms are an increasingly popular way to put large quantities of data to a similar purpose.

In this context, "algorithmic bias" is the tendency for these outputs to be inaccurate in a consistent way. For example, if a job applicant screening algorithm receives otherwise identical resumes but consistently recommends candidates of one gender while rejecting those of another, we could characterize it as having a gender bias. So, how does this bias occur?

We distinguish between two sources of bias that can impact an algorithm’s output: 1) data bias and 2) model bias.  

Data Bias

It’s tough to make predictions, especially about the future.

– Yogi Berra

When discussing algorithmic bias, the first concern is often the integrity of the historical data. For example, if one were to use historical crime data to predict the likelihood that a parole candidate might commit repeat offenses, the data itself may contain historical human biases that will ultimately get codified into the algorithm. This is because a prediction algorithm only “knows” about the data you give it, and it is explicitly designed to mimic those results.

With that in mind, the data we use to “fit” or “train” an algorithm is of critical importance. Statisticians and economists have spent decades studying the various ways that biased data can undermine an analysis. With respect to data, two subcategories of bias worth highlighting are:

  • Sample bias: is the data a fair representation of the population we care about, or are some groups and events overrepresented while others are underrepresented?
  • Omitted variable bias: do we observe and collect data on all the relevant variables that could impact the outcome we care about?
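As a concrete illustration of the first check, here is a minimal sketch (all group names, population shares, and counts are invented for illustration) that compares each group's share of a sample against its share of the target population and flags underrepresentation:

```python
# Hypothetical check for sample bias: compare each group's share of the
# training sample against its share of the population we care about.
population_share = {"group_a": 0.50, "group_b": 0.30, "group_c": 0.20}
sample_counts = {"group_a": 700, "group_b": 250, "group_c": 50}

total = sum(sample_counts.values())
for group, pop_share in population_share.items():
    samp_share = sample_counts[group] / total
    ratio = samp_share / pop_share
    # Flag groups whose sample share falls well below their population share.
    flag = "  <-- underrepresented" if ratio < 0.8 else ""
    print(f"{group}: sample {samp_share:.0%} vs population {pop_share:.0%}{flag}")
```

Here group_c makes up 20% of the population but only 5% of the sample, so a model trained on this data would see very few examples of that group.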

Returning to the parole example, sample bias would occur if data were systematically collected in a way that did not accurately represent true crime patterns. Thus, if the law was not enforced equally across races and genders in the historical data, any model using that historical data would perpetuate that bias.

Suppose, however, that our historical data contains no sample bias; it is perfectly representative of the population we care about. If there are relevant[3] factors that are not collected in the data, or there are factors that cannot be observed, then even our best possible model will produce biased results.

For example, imagine we wanted to build an algorithm that helped determine appropriate wages based on historical employee data. If we include data on the highest degree of schooling each employee received, but do not collect other relevant variables like hours worked or productivity, we might expect the predictions made with this tool to be unfairly biased against employees who work very hard and do great work, but have less formal schooling. Intuitively, this is because the model will try its best to explain the outcome with whatever data it has available, and absent relevant information, it is likely to place too much emphasis on the variables that are present. You can think of this as analogous to walking with an injured leg: when one leg cannot carry its share of the load, the healthy leg must bear the additional weight.
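This mechanism can be demonstrated with a small simulation (a hypothetical sketch; all coefficients and distributions below are invented). Wages truly depend on both schooling and hours worked, but the regression only sees schooling, so the schooling coefficient absorbs part of the effect of the omitted variable:

```python
import random

random.seed(0)

# Hypothetical data-generating process: wage = 2*schooling + 1*hours + noise,
# where hours worked is correlated with schooling -- the condition under
# which omitting it biases the schooling coefficient.
n = 10_000
schooling = [random.gauss(14, 2) for _ in range(n)]
hours = [0.5 * s + random.gauss(30, 3) for s in schooling]
wage = [2 * s + h + random.gauss(0, 1) for s, h in zip(schooling, hours)]

def ols_slope(x, y):
    """One-variable least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

slope = ols_slope(schooling, wage)
print(f"true schooling effect: 2.0; estimated with hours omitted: {slope:.2f}")
```

Because the omitted variable is correlated with both the outcome and the included variable, the one-variable regression recovers roughly 2.5 rather than the true effect of 2.0: the model leans too hard on the only information it has.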

Model Bias

All models are wrong, but some are useful.

– George Box

Once we are confident that we have reasonably unbiased data, we still need to make sound modeling choices. This step is crucial, as it can have the largest impact on mitigating the risk of bias when gathering better data is not feasible. In this context, model bias can be summarized as follows: given unbiased data, can we expect the sequence of steps (i.e., the model) to produce accurate predictions, or, if the predictions are inaccurate, do they tend to be inaccurate in a consistent way? Unfortunately, that is not the only challenge: we often also care about how the model will perform when making predictions on new data in the future. In this case, our model needs to balance the ability to make future (out-of-sample) predictions against its ability to accurately explain historical (in-sample) data.

For example, suppose we build two different models to predict the result of tossing a coin. We first collect data on 100 coin flips, observe that exactly 50 are heads, and then use this data to build our algorithms (models). The first model always predicts heads. The second model randomizes, picking heads 50% of the time and tails 50% of the time. Here, the first model (which always predicts heads) is clearly biased toward heads, but will work well in practice if, for example, we are an NFL football coach trying to decide how to win the coin toss at the beginning of each game.[4] In contrast, the second model is unbiased with respect to the data we collected, but over many games it can never outperform the first model, and it will win less often if the coin turns out to favor heads even slightly. It is therefore important to consider your goals when deciding which of these tradeoffs better serves the conclusions and inferences you want to draw.
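The tradeoff between the two coin-toss strategies can be verified with a few lines of arithmetic (a minimal sketch, not from the article):

```python
def win_rate_always_heads(p_heads):
    # The constant guess wins exactly when the coin lands heads.
    return p_heads

def win_rate_randomized(p_heads):
    # A 50/50 guess made independently of the coin wins when the guess
    # and the outcome happen to match.
    return 0.5 * p_heads + 0.5 * (1 - p_heads)

for p in (0.50, 0.52, 0.60):
    print(f"P(heads)={p:.2f}: always-heads wins {win_rate_always_heads(p):.2f}, "
          f"randomized wins {win_rate_randomized(p):.2f}")
```

Because the randomized guess is made independently of the coin, it wins exactly half the time no matter how the coin behaves, while the constant guess wins at the coin's true heads rate. So the "biased" model weakly dominates whenever the coin departs from fair.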

This concept is summarized by the so-called "bias-variance tradeoff". This tradeoff is characterized as a tension between selecting an overly complicated model and an overly simplistic model. The complicated model will likely match the historical data better (high variance), but may not offer useful insight into the future. The simpler model, by contrast, may not match the historical data as well, but may offer useful "rules of thumb" for the future (high bias). You might also hear this tradeoff referred to as overfitting (high variance) vs. underfitting (high bias).
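A toy experiment (with invented data) makes the tradeoff concrete: an "underfit" model predicts the training mean everywhere, an "overfit" model memorizes the training points exactly, and a simple linear fit matches the true complexity of the data:

```python
import random

random.seed(1)

# Invented toy data: the true relationship is y = 3*x plus noise.
def make_data(n):
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 3 * x + random.gauss(0, 2)) for x in xs]

train, test = make_data(50), make_data(50)
mean_y = sum(y for _, y in train) / len(train)
lookup = dict(train)  # memorized training points

def fit_line(data):
    """One-variable least-squares fit; returns a prediction function."""
    xs, ys = zip(*data)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

underfit = lambda x: mean_y                  # high bias: ignores x entirely
linear = fit_line(train)                     # matched to the true complexity
overfit = lambda x: lookup.get(x, mean_y)    # high variance: memorizes history

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

for name, model in [("underfit", underfit), ("linear", linear), ("overfit", overfit)]:
    print(f"{name:8s}  train MSE = {mse(model, train):6.2f}  test MSE = {mse(model, test):6.2f}")
```

The memorizing model achieves zero error on the historical data but offers no improvement on new data, while the linear model gives up a perfect in-sample fit in exchange for useful out-of-sample predictions.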

One job of the analyst is to balance these tradeoffs to produce a model that fits the historical data well enough, while still offering useful insights to the future.

Practical Considerations

In practice, it is inevitable that you will eventually encounter biased data. In these cases, the modeling decision becomes critical since it is oftentimes the primary way to manage the risk of bias in the analysis.

#1 – The "least bad" bias

Depending on the situation, bias doesn't have to be a deal breaker: a bias toward triple-checking an important email before hitting send may cost you a few extra minutes of your time, but may save you from an error that will consume significant amounts of your time in the future. Similarly, on its own, the presence of bias in your data or your model is not a reason to give up, but it does make it critical to understand the nature of the bias. The key distinction here is to watch for asymmetric risks.

For example, when considering a job applicant screening algorithm, some concern for data bias is warranted. Here, the potential risk would be using an algorithm that systematically rejects applicants from legally protected demographics (e.g., gender or race). In this case we could still use an algorithm to help sort candidates, but it might make sense to allow for a slight bias toward acceptance rather than rejection. The tradeoff is that we may require more human review of job applications, but the advantage is that we would be less likely to algorithmically (and potentially illegally) reject qualified applicants from historically underrepresented groups. 
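One way to encode such an asymmetric preference is with skewed decision thresholds. The sketch below is a hypothetical routing rule (the score scale and the threshold values are invented for illustration): only clear mismatches are screened out automatically, and borderline candidates are sent to human review rather than quietly rejected:

```python
# Hypothetical routing rule for a resume-screening score in [0, 1].
# The cutoffs are deliberately skewed toward acceptance because the
# cost of wrongly rejecting a qualified applicant is judged higher
# than the cost of an extra human review.
ACCEPT_ABOVE = 0.70   # confident enough to advance automatically
REJECT_BELOW = 0.20   # only clear mismatches are screened out

def route(score):
    if score >= ACCEPT_ABOVE:
        return "advance"
    if score < REJECT_BELOW:
        return "reject"
    return "human review"   # borderline cases get a person, not the model

for s in (0.85, 0.45, 0.10):
    print(f"score {s:.2f} -> {route(s)}")
```

Moving REJECT_BELOW down widens the human-review band: the model does less of the rejecting, at the cost of more manual work.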

#2 – Transparency vs. predictive power

Another common tradeoff is between model transparency and predictive power – in other words, do you value knowing why the answer is reached, or only that the answer is correct as often as possible? While the appropriate balance may not be obvious, the implications for corporate risk management can be significant. Consider two questions that might be approached with data analysis:

  • Should we pivot our company to offer a new product to a new consumer base?
  • What animals are present in this image?  

Each of these questions is routinely informed by historical data, but each requires a different mix of transparency vs. predictive power. If you suggest to your boss that the company suddenly needs to offer a new product or service, the motivation will likely need to be explained. Alternatively, if your company specializes in organizing images on the internet and making them searchable, occasionally mislabeling a "cat" as a "dog" is less likely to require a detailed diagnosis and explanation – instead, you will probably just retrain the model with additional data.

Pertaining to bias, the important distinction is whether you would need to explain the results if something went wrong. Traditional tools from statistics and econometrics offer a high degree of transparency, but in many cases cannot offer the same level of predictive power as, for example, a deep neural network from the machine learning toolkit. Which modeling technique to pick depends on how you choose to handle the risk of biased results.

Discussion

For litigators working on cases that involve the use of computer algorithms, we offer several questions to help guide your discovery:

  1. How good is the data, and how hard is it to get better data? The answer to this question will help address any "data bias" concerns. In short, if the data is representative of the relevant groups or events, and contains all relevant variables, you are in a good position to focus your efforts on the modeling techniques. If, in contrast, there are concerns about the data (either that it is not fully representative, or that key variables are missing), is it feasible to obtain better data? The key insight here is perhaps obvious, but worth reiterating: the better your data, the more modeling options exist.
  2. How costly is an error, and do errors need to be explained? The answer to this question will help address any "model bias" concerns, and whether it makes sense to prioritize prediction or transparency. In general, the more costly the impact of a bad decision (legally or otherwise), the greater the need to explain it, and the more transparency you will want in your modeling technique. At one end of this spectrum, deciding whether an image shows a dog or a cat carries a relatively low cost of error; prioritizing prediction over transparency is likely preferred, since an unforeseen bias is unlikely to cause long-term ramifications. At the other extreme, deciding which job applicants to interview carries a high cost of systematic bias, and an explanation is far more likely to be demanded.
  3. How will I know if the results are biased, and how long will it take? Finally, it is important to consider how results can be evaluated, and how long that feedback will take. Here, industry insights can help identify potential asymmetric risks in outcomes and mitigate the chance that a false positive or a false negative from the model leads to lasting damage. In this sense, a robust testing strategy is critical to iteratively determine the best modeling choice for your situation.
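One simple building block for such a testing strategy is a group-wise error audit on held-out predictions. The sketch below (with invented records) computes false-positive and false-negative rates per group, so asymmetric errors surface before they cause lasting damage:

```python
from collections import defaultdict

# Minimal audit sketch with invented records: each tuple is
# (group, actual outcome, model prediction) for one held-out case.
records = [
    ("A", True, True), ("A", True, False), ("A", False, False), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", False, True), ("B", False, False),
]

counts = defaultdict(lambda: {"fn": 0, "pos": 0, "fp": 0, "neg": 0})
for group, actual, predicted in records:
    c = counts[group]
    if actual:
        c["pos"] += 1
        c["fn"] += not predicted   # positive case the model missed
    else:
        c["neg"] += 1
        c["fp"] += predicted       # negative case the model flagged

for group in sorted(counts):
    c = counts[group]
    print(f"group {group}: false-negative rate {c['fn'] / c['pos']:.2f}, "
          f"false-positive rate {c['fp'] / c['neg']:.2f}")
```

A large gap in either rate between groups is exactly the kind of consistent inaccuracy this article calls algorithmic bias, and it is a natural first thing to monitor on an ongoing basis.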

Armed with these questions, we hope that you can better assess the risk associated with various applications for automated decision-making.


[1] https://www.bloomberg.com/news/articles/2021-03-23/goldman-didn-t-discriminate-with-apple-card-n-y-regulator-says

[2] https://www.law360.com/articles/1292974/print?section=compliance, https://www.law360.com/articles/1274143/print?section=aerospace, https://www.law360.com/articles/1180373/print?section=access-to-justice.

[3] The word “relevant” is doing a lot of work here. For an omitted variable to bias the results, it needs to be 1) correlated with the outcome, and 2) correlated with the other explanatory variables. If either of these conditions fail, then the omission of that variable will not bias the results.

[4] In this case, if we always make the same guess (say, heads), we can expect to win a fraction of the games equal to the coin's true probability of landing heads – about 50% for a fair coin, and more than 50% if the coin favors heads at all. If instead we randomize our guess independently of the coin, we should expect to win exactly 50% of the games regardless of any bias in the coin, so the randomized strategy can never beat the constant guess in the long run.
