Building responsible AI systems starts with recognizing that technology solutions implicitly prioritize efficiency.
Interest in the possibilities afforded by algorithms and big data continues to blossom as early adopters gain benefits from AI systems that automate decisions as varied as making customer recommendations, screening job applicants, detecting fraud, and optimizing logistical routes.1 But when AI applications fail, they can do so quite spectacularly.2
Consider the recent example of Australia’s “robodebt” scandal.3 In 2015, the Australian government established its Income Compliance Program, with the goal of clawing back unemployment and disability benefits that had been paid to recipients inappropriately. It set out to identify overpayments by analyzing discrepancies between the annual income that individuals reported and the income assessed by the Australian Tax Office. Previously, the responsible government department had used a data-matching technique to identify discrepancies, which government employees subsequently investigated to determine whether the individuals had in fact received benefits to which they were not entitled. Aiming to scale this process to increase reimbursements and cut costs, the government developed a new, automated system that presumed that every discrepancy reflected an overpayment. A notification letter demanding repayment was issued in every case, and the burden of proof was on any individuals who wished to appeal. If someone did not respond to the letter, their case was automatically forwarded to an external debt collector. By 2019, the program was estimated to have identified over 734,000 overpayments worth a total of 2 billion Australian dollars ($1.3 billion U.S.).4
The new system was designed to optimize efficiency, but without being attentive to the particulars of individual cases. The idea was that by eliminating human judgment, which is shaped by biases and personal values, the automated program would make better, fairer, and more rational decisions at much lower cost. Unfortunately, choices made by the system’s designers, in both the algorithm itself and the process built around it, resulted in the government demanding repayments from hundreds of thousands of people who had been entitled to the benefits they had received. Some were compelled to prove that they had not illegitimately claimed benefits as long ago as seven years earlier. The consequences for many individuals were dire.
Subsequent parliamentary reviews pointed to “a fundamental lack of procedural fairness” and called the program “incredibly disempowering to those people who had been affected, causing significant emotional trauma, stress, and shame.”5 The parliamentary committee received evidence of at least two suicides related to the program, and there were numerous reports of financial hardship.6 In 2020 the country’s minister for government services scrapped the program and announced that 470,000 wrongly issued debts — totaling AU$721 million — would be refunded in full.
In this article, we will explain the unconsidered, implicit, and systemic biases that can cause such catastrophic failures to occur, and what managers can do to mitigate the risk that algorithmic systems will cause organizational or social harm. It is crucial for managers to understand that the solution to these technological failures is not better, or more, technology but rather a better understanding of what implicit choices we are making — particularly what values a solution will prioritize — when we use technology to solve a problem in the first place. Informed by French philosopher Jacques Ellul’s 1954 book The Technological Society, our own research seeks to unpack the mechanism at play, as detailed below.7
The Tyranny of Technique
In his book, Ellul argues that the central feature of modern organizations is the use of technique, defined in this case as the rational pursuit of any standardized means or practices for attaining desired results. Technique is thus embedded in technology, and it’s at play when, for example, organizations think of ways to maximize the yield of a manufacturing plant or deploy recommender algorithms to upsell services and products — or when governments seek to manage social benefits through algorithmic programs. When we apply technique, we tacitly agree that everything must be measured, quantified, standardized, and rationalized so that it is ready for computation. Technique, then, is also a way of imagining how the world should ideally look.
Building on Ellul’s thinking, we suggest that this induces a process we describe as the mechanization of values. Technique reinforces rationality as the preferred mode of conduct and prioritizes efficiency as the preferred outcome. This, in turn, strengthens the influence of technique in our lives; many of our actions focus on rational problem-solving for the purpose of achieving greater efficiency at work. Friendship, social cohesion, justice, compassion, and happiness (to name but a few profoundly human phenomena) do not fit this model: Their qualitative essence cannot be fully captured by technique.8
Instead, technique ruthlessly forces everything into a quantifiable straitjacket. Consider the example of friendship: On social media, which applies technique to managing human relationships, a friend is someone you connect with. The intensity of friendship becomes measured in quantifiable actions such as “likes” or “follows.” However, this approach strips a complex human phenomenon of its richness. We lose the nuance of different values and dynamics, such as different types of friends (close confidants, gym buddies, cordial colleagues, potential romantic partners) and different phases of friendship. The “value” of friendship is now flattened to what can be quantified or automated, such as the number of likes on a photo, or birthday wishes prompted by an automatic nudge. We lose the shared meaning of the photos, the sincerity of birthday wishes, and many of the richest aspects of how friendship can feel.
In other words, when we apply technique to a process, especially in the form of automated decision-making systems, we implicitly limit consideration of the full spectrum of human values in that process. That has important practical ramifications, because values concern the behaviors and goals that are personally and socially preferable, and they are typically multiple, complex, and sometimes contradictory.9
Organizations often operate at the interface of multiple values. The Australian compliance program, for instance, was built on the premise of technique: It necessarily suppressed other values, such as compassion for citizens, fairness, and treating people with respect, which in the past could be accommodated when humans evaluated individual cases and made decisions about them. When technique sinks its teeth into processes previously permeated with multiple values, those values end up subordinated to the ultimate values of rationality and efficiency. The goals of the rational robodebt model, to save administrative costs and boost reimbursements to the government, likewise drove the choice to take human workers out of the process in pursuit of greater efficiency.
In sum, whenever we apply technique to solve a problem, we run the risk of unintentionally creating a devastating side effect: Rationality as the preferred conduct and efficiency as the preferred end crowd out all other values, becoming ultimate values in and of themselves. This is obviously problematic, and it can become dangerously so when humans are no longer able to override automated systems and the consequences of their decisions.
How can managers and organizations that need or want to leverage these technologies protect multiple values in their decisions and routines to avoid the tyranny of technique? We propose that adhering to three principles can help:
- Beware of proxies and scaling effects.
- Strategically insert human interventions into your algorithmic decision-making.
- Create evaluative systems that account for multiple values.
We’ll look more closely at each of these principles below.
Recognize the problem with proxies. When building algorithmic models, it is often necessary to represent a phenomenon with a proxy. For example, in the robodebt case, the program used “average income” as a proxy for real income. Average income was a convenient choice: It was easy to calculate, and on the surface it seemed a reasonable proxy. However, ex post reviews found that it had several weaknesses. For example, the average income was calculated on an annual basis, but individuals’ actual earnings are received over pay periods. The proxy of average income, when spread over the period of a year, did not reflect so-called lumpy employment — where people cycle inconsistently in and out of work and receive benefits during periods of unemployment. Using the average income metric made it appear that income was earned when individuals were actually out of work. Additionally, income averaging did not actually produce a legal proof of debt. Technique and its quest for rational measures to represent reality seduced the responsible civil servants into proceeding with a faulty measure and an assumption of debt, without legal proof.
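The averaging flaw described above can be shown with a few lines of arithmetic. The sketch below uses entirely invented numbers (the earnings figure, the income cutoff, and the fortnightly assessment structure are illustrative assumptions, not details of the actual program) to show how spreading annual income evenly across the year makes a person with lumpy employment look as if they earned income in every period.

```python
# Hypothetical illustration of the income-averaging proxy.
# All figures are invented for illustration; they are not the
# actual thresholds or rules used by the robodebt system.

FORTNIGHTS = 26             # assume benefits are assessed per fortnight
EARNINGS_LIMIT = 500        # assumed per-fortnight earnings cutoff (AUD)

# A worker who earned 1,500 AUD in each of 10 fortnights and nothing
# (while legitimately on benefits) in the other 16.
actual_earnings = [1500] * 10 + [0] * 16

annual_income = sum(actual_earnings)        # 15,000 AUD
averaged = annual_income / FORTNIGHTS       # ~577 AUD per fortnight

# The proxy spreads earnings over the whole year, so every fortnight
# appears to exceed the limit, including the 16 fortnights in which
# the person was genuinely unemployed and entitled to benefits.
flagged_by_proxy = sum(1 for _ in actual_earnings if averaged > EARNINGS_LIMIT)
truly_over_limit = sum(1 for e in actual_earnings if e > EARNINGS_LIMIT)

print(flagged_by_proxy)   # 26 fortnights flagged as overpaid
print(truly_over_limit)   # only 10 actually exceeded the limit
```

The gap between the two counts is exactly the divergence between what the proxy claims to measure (per-period income) and what it actually measures (an annual average).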
With this example, we can see how proxies can easily reduce a phenomenon to one measurable dimension that is assumed to be constant when it is in fact variable. Using proxies can also involve treating conceptually distinct phenomena as equivalent predictors (such as a poor credit rating as a predictor of poor job performance). The distance or incompatibility between the conceptual origin of a phenomenon (like job performance) and its actual measurement (by way of credit ratings) implies divergence between what we think we measure and what is measured.
Proxies can easily reduce a phenomenon to one measurable dimension that is assumed to be constant when it is in fact variable.
Second, we need to be aware of the dangers of scaling effects. While scaling the use of a particular algorithm is efficient, an algorithm is next to impossible to rein back in once it becomes widely used within a sector. Consider the example of a credit scoring system that becomes a de facto standard that every company uses: If the algorithm effectively discriminates against people with certain profiles, there’s no escaping its reach by choosing to do business with a company that might have a fairer or more flexible scoring system. Likewise, the university rankings popularized and promoted by U.S. News & World Report can prompt the organizations involved to aim for the same targets (that is, scoring high in relevant categories) to move up in the rankings and avoid being downgraded. These scaling effects usually involve pernicious feedback loops, so a poorly ranked university will find it increasingly challenging to attract quality staff members and students, which in turn renders a future poor ranking even more likely.10 Of note, in cases where proxies are already heavily used, scaling effects can amplify the divergence between what we think we measure and what is measured. The potential problems with using proxies must be properly understood to limit the risk of mis-specifications in a model, because scaling effects are very difficult — if not impossible — to mitigate after the fact.
Put humans in the loop. Another safeguard involves strategically inserting human interventions in algorithmic decision-making. For instance, in the original process that existed before the Income Compliance Program implemented the robodebt system, tax officials used average income to identify discrepancies, but humans then investigated the 20,000 most extreme cases. Government workers could balance the government’s desire to recoup benefit overpayments efficiently against the fair treatment of citizens. Civil servants manually reviewed the data to ensure that the discrepancy identified by income averaging was significant and then tried to validate the measure against information from other sources before notifying individuals of debts to be repaid.
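The original process described above, in which only the most extreme discrepancies were escalated to human investigators and nothing was auto-issued, can be sketched as a simple triage rule. The function name, field names, and threshold below are all invented for illustration; the point is only the structure: the algorithm flags, a human decides.

```python
# A minimal human-in-the-loop sketch (names and threshold are assumptions):
# flagged discrepancies are routed to human caseworkers rather than
# being converted automatically into repayment demands.

def triage(cases, review_threshold=1000):
    """Split flagged cases: large discrepancies go to a human caseworker
    for investigation; the rest are held, not auto-escalated to collectors."""
    for_review, held = [], []
    for case in cases:
        if case["discrepancy"] >= review_threshold:
            for_review.append(case)   # a human validates against other sources
        else:
            held.append(case)         # no letter, no debt collector
    return for_review, held

cases = [
    {"id": 1, "discrepancy": 2500},
    {"id": 2, "discrepancy": 300},
    {"id": 3, "discrepancy": 1200},
]
review, held = triage(cases)
print([c["id"] for c in review])  # [1, 3]
```

The contrast with robodebt is the final step: here, even a flagged case produces an investigation rather than an automated accusation, so the burden of proof stays with the agency.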
The application of technique in the robodebt case removed these human interventions and allowed technique and the mechanization of values to go unchecked. The burden of proof shifted from the tax official, who had to justify approaching the individual benefits recipient, to the individual, who had to disprove an automated accusation. Without human judgment at key points, there was no way to keep unforeseen and unintended consequences at bay.
Test outcomes for alignment with values. While we’ve seen that technological solutions prioritize rationality and efficiency, organizations can create processes to evaluate the outputs and outcomes from these systems to see how well — or poorly — they support multiple other values. In the robodebt case, program design prioritized saving costs and boosting reimbursements, but adding a more thorough evaluative process might have mitigated the tyranny of technique. Auditing the model and random samples of its outputs might have revealed its bad decisions; auditing the process for transparency, fairness, and procedural justice would likewise have exposed serious deficiencies. The now-defunct system relied on a one-size-fits-all approach and permitted wide gaps between individual cases and the models.
One way to achieve greater fairness would have been to close those gaps. In practice, this would have required a more complex model that accounted for different personal circumstances of individuals while ensuring that it did not discriminate against particular demographic or socioeconomic classes of individuals.
Organizations can evaluate AI systems’ outputs and outcomes to see how well they support values important to the group.
Such an approach implies diligent development and testing of the program before it is deployed in real life. Too often, testing is reduced to making sure that a program runs smoothly, and thus fails to investigate its outcomes to determine what biases might be built in. Any program based on algorithmic routines should therefore be tested in multiple cycles that emulate the real-life circumstances in which it will be applied. For example, running a pilot before fully implementing the algorithms would have helped to identify harmful proxies and overly simplified steps that could lead to false accusations. Having civil servants check a percentage of the automated demands for repayment on a regular basis would have helped to detect flaws in the Australian government’s system and process. Testing programs before deployment can mean that some systems are never deployed, simply because the risk of damage would be too great.
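The ongoing-audit idea above, having civil servants check a percentage of automated demands on a regular basis, can be sketched as a sampling routine with a halt condition. Everything here is an invented illustration (the sampling scheme, the error tolerance, and the toy data), not a description of any real audit process.

```python
# A hedged sketch of an output audit: human reviewers check a
# systematic sample of automated repayment demands, and issuance
# halts if the sampled error rate exceeds a tolerance.
# All names, rates, and thresholds are invented assumptions.

def audit_sample(demands, human_check, every_nth=20, max_error_rate=0.01):
    sample = demands[::every_nth]                      # systematic sample
    errors = sum(1 for d in sample if not human_check(d))
    error_rate = errors / len(sample)
    return error_rate, error_rate > max_error_rate     # (rate, should_halt)

# Toy data: a "valid" field stands in for the human reviewer's verdict;
# here every 7th demand is wrongly issued.
demands = [{"id": i, "valid": i % 7 != 0} for i in range(1000)]
rate, halt = audit_sample(demands, human_check=lambda d: d["valid"])
print(rate, halt)  # 0.16 True
```

An audit like this evaluates outcomes rather than uptime: the system may be running smoothly while issuing wrongful demands, and only a human check of its outputs would surface that.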
Finally, when such automated processes are applied, there must be policies and procedures in place that allow those affected to voice their concerns — and to be heard by humans, not machines. This would allow unintended consequences, potential failures, and biases to be identified and corrected through human intervention to ensure a safer and fairer process. Obviously, such testing and adapting involves greater sensitivity to procedural fairness, a value that must be upheld alongside the wish for efficiency savings if we are serious about tackling the tyranny of technique.
Our analysis underlines the need for greater sensitivity to how technique and the mechanization of values significantly influence the application of technology in organizations and society. When cases of technology going wrong emerge, managers should resist the urge to look for more technological solutions and instead habitually interrogate the values they consider in their decision-making processes to ensure that mere efficiency isn’t automatically prioritized. The tendency to reach for the technological fix was also seen in the robodebt case, as reflected in three iterations of the tool — none of which tackled the serious underlying problems with the overall system and processes.
Would the safeguards we have proposed, and a renewed appreciation for different values that inform human decision-making, have prevented the robodebt scandal? We believe that they very likely would have done so, because acknowledging and appreciating multiple values, instead of just the values of rationality and efficiency, would have meant a different implementation of rules and regulations.
With this in mind, we hope that practicing managers embrace a readiness for more critical and reflexive interrogation of the values they employ in making managerial decisions, while developing greater sensitivity to the potential consequences of their choices.