AI’s “Black Box” Problem

Simon Chesterman

A résumé screening algorithm declines to shortlist any women for a job; a sentencing program concludes that a defendant has a high risk of reoffending but won’t say why. Who is responsible when technology makes decisions on behalf of humans — in some cases following processes that are impossible to understand?

AI is transforming the way businesses operate, with breathless talk of a fourth industrial revolution adding trillions to global economic output by 2030. Such systems are becoming more pervasive and more complex — reliance on them is growing even as the ability of non-specialists to understand them diminishes. This presents an accountability problem: if decisions are made by a ‘black box’, who is responsible when things go wrong?

‘Opaque’ means difficult to understand or explain, but it is helpful to separate out three reasons for this difficulty. The first is that certain technologies may be proprietary. Companies that invest in an AI system don’t want their competitors getting access to it for free. A second form of opacity may arise from complex systems that require specialist skills to understand them. These systems may evolve over time, sometimes patched by different IT teams, but they are in principle capable of being explained.

Neither of these forms of opacity — proprietary or complex — poses new problems for law. Intellectual property law has long recognized protection of intangible creations of the human mind, as well as exceptions based on fair use. To deal with complex issues, governments and judges routinely have recourse to experts.

The same cannot be said of the third reason for opacity: systems that are opaque by nature. Some deep learning methods are effectively opaque by design, reaching decisions through machine learning rather than by following, say, a decision tree that would be transparent, however complex it might be.
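The contrast can be made concrete with a short, purely illustrative sketch in Python using scikit-learn and synthetic data: a decision tree fitted to the data yields rules that can be printed and read line by line, whereas a neural network fitted to the same data yields nothing more than matrices of learned weights.

```python
# Illustrative sketch only: synthetic data, not any real decision system.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
feature_names = [f"feature_{i}" for i in range(4)]

# A decision tree's reasoning can be printed as human-readable rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))

# A neural network offers no comparable account of why it decides as it does:
# all there is to inspect are layers of numeric weights.
net = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0).fit(X, y)
print([w.shape for w in net.coefs_])  # shapes of the learned weight matrices
```

Both models may classify the data equally well; the difference is that only the first can say why any particular case was decided the way it was.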

To pick a trivial example, the programmers of Google’s AlphaGo could not explain how it came up with the strategies for the ancient game of Go that defeated the champion player Lee Sedol in 2016. Lee himself later said that in their second game the program made a move that no human would have played — and one that was only later shown to have planted the seeds of its victory.

Such output-based legitimacy — optimal ends justifying uncertain means — is appropriate in some areas. Medical science, for example, progresses based on the success or failure of clinical trials with robust statistical analysis. If the net impact is positive, the fact that it may be unclear precisely how a procedure or pharmaceutical achieves those positive outcomes is not regarded as a barrier to allowing it into the market.

Legal decisions, on the other hand, are generally not regarded as appropriate for statistical modelling. Though certain decisions may be expressed in terms of burdens of proof — balance of probabilities, beyond reasonable doubt, and so on — these are to be determined in individualized assessments of a given case, rather than based on a forecast of the most likely outcomes from a larger set of cases. 

There is a growing literature criticizing reliance on algorithmic decision-making with legal consequences. A significant portion now focuses on opacity, highlighting specific concerns such as bias, or seeking remedies through transparency. Yet the challenges of opacity go beyond bias and will not all be solved through calls for transparency or ‘explainability’.

Addressing these challenges begins with clarifying why proprietary, complex, and natural opacity pose a problem in the first place.

One reason is that ‘black box’ decision-making may lead to inferior decisions. Accountability and oversight are not merely tools to punish bad behaviour: they also encourage good behaviour. Removing the possibility of scrutiny reduces the opportunities to identify wrongdoing, as well as the chances that decisions will be subjected to meaningful review and thereby improved. Volkswagen, for example, wrote code that gamed regulators’ tests, giving the false impression that vehicle emissions were lower than they were in normal use. Uber similarly designed a version of its app that identified users who appeared to be working for regulators, limiting their ability to gather evidence.

A second reason is that opaque decision-making practices may provide cover for impermissible decisions, such as by masking or reifying discrimination. An example is Amazon’s résumé-screening algorithm, which was trained on ten years of data but had to be shut down when programmers discovered that it had ‘learned’ that women’s applications were to be regarded less favourably than men’s. Unintended biases may also be introduced through the training data itself, as in the well-known problems with facial recognition software. Different problems can arise from the selection and weighting of variables. An ostensibly neutral metric like employee productivity, for example, might adversely impact women if it fails to account for periods of maternity leave.
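How the weighting of an ostensibly neutral variable can do this is easiest to see in a small, purely hypothetical sketch: scored on raw annual output, an employee who took several months of leave ranks last, yet adjusted for the months she actually worked she ranks first.

```python
# Hypothetical numbers for illustration only; no real employment data.
employees = [
    # (name, annual_output_units, months_of_leave)
    ("A", 100, 0),
    ("B",  95, 0),
    ("C",  80, 4),   # four months of leave during the year
]

for name, output, leave in employees:
    raw = output / 12                  # 'productivity' that ignores leave
    adjusted = output / (12 - leave)   # output per month actually worked
    print(f"{name}: raw={raw:.1f}  adjusted={adjusted:.1f}")

# Raw scores rank C last (6.7 against 8.3 and 7.9); adjusted scores rank C
# first (10.0). A system trained on the raw metric would penalize the leave
# pattern, which in many workplaces is a close proxy for gender.
```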

Thirdly, the legitimacy of certain decisions depends on the transparency of the decision-making process as much as on the decision itself. A well-known case in the United States challenged reliance upon a proprietary sentencing algorithm called COMPAS. Although the trial judge had ruled out probation because the algorithm said the defendant had a high chance of reoffending, the Supreme Court of Wisconsin upheld the sentence on the basis that the score was supported by other independent factors and ‘not determinative’ of his sentence. It went on, however, to express reservations about the use of such software, requiring that any future use be accompanied by a ‘written advisement’ about the proprietary nature of the software and the limitations of its accuracy.

Transparency is routinely held up as the means of addressing some or all of these concerns. Yet while proprietary opacity can be dealt with by court order and complex opacity through recourse to experts, naturally opaque systems may require novel forms of ‘explanation’ or an acceptance that some machine-made decisions cannot be explained — or, in the alternative, that some decisions should not be made by machine at all.

This post discusses issues considered in more detail in “Through a Glass, Darkly: Artificial Intelligence and the Problem of Opacity”, American Journal of Comparative Law (2021).

Simon Chesterman is Dean of the National University of Singapore Faculty of Law and Senior Director of AI Governance at AI Singapore. His latest book is “We, the Robots? Regulating Artificial Intelligence and the Limits of the Law” (Cambridge University Press, 2021).