Much has been written about the challenges associated with AI-based decisions. Documented failures include gender and race bias in recruiting and credit-approval software; chatbots that turned racist; driverless cars that failed to recognize stop signs because of adversarial attacks; inaccurate predictive models for public health surveillance; and diminished trust stemming from the difficulty of interpreting certain machine-learning models.
Although the value of machine intelligence is obvious, it is now becoming clear that machine decisions also represent a new kind of risk for companies. In addition to the social risks of bias and the financial risks of models that confuse correlation with causation, companies must account for IT-security, reputational, litigation, and even regulatory risk arising from AI-based decisions. Just as we saw with information security, it is only a matter of time before boards and CEOs are held accountable for failures of machine decisions. But what is the C-suite supposed to do? Halting the rollout of AI is not an option.
Cybersecurity risk provides more than a warning; it also offers an answer. Cybersecurity audits have become a norm for companies today, and the responsibility and liability for cyber-risk audits goes all the way up to the board of directors. I believe companies using AI models for socially or financially consequential decisions need similar audits as well, and I am not alone.
The Algorithmic Accountability Act, proposed by Democratic lawmakers in spring 2019, would, if passed, require large companies to formally evaluate their “high-risk automated decision systems” for accuracy and fairness. The EU’s General Data Protection Regulation, while mostly focused on regulating companies’ processing of personal data, also covers some aspects of AI, such as a consumer’s right to an explanation when companies use algorithms to make automated decisions. Although the scope of the right to explanation is relatively narrow, the Information Commissioner’s Office (ICO) in the United Kingdom has recently invited comments on a proposed AI auditing framework that is much broader in scope.
The framework is meant to support ICO’s compliance assessments of companies that use AI for automated decisions. It identifies eight AI-specific risk areas, including fairness, transparency, accuracy, and security. In addition, it identifies governance and accountability practices, including leadership engagement, reporting structures, and employee training. ICO’s work is ongoing, and it will take some time before any regulatory consensus on an audit framework emerges. But forward-thinking companies should not wait for regulation. High-profile AI failures will reduce consumer trust and only serve to increase future regulatory burdens. These are best avoided through proactive measures today.
An AI audit process
An audit process would begin with the creation of an inventory of all machine-learning models being employed at a company, the specific uses of such models, names of the developers and business owners of models, and risk ratings. These ratings would measure, for example, the social or financial risks that would come into play should a model fail—which, in turn, might help determine the need for an audit. Were a model audit to go forward, it would evaluate the inputs (i.e., training data), model, and the outputs of the model. Training data would need to be evaluated for data quality as well as for potential biases hidden in the data.
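The inventory-and-risk-rating step described above can be sketched in a few lines of code. The fields, the 1-to-5 risk scale, and the audit threshold below are illustrative assumptions, not a prescribed standard:

```python
from dataclasses import dataclass

@dataclass
class ModelRecord:
    """One entry in the company-wide model inventory (illustrative fields)."""
    name: str            # model identifier
    use_case: str        # specific business use of the model
    developer: str       # who built it
    business_owner: str  # who is accountable for its decisions
    social_risk: int     # assumed scale: 1 (low) to 5 (high)
    financial_risk: int  # assumed scale: 1 (low) to 5 (high)

def models_needing_audit(inventory, threshold=4):
    """Flag models whose social or financial risk rating meets the threshold."""
    return [m.name for m in inventory
            if max(m.social_risk, m.financial_risk) >= threshold]

# Hypothetical inventory entries
inventory = [
    ModelRecord("resume_screener", "recruiting", "a.dev", "hr_lead", 5, 2),
    ModelRecord("churn_model", "marketing", "b.dev", "cmo", 1, 2),
]
print(models_needing_audit(inventory))  # ['resume_screener']
```

In practice the inventory would live in a governance system rather than in code, but even a simple registry like this makes the triage decision (audit or not) explicit and reviewable.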
For example, if a resume screening model is trained on past decisions about which applicants got job offers and which employees got promoted, we would want to ensure that the training data are not affected by unconscious biases of past recruiters and managers. Model evaluation would involve benchmarking against alternative models, statistical tests to ensure that the model generalizes from the training data to unseen data, and applying state-of-the-art techniques to enable model interpretability.
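One simple check for the kind of hidden bias described above is to compare selection rates across groups in the historical decisions used as training data. The sketch below uses the common “four-fifths rule” heuristic (a ratio of selection rates below 0.8 is often treated as a red flag); the data and group labels are hypothetical:

```python
def selection_rates(records):
    """Compute the selection rate per group from (group, selected) pairs."""
    totals, selected = {}, {}
    for group, was_selected in records:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + int(was_selected)
    return {g: selected[g] / totals[g] for g in totals}

def disparate_impact_ratio(records):
    """Ratio of lowest to highest group selection rate.
    Values below 0.8 are often treated as a warning sign (four-fifths rule)."""
    rates = selection_rates(records)
    return min(rates.values()) / max(rates.values())

# Hypothetical past hiring decisions: (group, got_offer)
past_decisions = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
                  ("B", 0), ("B", 1), ("B", 0), ("B", 0)]
print(round(disparate_impact_ratio(past_decisions), 2))  # 0.33 -- well below 0.8
```

A check like this is only a starting point; a real audit would also probe proxy variables and the model’s outputs, not just the raw labels.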
However effective a model, the inability to understand the factors driving a model’s recommendation can be a major deterrent to managerial and consumer trust in machine decisions. Audits should focus on the typical values of the input data and how the model responds to inputs that are outside the range seen in the training data. For example, if a stock-trading algorithm is trained on data from a period in which markets are relatively stable, how does it respond during wild market swings?
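The out-of-range check suggested above can be automated by recording the per-feature range seen in training and flagging any live input that falls outside it. This is a minimal sketch with made-up numbers; real audits would use distributional tests rather than simple min/max bounds:

```python
def training_range(X):
    """Record the (min, max) observed for each feature in the training data."""
    columns = list(zip(*X))
    return [(min(col), max(col)) for col in columns]

def out_of_range(x, ranges, tol=0.0):
    """Return indices of features in x outside the training range (plus tolerance)."""
    return [i for i, (lo, hi) in enumerate(ranges)
            if x[i] < lo - tol or x[i] > hi + tol]

# Hypothetical training data: two features per row
train = [[0.1, 10.0], [0.3, 12.0], [0.2, 11.0]]
ranges = training_range(train)

# A live input whose second feature is far outside anything seen in training
print(out_of_range([0.25, 50.0], ranges))  # [1] -- feature 1 needs scrutiny
```

An auditor would then examine how the model behaves on such flagged inputs, for example by replaying the stock-trading algorithm against data from volatile periods it never trained on.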
Such an audit process would be harder to carry out for vendor-supplied models, but it is possible to subject them to the same level of scrutiny.
Finally, AI audits should be performed by internal or external teams that are independent of the team that built the model. This independence matters: an audit should not simply repeat the validation that the data scientists who developed the model already performed.
Precedents
This overall approach has a precedent in the model risk management processes required of big banks in the wake of the 2008 financial crisis. Model risk management is typically focused on validating traditional statistical models, and over the years it has helped some banks detect and correct issues with their models. But its scope—regression models used by big banks—is relatively narrow. It doesn’t address machine-learning models that are continuously retrained as new data come in, and issues such as bias and interpretability often fall outside its remit. Machine-learning models require an enhanced approach. I’d argue that auditors of machine-learning models need a hacker’s mentality to identify the different ways in which a model could fail.
The governance approach I outlined above has precedents. One company that grades the maintenance standards of New York rental buildings based on 311 calls used an audit to test the fairness of its algorithms. A large U.S. bank uses a third-party AI explainability solution to test and debug its AI models.
It is clear that machine-learning models that make socially or financially consequential decisions (e.g., recruiting, credit approval, ad targeting, or approval of financial transactions) need multiple lines of defense. Relying exclusively on the statistical tests conducted by the data scientists who develop the model will not suffice. A robust audit process would be the way forward for governing AI decisions.
First published Nov. 4, 2019, on the Knowledge@Wharton blog.