We, as an industry, are fast approaching a time when general-purpose AIs will handle our customer interactions, help us make a wide variety of decisions, and drive future corporate strategy. These systems will operate at machine speed, which means that if they get out of control, humans generally won't be able to react fast enough to mitigate the damage. Mistakes could result in everything from significant damage to corporate assets, to avoidable catastrophes, to unnecessary deaths -- along with lost customers, lost employees, and the IT purges that follow bad decisions.
It is therefore critical that we design these systems so that they do what they are supposed to do, so that they are trustworthy, and so that their decisions are consistent with our company policies, strategies and, particularly, our ethics. Things haven't started out well: initial rollouts of facial recognition products were untrustworthy when it came to categorizing minorities, due to bad samples and poor training, and those were limited, targeted systems. The same kind of mistake in a general-purpose AI could lead to even greater catastrophes, because it could disadvantage millions of people who might then revolt.
I had a chat with Kush R. Varshney, Research Staff Member and Manager at the Thomas J. Watson Research Center in Yorktown Heights, New York, and he highlighted for me the impressive effort IBM is making to ensure AIs do more good than harm.
This week I'm going to focus on one interesting aspect of this effort: the juxtaposition of fairness and accuracy. (You'd think they'd be the same thing, but they aren't even close.)
Fairness vs. Accuracy
One concept I hadn't really thought about, and that IBM is focused on, is fairness. The early facial recognition failures highlighted this: accuracy for white males drifted above 90 percent, while accuracy for women or minorities would often drift below 50 percent, with the systems frequently mixing up people and animals.
IBM worked on developing critical metrics so that fairness could be measured and problems could be traced to issues with the sample, the process, or the unanticipated and unacceptable introduction of bias. These problems not only undermined the trustworthiness of the systems; they put a cloud over facial recognition in general that continues today, with major efforts to block deployment both in the U.S. and abroad.
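To make the idea of a fairness metric concrete, here is a minimal sketch of one simple approach: computing accuracy separately for each group and reporting the gap between the best- and worst-served groups. This is illustrative only; it is not IBM's actual metrics code, and the group labels, predictions, and ground truth are made-up example data.

```python
# Illustrative sketch: per-group accuracy and a simple disparity gap.
# Data and group labels below are invented for demonstration.
from collections import defaultdict

def group_accuracy(groups, y_true, y_pred):
    """Return classification accuracy broken out by group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        total[g] += 1
        correct[g] += (t == p)
    return {g: correct[g] / total[g] for g in total}

# Toy data: the model is right 9 out of 10 times for group "A"
# but only 5 out of 10 times for group "B".
groups = ["A"] * 10 + ["B"] * 10
y_true = [1] * 20
y_pred = [1] * 9 + [0] + [1] * 5 + [0] * 5

acc = group_accuracy(groups, y_true, y_pred)
gap = max(acc.values()) - min(acc.values())
print(acc)  # {'A': 0.9, 'B': 0.5}
print(gap)  # 0.4 -- the accuracy gap between the best- and worst-served groups
```

A system that reports only the overall number would call this model roughly 70 percent accurate; tracking the per-group breakdown is what exposes the problem.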
Using these metrics, IBM found you could significantly improve both the fairness and the trustworthiness of a system, but it ran into another problem: as the systems got more accurate with minorities, they often got less accurate with white males. This impact wasn't due to sampling; it was often connected to regulation that forced inaccuracies. The implication is that if accuracy is to be optimized across all groups, at least in regulated efforts, regulators need to stay engaged and be willing to alter the regulations so that, as accuracy increases for minorities and women, white males don't become disadvantaged. This should never be an either/or decision but one where everyone who uses an AI solution gets high-quality results, regardless of personal differences.
It fascinated me to learn that regulatory bodies were forcing decisions that reduced accuracy for one group to benefit another when there is no technical need to hurt any group. This speaks to why many diversity efforts fail: they attempt to take from those in power, those in power then balk at supporting the effort, and, because they are in power, they can have a significant adverse impact.
Wrapping Up: The Critical Need for Both Accuracy and Fairness for Everyone
I struggled with the trade-off between accuracy and fairness because I didn't think the two should be in conflict. But when it comes to minorities, you could have a very high accuracy score and a very low fairness score, because the massive accuracy for white males could overwhelm the inaccuracy for any minority. You need both metrics to make sure there aren't hidden trade-offs like this: you could be 100 percent accurate if the sample set is one group and 0 percent accurate with another group that isn't part of the sample, and the goal needs to be 100 percent accuracy in real-world situations.
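The arithmetic behind that hidden trade-off is worth spelling out. In this toy calculation (the numbers are invented for illustration), a sample skewed 9-to-1 toward one group produces a headline accuracy of 90 percent even though the smaller group gets every answer wrong:

```python
# Toy arithmetic: a skewed sample can make overall accuracy look great
# while one group gets terrible results. All numbers are invented.
majority_n, majority_acc = 900, 1.00   # 900 samples, all classified correctly
minority_n, minority_acc = 100, 0.00   # 100 samples, all classified wrong

overall = (majority_n * majority_acc + minority_n * minority_acc) \
          / (majority_n + minority_n)
print(overall)  # 0.9 -- "90% accurate" overall, yet 0% for the minority group
```

That is exactly why a single accuracy number is not enough and a separate fairness measure is needed alongside it.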
Now, I've mostly focused on facial recognition, where we have seen the problems, but imagine these problems affecting every decision that surrounds you. The level of misery an unfair general-purpose AI could create at machine speed could be catastrophic. That is why it is critical that AI developers like IBM are focused like a laser on eliminating unfairness now because, if they wait until AIs run the world, we could literally have a Terminator outcome. And while I love watching those movies, I don't want to live in that future, and I doubt you do either.