Bias in Machine Learning & Artificial Intelligence

In August, The Wall Street Journal ran an interesting article on social bias in web technology (sub. req'd). The article noted that “[w]hile automation is often thought to eliminate flaws in human judgment, bias, or the tendency to favor one outcome over another in potentially unfair ways, can creep into complex computer code. Programmers may embed biases without realizing it, and they can be difficult to spot and root out. The results can alienate customers and expose companies to legal risk. Computer scientists are just starting to study the problem and devise ways to guard against it.”

The article continued: “One common error is endemic to a popular software technique called machine learning, said Andrew Selbst, co-author of ‘Big Data’s Disparate Impact,’ a paper to be published next year by the California Law Review. Programs that are designed to ‘learn’ begin with a limited set of training data and then refine what they’ve learned based on data they encounter in the real world, such as on the Internet. Machine-learning software adopts and often amplifies biases in either data set.”

In other words, machine learning deals with designing and developing algorithms that evolve their behavior based on empirical data. One key goal of machine learning is to generalize from limited sets of data (paraphrased). More specifically, machine learning is the capability to "adapt to new circumstances and to detect and extrapolate patterns".

This differs from artificial intelligence in that AI encompasses other areas apart from machine learning, including knowledge representation, natural language processing and understanding, planning, and robotics.

As an example of the social bias embedded in web technology, the article says to “[t]ake recent research from Carnegie Mellon that found male Web users were far more likely than female users to be shown Google ads for high-paying jobs. The researchers couldn’t say whether this outcome was the fault of advertisers, who may have chosen to target ads for higher-paying jobs to male users, or of Google algorithms, which tend to display similar ads to similar people. If Google’s software notices men gravitating toward ads for high-paying jobs, the company’s algorithm will automatically show that type of ad to men, the researchers said.”
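To see how this kind of loop can amplify a small imbalance, here is a minimal sketch in Python. It is a toy model, not a description of how Google's systems actually work: the function name `simulate_ad_feedback`, the starting share, and the `sharpening` parameter are all assumptions made up for illustration. Both groups are assumed to click at the same rate, so the only source of drift is the allocation rule itself.

```python
def simulate_ad_feedback(initial_share=0.55, sharpening=2.0, rounds=5):
    """Toy feedback loop (hypothetical, for illustration only).

    `initial_share` is the fraction of high-paying-job ads shown to group A
    at the start; group B gets the rest. Both groups click at the same rate,
    so observed clicks are simply proportional to impressions.
    """
    share = initial_share
    history = [share]
    for _ in range(rounds):
        # Equal click rates: clicks mirror how many ads each group was shown.
        clicks_a = share
        clicks_b = 1.0 - share
        # Assumed allocation rule: favor whichever group clicked more.
        # A `sharpening` value above 1 exaggerates the existing gap.
        weight_a = clicks_a ** sharpening
        weight_b = clicks_b ** sharpening
        share = weight_a / (weight_a + weight_b)
        history.append(share)
    return history


if __name__ == "__main__":
    for round_number, share in enumerate(simulate_ad_feedback()):
        print(f"round {round_number}: share shown to group A = {share:.3f}")
```

Starting from a 55/45 split, the share shown to the first group climbs every round even though both groups are equally interested. Starting from an even split, it stays put.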

Out of this work, an emerging discipline known as algorithmic accountability is taking shape. Academics in the field, who hail from computer science, law and sociology, try to pinpoint what causes software to produce these types of flaws, and find ways to mitigate them. Researchers at Princeton University’s Web Transparency and Accountability Project, for example, have created software robots that surf the Web in patterns designed to make them appear to be human users who are rich or poor, male or female, or suffering from mental-health issues. The researchers are trying to determine whether search results, ads, job postings and the like differ depending on these classifications.

One of the biggest challenges, they say, is that it isn’t always clear when the powerful correlations revealed by data mining are biased. Xerox Corp., for example, quit looking at job applicants’ commuting time even though software showed that customer-service employees with the shortest commutes were likely to keep their jobs at Xerox longer. Xerox managers ultimately decided that the information could put applicants from minority neighborhoods at a disadvantage in the hiring process.
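One rough way to spot this kind of proxy effect is to compare how often a screening rule passes applicants from different groups. The sketch below is a hypothetical illustration only: the applicant records are invented toy data, not Xerox's, and the function `selection_rates`, the group labels, and the 30-minute cutoff are all assumptions.

```python
from collections import defaultdict


def selection_rates(applicants, max_commute_minutes):
    """Return the fraction of each group that passes a commute-time cutoff.

    `applicants` is a list of (group, commute_minutes) pairs. The rule itself
    never looks at the group; the group is used only to audit the outcome.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for group, commute_minutes in applicants:
        total[group] += 1
        if commute_minutes <= max_commute_minutes:
            passed[group] += 1
    return {group: passed[group] / total[group] for group in total}


# Invented toy data: (neighborhood group, commute in minutes).
applicants = [
    ("north", 10), ("north", 15), ("north", 20), ("north", 40),
    ("south", 25), ("south", 35), ("south", 45), ("south", 50),
]

rates = selection_rates(applicants, max_commute_minutes=30)
# Ratio of the lowest pass rate to the highest; 1.0 would mean no gap.
ratio = min(rates.values()) / max(rates.values())

print(rates)
print(f"lowest-to-highest pass-rate ratio: {ratio:.2f}")
```

In this made-up data, a rule that never mentions neighborhood still passes one group three times as often as the other, which is the kind of gap a hiring manager would want to notice before relying on the rule.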

This is an important consideration as we rely more and more on machine learning and artificial intelligence to do our thinking for us. We should be using machine learning to augment our intelligence rather than to replace our analysis, but if machine learning feeds biased data into our decision making, the results can be disastrous. And if we let machines do our thinking for us entirely, there is no system of checks and balances. Kudos to the academics focusing on this area.

