Empirical Study Shows Algorithmic Bias in Library Discovery Layers

Discussions surrounding algorithmic bias are fairly common. We've even seen discussion of algorithmic bias in library discovery tools. Most of this discussion, though, has been theoretical. In what is purportedly the first empirical study to analyze algorithmic bias in library discovery systems, Matthew Reidsma put ProQuest's Topic Explorer to the test to review potential biases affecting results.

More and more academic libraries have invested in discovery layers, the “Google-like” search tools that return results from different services and providers by searching a centralized index. The move to discovery has been driven by the ascendance of Google as well as libraries' increasing focus on user experience. Unlike the vendor-specific search tools or federated searches of the previous decade, discovery presents a simplified picture of the library research process. It has the familiar single search box, and the results are not broken out by provider or format but are all shown together in a list, aping the Google model for search results.

The potential for bias is particularly troublesome for library discovery layers because libraries are seen as highly reputable institutions that users should inherently trust.

As Reidsma notes, that our perception of search tools’ trustworthiness should be so uncritical has been a boon to the industry. In librarianship over the past few decades, the profession has had to grapple with the perception that computers are better at finding relevant information than people. On the technical services side of the profession, we have responded to this perception by pushing for more integration with our various search tools. Over the past decade discovery tools, which search a unified index of providers from a single certain point, have changed the way that many library users do research. As our discovery tools have become more complex, much of the discussion and critique has centered on the simplification of the search process, the effectiveness of user interface elements, and the integration with other library systems and services. [He] ha[s] found no substantive evaluation of the search algorithms of commercial library discovery platforms in the literature. The task of determining how successful our library discovery tools are at presenting good results is thus stymied by user perceptions of what the tools are capable of, the opacity of the business model of our search engine providers, and the fact that underlying everything was a series of instructions written by people with a particular point of view.

Reidsma put a discovery layer to the test through a series of searches designed to ferret out bias. Ultimately, Reidsma found that the discovery layer's algorithm showed bias in the following areas:
  • Women
  • The LGBT Community
  • Islam
  • Race
  • Mental Illness
The biased results fall in areas already known for stereotyping and social bias. What we find, time and again, is that algorithms are only as unbiased as the programmers who code them.

Why is this an issue in the library world? Since the goal of the Topic Explorer is to identify the underlying topic behind a user's search, incorrect or biased results can have a great impact on a user's perception of a topic. By showing results that exploit stereotypes or bias, the Topic Explorer is saying to the user, “this is what you are looking for.” The purpose of [Reidsma's] examination was to bring these anomalies to light and start a discussion within the library community about how to improve our search tools for everyone.
