Algorithms, Fake News, & The Google Generation
At the Ohio Regional Association of Law Libraries (ORALL) Annual Meeting, as I presented on the duty of technology competence in the algorithmic society, an astute law librarian asked (paraphrasing), "how does fake news play into this?" That question gave rise to a flurry of brain activity, as I considered how Google, for example, ranks relevancy, the rise of fake news, and the ability of users to spot fake news sources -- particularly for legal research.
As I was presenting to a group of lawyers at a CLE this week, I polled them asking about the electronic resource that they primarily use for legal research. The overwhelming response was Google.
Google uses a trademarked, proprietary – mostly secret – algorithm called PageRank, which assigns each webpage a relevancy score based on factors such as:
- The frequency and location of keywords within the webpage. If the keyword only appears once within the body of the page, the page will receive a low score for that keyword.
- How long the webpage has existed: people create new webpages every day. Google places more value on pages with an established history.
- The number of other webpages that link to the page in question: Google looks at how many webpages link to a particular site to determine its relevance.
Out of these three factors, the third is the most important.
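To make the third factor concrete, here is a minimal sketch of link-based scoring in Python. It follows the textbook power-iteration idea behind link analysis and is not Google's actual (mostly secret) ranking system. The page names, the `link_scores` function, and the damping value of 0.85 are all illustrative assumptions.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by who links to them, using simple power iteration.

    `links` maps each page to the list of pages it links out to.
    This is a simplified, textbook-style sketch, not Google's real algorithm.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score to the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical mini-web: three blogs all link to the same hoax story.
links = {
    "court-opinion": ["statute"],
    "law-review": ["court-opinion", "statute"],
    "blog-a": ["viral-hoax"],
    "blog-b": ["viral-hoax"],
    "blog-c": ["viral-hoax"],
    "viral-hoax": [],
    "statute": [],
}

scores = link_scores(links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph the hoax page ranks first simply because more pages link to it. Nothing in the scoring looks at whether the content is accurate.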
With the rise of fake news sources and given these factors that we know about Google's relevancy ranking, there's nothing to say that fake news cannot make its way into the top results based on a search query.
Couple this with the shoddy research habits of the "Google Generation" (those born 1993 and after), and you have a recipe for disaster.
As noted in my article, Beyond the Information Age: The Duty of Technology Competence in the Algorithmic Society, [t]here has been some interesting findings about the Google Generation's information behavior. As early as 2008, studies show that “the speed of young people’s web searching means that little time is spent in evaluating information, either for relevance, accuracy or authority.” Additionally, “[f]aced with a long list of search hits, young people find it difficult to assess the relevance of the materials presented and often print off pages with no more than a perfunctory glance at them.” Also, “[y]oung scholars are using tools that require little skill: they appear satisfied with a very simple or basic form of searching.” In addition to the user habits of the Google Generation, society, in general, has become increasingly comfortable with relying on the top results that an algorithm generates. “[R]esearch indicates that over ninety percent of searchers do not go past page one of the search results and over fifty percent do not go past the first three results on page one.”
These ingrained research habits generally equate with allowing algorithms to do the heavy lifting to decide what is relevant. The user, through hasty searching and vetting of results, has just allowed the algorithm to have a significant role in selecting the content that the algorithm deems should advance the law, even if that information is "fake."
There is a silver lining in that a recent Pew Research poll suggests that younger people are generally better at spotting fake news sources.
Along with teaching law students about evaluating resources and spotting fake news, I can't help but wonder: will the algorithms themselves get better at spotting fake news in the future?
In summer 2017, Pew Research Center and Elon University’s Imagining the Internet Center conducted a large canvassing of technologists, scholars, practitioners, strategic thinkers and others, asking them to react to this framing of the issue:
The rise of “fake news” and the proliferation of doctored narratives that are spread by humans and bots online are challenging publishers and platforms. Those trying to stop the spread of false information are working to design technical and human systems that can weed it out and minimize the ways in which bots and other schemes spread lies and misinformation.
The question: In the next 10 years, will trusted methods emerge to block false narratives and allow the most accurate information to prevail in the overall information ecosystem? Or will the quality and veracity of information online deteriorate due to the spread of unreliable, sometimes even dangerous, socially destabilizing ideas?
Respondents were then asked to choose one of the following answer options:
- The information environment will improve – In the next 10 years, on balance, the information environment will be IMPROVED by changes that reduce the spread of lies and other misinformation online.
- The information environment will NOT improve – In the next 10 years, on balance, the information environment will NOT BE improved by changes designed to reduce the spread of lies and other misinformation online.
Some 1,116 responded to this nonscientific canvassing: 51% chose the option that the information environment will not improve, and 49% said the information environment will improve.
A recent NYTimes article, No, A.I. Won’t Solve the Fake News Problem, discussed the ability of artificial intelligence (algorithms, here) to solve the problem for us, stating that today’s A.I. operates at the “keyword” level, flagging word patterns and looking for statistical correlations among them and their sources. This can be somewhat useful: Statistically speaking, certain patterns of language may indeed be associated with dubious stories. For instance, for a long period, most articles that included the words “Brad,” “Angelina” and “divorce” turned out to be unreliable tabloid fare. Likewise, certain sources may be associated with greater or lesser degrees of factual veracity. The same account deserves more credence if it appears in The Wall Street Journal than in The National Enquirer. But none of these kinds of correlations reliably sort the true from the false. In the end, Brad Pitt and Angelina Jolie did get divorced. Keyword associations that might help you one day can fool you the next.
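The keyword-level limitation described above can be sketched in a few lines of Python. This is a deliberately naive illustration, not how any real fake-news detector is built. The `looks_dubious` function, the keyword set, and the sample headlines are all hypothetical.

```python
# Hypothetical keyword set, based on the example quoted from the article.
SUSPECT_KEYWORDS = {"brad", "angelina", "divorce"}


def looks_dubious(headline, suspect_keywords=SUSPECT_KEYWORDS):
    """Flag a headline if it contains every suspect keyword.

    The check only matches words; it has no notion of whether the
    underlying story is true.
    """
    words = {w.strip(".,!?:;\"'").lower() for w in headline.split()}
    return suspect_keywords <= words


# Made-up headlines for illustration only.
rumor = "Brad and Angelina headed for divorce, insiders claim"
actual = "Brad and Angelina finalize divorce"
unrelated = "Court issues ruling on data privacy statute"

for headline in (rumor, actual, unrelated):
    print(looks_dubious(headline), headline)
```

The rumor and the accurate report are flagged identically because both contain the same three words. A keyword match that filtered out tabloid fare one day would also filter out the true story later.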
The article goes into much more detail about the ability of algorithms to detect fake news and the limitations of natural language processing. It's a great read.
Ultimately, we cannot rely on algorithms to detect fake news for us. As we train competent attorneys, we must continue to train them to be critical, evaluative users. This will be an increasingly uphill battle as the Google Generation and beyond enter law practice.