Last month, my colleague Dave Grabarek approached me with a puzzling reference question. A researcher had gotten in touch to request an academic journal article from the 1970s. We regularly field such requests, but this time, Dave couldn’t find the article. It was missing from print and digital versions of the journal and didn’t appear in online indexes. As the reference librarian on duty, did I have any suggestions?
I repeated Dave’s search and had no better luck. I tried JSTOR, a popular database for academic research, and WorldCat, the global library catalog. Nothing: it was as if the article didn’t exist. We were so stumped that I found myself scouring the endnotes of other books the author had published, while Dave searched print bibliographic indexes that I hadn’t used since library school.
Usually, a hard-to-locate publication is the result of human error. Scholars may introduce typos when compiling their bibliographies, and library users sometimes misremember information about the item they’re looking for. (I once got a request for the book “Chicago in Black and White” and, in a flash of insight that may represent the high point of my career, realized the patron was looking for Erik Larson’s The Devil in the White City.)
Puzzlingly, the citation our researcher had sent us didn’t look garbled. The journal title, issue, and date corresponded with a real publication, and the author was a noted scholar of American legal history. I returned to the only place we’d found the article cited: a history piece that had recently appeared on a news aggregator website.
This time, I did what I should have done from the start: carefully read the 300-word online post that cited the journal article. The piece was written by someone who was not a subject specialist, judging by its bland prose and lack of analysis. At first glance, I'd written it off as a standard example of clickbait: content written to earn advertising revenue rather than to inform readers. Now, though, certain details caught my attention, namely repetitive language and a final paragraph that felt weirdly disconnected from the rest of the text. I'd seen this style of writing before, in samples produced by ChatGPT.
Released in late 2022, ChatGPT has sparked public discourse ever since. An example of generative artificial intelligence, the service is trained on large datasets to mimic natural language. In response to user-written prompts, ChatGPT can create texts that, at first glance, appear to be written by a human.
It’s easy to imagine the science-fictional possibilities of this technology, with its ability to facilitate sophisticated conversations between humans and machines. However, ChatGPT’s reliance on imitation means that the technology currently has little to no fact-checking ability and is, in fact, adept at making things up. The online piece listed three scholarly publications that the “author” had ostensibly used to write the article. In reality, this bibliography was dreamed up by ChatGPT. One source was real; the other two, including the source that had interested our researcher, were entirely fictional. The person who had uploaded the post had not disclosed their use of AI, so readers stumbling on it would not know to look out for these discrepancies.
To test ChatGPT’s bibliographic creativity, I logged into the service myself and asked it to write an article about a famous Virginian: singer Ella Fitzgerald, who was born in Newport News. The interface warned me that “the system may occasionally generate incorrect or misleading information.” ChatGPT got many basic details of Fitzgerald’s life right, though the meandering prose resembled a beginning high school essay. More significantly, the service fabricated the scholarly citations I asked it to include. Of the four sources listed, the first contained a real author and title but the wrong publisher, the second borrowed the title of a children’s book and attributed it to the wrong author, and the last two were completely fictional.
It’s important to acknowledge that I set up ChatGPT to fail. Even if the algorithm were less prone to fibbing, no text generated by the current version of ChatGPT can accurately attribute its sources. ChatGPT’s text is generated through a predictive algorithm rather than traditional research methods. Its “source” is the full dataset it’s been trained on.
As a reference librarian, I know that errors can crop up anywhere, from social media to scholarly publications. Yet detecting machine-generated text feels different from fact-checking a human. To identify ChatGPT content, I had to combine my reference skills with close-reading techniques that resemble a real-life Turing test. I was lucky that I had been following news of generative AI and was poised to recognize ChatGPT in the wild, but I’ll be honest: I hadn’t expected that, fewer than six months after this technology debuted, it would surface in my daily work.
If you’re interested in honing your AI-detection skills, I suggest visiting the ChatGPT website and trying out the service yourself. Asking it questions is one of the best ways to become familiar with the characteristics of the text it generates. I also recommend the excellent research guide AI, ChatGPT, and the Library, created by Salt Lake Community College Libraries. The guide offers more information about ChatGPT’s inner workings and suggests my favorite research technique, lateral reading, to verify AI-generated information.
As generative AI becomes more ubiquitous, information seekers will face increasingly complex decisions about how to interact with these services. If this prospect feels overwhelming, I like to keep in mind that generative AI isn’t the first new technology to reshape the information landscape (the printing press would like to have a word). For our part, library workers will respond as we always have: by championing information literacy and adapting to the needs of our communities.