- AI search engines are confidently wrong more than 60% of the time when citing news, according to a recent study. Despite their errors, the bots rarely admitted uncertainty, a growing concern as AI-powered search becomes more dominant.
AI search engines have an accuracy issue when it comes to citing news articles.
That’s according to a study from The Tow Center for Digital Journalism at Columbia University, which tested AI products like OpenAI’s ChatGPT Search and Google’s Gemini to assess their ability to accurately cite news articles. The analysis probed eight AI systems and found that, collectively, the bots provided incorrect answers to more than 60% of queries.
The researchers tested the bots with various news articles, manually selecting direct excerpts and asking the chatbots to identify the “corresponding article’s headline, original publisher, publication date, and URL.”
The study found the AI chatbots failed to retrieve the correct articles more than half the time and rarely declined to answer questions they could not answer accurately.
Accuracy varied across platforms, with Perplexity answering 37% of the queries incorrectly, while Grok 3 had a significantly higher error rate, answering 94% of the queries incorrectly.
The researchers said that despite the poor results, the AI bots answered queries with “alarming confidence.” They noted that the bots rarely used any qualifying phrases, such as “it appears,” “it’s possible,” or “might.”
The bots also rarely refused to answer; for example, ChatGPT incorrectly identified 134 articles but signaled a lack of confidence just 15 times out of its 200 responses, and never declined to provide an answer.
Except for Microsoft’s Copilot—which declined more questions than it answered—all of the tools were consistently more likely to provide an incorrect answer than to acknowledge limitations.
That AI bots are capable of spreading confident misinformation is not a new revelation. In the AI industry, that practice is known as “hallucinating” and happens to varying degrees across all large language models (LLMs). However, the authors note that the recent acceleration of AI search engines such as Google’s AI Overviews makes the issue more pressing.
“While traditional search engines typically operate as an intermediary, guiding users to news websites and other quality content, generative search tools parse and repackage information themselves, cutting off traffic flow to original sources,” the authors wrote. “These chatbots’ conversational outputs often obfuscate serious underlying issues with information quality.”
The study also noted that generative search tools fabricated links and cited syndicated and copied versions of articles.
Representatives for xAI, OpenAI, Google, and Perplexity did not immediately respond to a request for comment from Fortune, made outside normal U.S. working hours.
The rise of AI-powered search
Ever since OpenAI first launched ChatGPT, AI’s significance for the search business has been looming over Big Tech companies.
Microsoft has had an AI-powered version of Bing since early 2023, and late last year, OpenAI rolled out ChatGPT search, in a move seen as the company’s biggest challenge to Google yet.
Not to be outdone, Google is also going all in on AI-powered search.
The company has been pitching AI as the future of search for some time, envisioning a future where Google does the “Googling for you” and spares users the need to visit many websites themselves to answer queries.
Earlier this month, Google announced it was expanding its AI overviews to more people, including teen users, and had begun testing an AI-only search, called AI Mode. Early testers of the experimental product have given it generally favorable reviews.
Some tech companies have tried to establish formal relationships with news publishers to allow their models to cite articles. In February, OpenAI finalized its sixteenth and seventeenth content licensing agreements with the Schibsted and Guardian media groups, respectively. Meanwhile, last year, Perplexity launched its Publishers Program, aimed at fostering collective success, which features a revenue-sharing model for participating publishers.
However, the study found that these content licensing deals did not guarantee accurate citation in chatbot responses.
This story was originally featured on Fortune.com