AI tools are making ‘repeated factual errors’, major new research warns

The latest wave of internet-based AI search tools “often make mistakes, misread information and even give risky advice,” according to a damning study by Which?
The consumer group Which? surveyed 4,189 adults in the UK in September 2025 about their AI habits and found that nearly a third of them considered AI searches more important to them than standard web searching.
Additionally, nearly half of respondents said they had a “great deal” or a “fair amount” of confidence in the information they received from AI search engines, a share that rose to two-thirds among frequent users.
The team tested six AI tools: ChatGPT, two Google offerings — Gemini and AI Overviews (AIO), the summaries that appear in standard Google searches — Microsoft’s Copilot, Meta AI, and Perplexity.
Each AI engine was asked 40 common questions on topics such as money and finance, law, health and nutrition, and consumer rights and travel issues. Which? experts then evaluated the answers, rating them on factors such as accuracy, usefulness, and ethical responsibility.
The team said all the AI tools tested made “repeated factual errors”, gave incomplete advice, and offered overly confident answers without considering ethical issues. The tools sometimes relied on poor sources, such as old forum threads, and directed users to “dangerous premium services” rather than free tools and resources, meaning people risked overpaying or engaging with “dubious services”.
“There are too many false and misleading statements for comfort, especially given how much people now use and rely on these tools,” the Which? team said.
It added: “AI is the future, but relying too much on it right now could be costly.”
The investigation into the reliability of AI comes as Sundar Pichai, CEO of Google parent company Alphabet, said AI models were “error-prone” and encouraged people to use them in conjunction with other tools.
Speaking to the BBC this week, Mr Pichai said people should not “blindly trust” new technology, and that the mistakes made by AI tools highlight the importance of having a rich information ecosystem rather than relying solely on AI.
Responding to the study, a Google spokesperson said: “We have always been transparent about the limitations of generative AI, and we place reminders directly in the Gemini app to ensure users double-check information. For sensitive topics, such as legal, medical or financial matters, Gemini goes a step further by recommending that users consult qualified professionals.”
Microsoft said: “Copilot answers questions by distilling information from multiple web sources into a single answer. Answers include linked citations so users can do further research, just like traditional search. As with any AI system, we encourage people to verify the accuracy of the content, and we’re committed to listening to feedback to improve our AI technologies.”
An OpenAI spokesperson said: “If you’re using ChatGPT to research consumer products, we recommend choosing the built-in search tool. It shows you where the information is coming from and gives you links so you can check it yourself. Improving accuracy is something the whole industry is working on. We’re making good progress, and our latest default model, GPT-5, is the smartest and most accurate model we’ve developed.”