AI Is Now Hallucinating Its Own Research. The Scientists Reviewing It Can't Tell the Difference.
Here’s a fun thought experiment: What happens when the field that builds AI uses AI to write its own research papers, and the AI invents sources that don’t exist, and the scientists tasked with catching this kind of thing don’t catch it?
You don’t have to imagine it. It already happened. At ICLR 2026 — the International Conference on Learning Representations, one of the most prestigious machine learning venues on the planet.
GPTZero, the AI detection company, scanned just 300 of the roughly 20,000 papers submitted to ICLR 2026. In that small sample, they found over 50 submissions containing at least one hallucinated citation — a reference to a paper, author, or journal that simply does not exist. Each of those submissions had already been reviewed by three to five expert peer reviewers. The reviewers approved them anyway.
Some of these papers had average ratings of 8 out of 10. They were on track to be accepted and published as legitimate scientific contributions to the field of artificial intelligence. With fake sources.
WHAT “VIBE CITING” LOOKS LIKE
GPTZero’s Head of Machine Learning, Alex Adams, coined a term for this phenomenon: “vibe citing.” It’s the citation equivalent of vibe coding — you let the AI handle the details, and the details turn out to be fictional.
The hallucinated citations aren’t random gibberish. That would be easy to catch. Instead, they’re plausible. The AI blends elements from real papers — real-sounding author names, believable titles, legitimate-seeming journals — into references that look right at a glance but crumble under the slightest verification. In some cases, the model starts from a real paper and subtly alters it: expanding an author’s initials into a guessed first name, adding coauthors who don’t exist, or paraphrasing the title just enough to make it unfindable.
Some examples were less subtle. One paper listed authors as “John Doe and Jane Smith.” Another included arXiv IDs formatted as “arXiv:2305.XXXX” — literal placeholder text that made it through peer review. A third contained DOIs and URLs that led absolutely nowhere.
These weren’t edge cases buried in obscure workshops. These were submissions to one of the top five machine learning conferences in the world.
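None of this requires an AI detector to catch, either. Here’s a minimal sketch in Python of the kind of shallow checks that would have flagged the examples above; the function names are ours, it assumes the `requests` library, and it’s an illustration of the idea, not GPTZero’s actual tool:

```python
# Shallow citation sanity checks: placeholder arXiv IDs and dead DOIs.
# Illustrative sketch, not GPTZero's methodology. Assumes `requests`.
import re
import requests

# New-style arXiv IDs look like YYMM.NNNNN, e.g. 2305.10601
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")

def arxiv_id_is_plausible(ref: str) -> bool:
    """Reject placeholders like 'arXiv:2305.XXXX' on format alone."""
    match = re.search(r"arXiv:\s*(\S+)", ref)
    return bool(match and ARXIV_ID.match(match.group(1)))

def doi_resolves(doi: str) -> bool:
    """A real DOI redirects at doi.org; a fabricated one returns 404."""
    resp = requests.head(f"https://doi.org/{doi}", timeout=10)
    return resp.status_code in (301, 302, 303)

print(arxiv_id_is_plausible("arXiv:2305.XXXX"))   # False: placeholder digits
print(arxiv_id_is_plausible("arXiv:2305.10601"))  # True: well-formed ID
```

A regex catches the placeholder; a single HTTP request catches the dead DOI. Three to five expert reviewers per paper did neither.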
📋 DISASTER DOSSIER
- Date of Incident: February–March 2026 (ICLR 2026 review cycle)
- Victim: Scientific integrity, peer review, everyone who reads research papers
- Tool Responsible: Large language models used to draft academic papers
- Sample Size: 300 of ~20,000 ICLR 2026 submissions
- Hallucinated Papers Found: 50+ submissions with fabricated citations
- Reviewers Per Paper: 3–5 expert scientists
- Reviewers Who Caught It: Approximately zero
- Discovery Method: GPTZero’s automated Hallucination Check tool
- Best Example of the Problem: A paper listing “John Doe and Jane Smith” as cited authors
- AI Villain Level: 🤖🤖🤖🤖 (Recursive: the AI is hallucinating about AI research)
THIS ISN’T NEW. IT’S GETTING WORSE.
Before ICLR, GPTZero ran the same analysis on NeurIPS 2025 — another top-tier AI conference. Out of 4,841 accepted papers, they found over 100 confirmed hallucinated citations spread across 51 papers. These weren’t submissions that got rejected. These were papers that passed peer review, beat a 24.52% acceptance rate, and were published as accepted research.
The pattern is clear: researchers are using LLMs to draft papers, the LLMs are inventing citations, and the peer review system — already strained to its breaking point — is not catching them. The fake references are too polished, the volume of submissions too high, and the reviewers too overworked to verify every source in every paper.
ICLR has since hired GPTZero to check all 20,000 submissions for fabricated citations. The fact that this is now a line item in a conference budget tells you everything about where academic publishing is headed.
THE IRONY IS NOT LOST ON ANYONE
Let’s pause to appreciate the full absurdity of the situation: the world’s leading AI researchers are submitting papers about artificial intelligence that contain artificial citations generated by artificial intelligence, and the human experts reviewing them cannot distinguish the real from the fake.
This is the hallucination problem eating its own tail. The field that is supposed to be solving hallucinations is being undermined by hallucinations. The peer review system that is supposed to be the gold standard for scientific rigor is failing to catch errors that an automated tool can flag in seconds.
THE COUNTERARGUMENT (AND WHY IT’S WEAK)
NeurIPS pushed back on the findings, noting that 100 hallucinated citations across 51 papers out of nearly 5,000 accepted is a small fraction. And they have a point: 51 of 4,841 works out to roughly 1% of accepted papers. Some researchers argued that authors might have given an LLM a partial description of a citation and asked it to generate BibTeX, which isn’t quite the same as fabricating a source wholesale.
But this defense misses the forest for the trees. The issue isn’t the percentage. It’s that any fabricated citations made it through a process specifically designed to catch errors. If peer reviewers can’t spot “John Doe and Jane Smith” or “arXiv:2305.XXXX,” what else are they missing? And if those 50-plus flagged submissions came from scanning just 300 of 20,000 papers, how many more are hiding in the other 19,700?
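Back-of-the-envelope, a naive extrapolation assuming GPTZero’s 300-paper sample is representative of the full pool (an assumption the published numbers don’t confirm, so treat the result as a rough order of magnitude):

```python
# Naive extrapolation from GPTZero's sample to the full ICLR 2026 pool.
# Assumption: the 300 scanned papers are representative of all ~20,000.
sample_size = 300
flagged_in_sample = 50            # "over 50", so treat this as a floor
total_submissions = 20_000

estimated_total = flagged_in_sample / sample_size * total_submissions
print(round(estimated_total))     # ~3333 submissions with fake citations
```

If the sample holds, the answer to “how many more” is on the order of three thousand.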
There’s also a false-positive concern: GPTZero’s tool claims 99% accuracy, and a 1% error rate applied to 20,000 submissions could flag roughly 200 innocent papers. That’s a real problem, one that risks punishing honest researchers. But it’s a problem created by the people who used LLMs to write their papers in the first place.
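The math behind that 200, under the unverified assumption that “99% accuracy” translates to a 1% false-positive rate on clean papers:

```python
# Illustrative only: how a 99%-accurate detector misfires at scale.
# Assumption: "99% accuracy" means flagging ~1% of clean papers.
total_submissions = 20_000
false_positive_rate = 1 - 0.99

expected_false_flags = total_submissions * false_positive_rate
print(round(expected_false_flags))  # ~200 honest papers wrongly flagged
```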
LESSONS FOR THE REST OF US
- “Vibe citing” is the new “vibe coding.” Both produce output that looks professional and falls apart the moment someone actually checks. The difference is that vibe coding crashes your app. Vibe citing corrupts the scientific record.
- Peer review was already broken. AI made it worse. Reviewers were overworked before LLMs flooded conferences with submissions. Now they’re expected to catch AI-generated fabrications on top of everything else. The system was not built for this.
- If AI can fool AI researchers, it can fool anyone. These aren’t laypeople. These are the people who build the models. If they can’t spot hallucinated citations in their own field, nobody in medicine, law, or policy stands a chance without automated tools.
- The academic incentive structure is the real villain. Publish or perish meets generate or perish. As long as career advancement depends on paper volume, researchers will use every shortcut available — including ones that invent sources.
Sources: GPTZero ICLR 2026 report (gptzero.me), BetaKit, TechCrunch, Fortune, The Register, The Decoder, NeurIPS official response. John Doe and Jane Smith were, as always, unavailable for comment.