Discover

Why does AI like ChatGPT still quote retracted papers?

Oct 10, 2025 | 16 Mins Read

AI models like ChatGPT are trained on massive datasets collected at specific moments in time, which means they lack awareness of papers retracted after their training cutoff. When a scientific paper gets retracted, whether due to errors, fraud, or ethical violations, most AI systems continue referencing it as if nothing happened. This creates a troubling scenario where researchers using AI assistants might unknowingly build their work on discredited foundations.

In other words: retracted papers are the academic world's way of saying "we got this wrong, please disregard." Yet the AI tools designed to help us navigate research faster often can't tell the difference between solid science and work that's been officially debunked.

ChatGPT and other assistants tested

Recent studies examined how popular AI research tools handle retracted papers, and the results were concerning. Researchers tested ChatGPT, Google's Gemini, and similar language models by asking them about known retracted papers. In many cases, the models not only failed to flag the retractions but actively praised the withdrawn studies.

One investigation found that ChatGPT referenced retracted cancer imaging research without any warning to users, presenting the flawed findings as credible. The problem extends beyond chatbots to AI-powered literature review tools that researchers increasingly rely on for efficiency.

Common failure scenarios

The risks show up across different domains, each with its own consequences:

Medical guidance: Healthcare professionals consulting AI for clinical information might receive recommendations based on studies withdrawn for data fabrication or patient safety concerns
Literature reviews: Academic researchers face citation issues when AI assistants suggest retracted papers, damaging credibility and delaying peer review
Policy decisions: Institutional leaders making evidence-based choices might rely on AI-summarised research without realising the underlying studies have been retracted

A doctor asking about treatment protocols could unknowingly follow advice rooted in discredited research. Meanwhile, detecting retracted citations manually across hundreds of references proves nearly impossible for most researchers.

How Often Retractions Slip Into AI Training Data

The scale of retracted papers entering AI systems is larger than most people realise. Crossref, the scholarly metadata registry that tracks digital object identifiers (DOIs) for academic publications, reports thousands of retraction notices annually. Yet many AI models were trained on datasets harvested years ago, capturing papers before retraction notices appeared.

Here's where timing becomes critical. A paper published in 2020 and included in an AI training dataset that same year might get retracted in 2023. If the model hasn't been retrained with updated data, it remains oblivious to the retraction. Some popular language models go years between major training updates, meaning their knowledge of the research landscape grows increasingly outdated.
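The timing argument above reduces to a date comparison. The following is a minimal sketch, assuming a hypothetical helper name and illustrative dates (only the years 2020 and 2023 come from the example in the text):

```python
from datetime import date
from typing import Optional


def retraction_postdates_cutoff(retracted_on: Optional[date], training_cutoff: date) -> bool:
    """True when a retraction was issued after the model's training data was collected."""
    if retracted_on is None:
        return False  # never retracted, so there is nothing for the model to miss
    return retracted_on > training_cutoff


# Illustrative dates: data harvested at the end of 2020, paper retracted in mid-2023.
cutoff = date(2020, 12, 31)
retracted = date(2023, 6, 1)
print(retraction_postdates_cutoff(retracted, cutoff))            # True: the model cannot know
print(retraction_postdates_cutoff(retracted, date(2024, 1, 1)))  # False: the notice predates this cutoff
```

A result of False only means the retraction notice could have been in the training data. It does not show that the model learned to discount the paper.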

Lag between retraction and model update

Training large language models requires enormous computational resources and time, which explains why most AI companies don't continuously update their systems. Even when retraining occurs, the process of identifying and removing retracted papers from massive datasets presents technical challenges that many organisations haven't prioritised solving.

The result is a growing gap between the current state of scientific knowledge and what AI assistants "know." You might think AI systems could simply check retraction databases in real-time before responding, but most don't. Instead, they generate responses based solely on their static training data, unaware that some information has been invalidated.
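To make the idea of a check before responding concrete, here is a rough sketch of a post-generation step that scans an answer for DOIs and compares them with a retraction list. It assumes the list is already available as a set of DOI strings. The function names, the loose DOI pattern, and the example DOIs are all illustrative and do not describe how any particular assistant works.

```python
import re
from typing import List, Set

# Loose DOI pattern for illustration; real DOIs vary and this may over- or under-match.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and drop trailing punctuation picked up from prose."""
    return doi.strip().rstrip(".,;:)]").lower()


def find_retracted_dois(text: str, retracted: Set[str]) -> List[str]:
    """Return the DOIs mentioned in `text` that appear in the `retracted` set."""
    known = {normalise_doi(d) for d in retracted}
    flagged: List[str] = []
    for match in DOI_PATTERN.findall(text):
        doi = normalise_doi(match)
        if doi in known and doi not in flagged:
            flagged.append(doi)
    return flagged


def annotate_response(text: str, retracted: Set[str]) -> str:
    """Append a warning to the answer when it cites a known retracted DOI."""
    flagged = find_retracted_dois(text, retracted)
    if not flagged:
        return text
    return text + "\n\nWarning: this answer cites retracted work: " + ", ".join(flagged)


# Made-up DOIs, used only to show the flow.
answer = ("The effect was first shown in doi:10.5555/example.retracted, "
          "later extended in 10.5555/example.valid.")
print(annotate_response(answer, {"10.5555/EXAMPLE.RETRACTED"}))
```

A check like this only catches citations that carry a DOI and only retractions present in the list, so it complements manual verification and does not replace it.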

Risks of Citing Retracted Papers in Practice

The consequences of AI-recommended retracted papers extend beyond embarrassment. When flawed research influences decisions, the ripple effects can be substantial and long-lasting.

Clinical decision errors

Healthcare providers increasingly turn to AI tools for quick access to medical literature, especially when facing unfamiliar conditions or emerging treatments. If an AI assistant recommends a retracted study on drug efficacy or surgical techniques, clinicians might implement approaches that have been proven harmful or ineffective. The 2020 hydroxychloroquine controversy illustrated how quickly questionable research spreads. Imagine that dynamic accelerated by AI systems that can't distinguish between valid and retracted papers.

Policy and funding implications

Government agencies and research institutions often use AI tools to synthesise large bodies of literature when making funding decisions or setting research priorities. Basing these high-stakes choices on retracted work wastes resources and potentially misdirects entire fields of inquiry. A withdrawn climate study or economic analysis could influence policy for years before anyone discovers the AI-assisted review included discredited research.

Academic reputation damage

For individual researchers, citing retracted papers carries professional consequences. Journals may reject manuscripts, tenure committees question research rigour, and collaborators lose confidence. While honest mistakes happen, the frequency of such errors increases when researchers rely on AI tools that lack retraction awareness, and the responsibility still falls on the researcher, not the AI.

Why Language Models Miss Retraction Signals

The technical architecture of most AI research assistants makes them inherently vulnerable to the retraction problem. Understanding why helps explain what solutions might actually work.

Corpus quality controls lacking

AI models learn from their training corpus, the massive collection of text they analyse during development. Most organisations building these models prioritise breadth over curation, scraping academic databases, preprint servers, and publisher websites without rigorous quality checks.

The assumption is that more data produces better models, but this approach treats all papers equally regardless of retraction status. Even when training data includes retraction notices, the AI might not recognise them as signals to discount the paper's content. A retraction notice is just another piece of text unless the model has been specifically trained to understand its significance.

Sparse or inconsistent metadata

Publishers handle retractions differently, creating inconsistencies that confuse automated systems:

Some journals add "RETRACTED" to article titles
Others publish separate retraction notices
A few quietly remove papers entirely

This lack of standardisation means AI systems trained to recognise one retraction format might miss others completely. Metadata, the structured information describing each paper, often fails to consistently flag retraction status across databases. A paper retracted in PubMed might still appear without warning in other indexes that AI training pipelines access.
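The three publisher behaviours listed above can be sketched as a small classifier over a paper record. This is a simplified illustration: the record shape (a dict with a "title" key), the label strings, and the patterns are assumptions, and real publisher conventions are more varied than two regular expressions can capture.

```python
import re
from typing import Optional

# Title prefixed with a marker, e.g. "RETRACTED: ..." or "[Retracted] - ...".
TITLE_MARK = re.compile(r"^\s*\[?(retracted|withdrawn)( article)?\]?\s*[:\-]", re.IGNORECASE)
# Separate notice, e.g. "Retraction notice: ..." or "Notice of retraction ...".
NOTICE_MARK = re.compile(r"^\s*(retraction notice|notice of retraction|retraction)\b", re.IGNORECASE)


def classify_retraction_signal(record: Optional[dict]) -> str:
    """Label which retraction convention, if any, a record appears to follow."""
    if record is None:
        return "missing"            # paper quietly removed: no record left to inspect
    title = record.get("title", "")
    if TITLE_MARK.search(title):
        return "retracted-title"    # publisher rewrote the article title
    if NOTICE_MARK.search(title):
        return "retraction-notice"  # record is a separate notice, not the paper itself
    return "no-signal"              # nothing found; this does NOT prove the paper is valid


print(classify_retraction_signal({"title": "RETRACTED: Imaging markers in tumours"}))
print(classify_retraction_signal({"title": "Retraction notice: Imaging markers in tumours"}))
print(classify_retraction_signal(None))
```

A heuristic like this will also misfire on legitimate titles that happen to begin with the word "Retraction", which is one reason title text alone is a weak signal compared with structured metadata.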

Hallucination and overconfidence

AI hallucination occurs when models generate plausible-sounding but false information, and it exacerbates the retraction problem. Even if a model has no information about a topic, it might confidently fabricate citations or misremember details from its training data. This overconfidence means AI assistants rarely express uncertainty about the papers they recommend, leaving users with no indication that additional verification is needed.

Real-Time Retraction Data Sources Researchers Should Trust

While AI tools struggle with retractions, several authoritative databases exist for manual verification. Researchers concerned about citation integrity can cross-reference their sources against these resources.

Retraction Watch Database

Retraction Watch operates as an independent watchdog, tracking retractions across all academic disciplines and publishers. Their freely accessible database includes detailed explanations of why papers were withdrawn, from honest error to fraud. The organisation's blog also provides context about patterns in retractions and systemic issues in scholarly publishing.

Crossref metadata service

Crossref maintains the infrastructure that assigns DOIs to scholarly works, and publishers report retractions through this system. While coverage depends on publishers properly flagging retractions, Crossref offers a comprehensive view across multiple disciplines and publication types. Their API allows developers to build tools that automatically check retraction status, a capability that forward-thinking platforms are beginning to implement.
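As a rough illustration of such a tool, the sketch below builds a Crossref REST API query and reads retraction notices out of the response. It relies on assumptions that should be checked against Crossref's documentation: that the works endpoint accepts a filter of the form updates:DOI, and that each returned notice carries an "update-to" list whose entries have "DOI" and "type" fields. The function names and example DOIs are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_updates_url(doi: str) -> str:
    """URL asking Crossref for records that declare an update to `doi` (assumed filter name)."""
    query = urllib.parse.urlencode({"filter": f"updates:{doi}"})
    return f"{CROSSREF_WORKS}?{query}"


def retraction_notice_dois(response: dict, doi: str) -> List[str]:
    """Pick out DOIs of notices that mark `doi` as retracted (assumed response shape)."""
    notices: List[str] = []
    for item in response.get("message", {}).get("items", []):
        for update in item.get("update-to", []):
            same_target = update.get("DOI", "").lower() == doi.lower()
            if same_target and update.get("type", "").lower() == "retraction":
                notices.append(item.get("DOI", ""))
    return notices


def fetch_retraction_notices(doi: str, timeout: float = 10.0) -> List[str]:
    """Network call; not exercised here. Returns notice DOIs, or an empty list if none."""
    with urllib.request.urlopen(build_updates_url(doi), timeout=timeout) as resp:
        return retraction_notice_dois(json.load(resp), doi)
```

An empty result means only that no publisher has registered a retraction for that DOI with Crossref, which matches the coverage caveat above.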

PubMed retracted publication tag

For medical and life sciences research, PubMed provides reliable retraction flagging with daily updates. The National Library of Medicine maintains this database with rigorous quality control, ensuring retracted papers receive prominent warning labels. However, this coverage is limited to biomedical literature, leaving researchers in other fields without equivalent resources.
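For a single PubMed record, the retraction tag can be queried through NCBI's E-utilities search endpoint. The sketch below only builds the request URL and interprets a JSON result. The helper names are hypothetical, and the query syntax (the [pmid] and [pt] field tags with the "retracted publication" publication type) and the response shape reflect my understanding of PubMed search, so confirm them against NCBI's documentation before relying on them.

```python
import urllib.parse

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_retraction_query(pmid: str) -> str:
    """URL that matches the record only if PubMed tags it as a retracted publication."""
    term = f'{pmid}[pmid] AND "retracted publication"[pt]'
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return f"{ESEARCH}?{urllib.parse.urlencode(params)}"


def is_flagged_retracted(esearch_json: dict) -> bool:
    """Interpret an esearch JSON reply: a non-zero hit count means the tag is present."""
    count = esearch_json.get("esearchresult", {}).get("count", "0")
    return int(count) > 0


# The PMID below is a placeholder, not a real retracted record.
print(build_retraction_query("12345678"))
```

Because the query combines the identifier with the publication type, a count of zero covers both "not retracted" and "not in PubMed", so non-biomedical papers still need another source.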

Database	Coverage	Update Speed	Access
Retraction Watch	All disciplines	Real-time	Free
Crossref	Publisher-reported	Variable	Free API
PubMed	Medical/life sciences	Daily	Free

Responsible AI Starts with Licensing

When AI systems access research papers, articles, or datasets, authors and publishers have legal and ethical rights that need protection. Ignoring these rights can undermine the sustainability of the research ecosystem and diminish trust between researchers and technology providers.

One of the most common ways AI tools get it wrong is by citing retracted papers as if they’re still valid. When an article is retracted, for example because peer review was not conducted properly or the work failed to meet established standards, most AI systems don’t know; the article simply remains part of their training data. This is where licensing plays a crucial role. Licensed data ensures that AI systems are connected to the right sources and continuously updated with accurate, publisher-verified information. It’s the foundation for what platforms like Zendy aim to achieve: making sure the content is clean and trustworthy.

Licensing ensures that content is used responsibly. Proper agreements between AI companies and copyright holders allow AI systems to access material legally while providing attribution and, when appropriate, compensation. This is especially important when AI tools generate insights or summaries that are distributed at scale, potentially creating value for commercial platforms without benefiting the sources of the content.

In conclusion, consent-driven licensing helps build trust. Publishers and authors can choose whether and how their work is incorporated into AI systems, ensuring that content is included only when rights are respected. Advanced AI platforms, such as Zendy, can even track which licensed sources contributed to a particular output, providing accountability and a foundation for equitable revenue sharing.

Research Integrity, Partnership, and Societal Impact

Research integrity extends beyond publication to include how scholarship is discovered, accessed, and used, and its societal impact depends on more than editorial practice alone. In practice, integrity and impact are shaped by a web of platforms and partnerships that determine how research actually travels beyond the press. University press scholarship is generally produced with a clear public purpose, speaking to issues such as education, public health, social policy, culture, and environmental change, and often with the explicit aim of informing practice, policy, and public debate. Whether that aim is realised increasingly depends on what happens to research once it leaves the publishing workflow. Discovery platforms, aggregators, library consortia, and technology providers all influence this journey. Choices about metadata, licensing terms, ranking criteria, or the use of AI-driven summarisation affect which research is surfaced, how it is presented, and who encounters it in the first place. These choices can look technical or commercial on the surface, but they have real intellectual and social consequences. They shape how scholarship is understood and whether it can be trusted beyond core academic audiences. For university presses, this changes where responsibility sits. Editorial quality remains critical, but it is no longer the only consideration. Presses also have a stake in how their content is discovered, contextualised, and applied in wider knowledge ecosystems. Long-form and specialist research is particularly exposed here. When material is compressed or broken apart for speed and scale, nuance can easily be lost, even when the intentions behind the system are positive. This is where partnerships start to matter in a very practical way. The conditions under which presses work with discovery services directly affect whether their scholarship remains identifiable, properly attributed, and anchored in its original context. 
For readers using research in teaching, healthcare, policy, or development settings, these signals are not decorative. They are essential to responsible use. Zendy offers one example of how these partnerships can function differently. As a discovery and access platform serving researchers, clinicians, and policymakers in emerging and underserved markets, Zendy is built around extending reach without undermining trust. University press content is surfaced with clear attribution, structured metadata, and rights-respecting access models that preserve the integrity of the scholarly record. Zendy works directly with publishers to agree how content is indexed, discovered, and, where appropriate, summarised. This gives presses visibility into and control over how their work appears in AI-supported discovery environments, while helping readers approach research with a clearer sense of scope, limitations, and authority. From a societal impact perspective, this matters. Zendy’s strongest usage is concentrated in regions where access to trusted scholarship has long been uneven, including parts of Africa, the Middle East, and Asia. In these contexts, university press research is not being read simply for academic interest. It is used in classrooms, clinical settings, policy development, and capacity-building efforts, areas closely connected to the Sustainable Development Goals. Governance really sits at the heart of this kind of model. Clear and shared expectations around metadata quality, content provenance, licensing boundaries, and the use of AI are what make the difference between systems that encourage genuine engagement and those that simply amplify visibility without depth. Metadata is not just a technical layer: it gives readers the cues they need to understand what they are reading, where it comes from, and how it should be interpreted. 
AI-driven discovery and new access models create real opportunities to broaden the reach of university press publishing and to connect trusted scholarship with communities that would otherwise struggle to access it. But reach on its own does not equate to impact. When context and attribution are lost, the value of the research is diminished. Societal impact depends on whether work is understood and used with care, not simply on how widely it circulates. For presses with a public-interest mission, active participation in partnerships like these is a way to carry their values into a more complex and fast-moving environment. As scholarship is increasingly routed through global, AI-powered discovery systems, questions of integrity, access, and societal relevance converge. Making progress on shared global challenges requires collaboration, shared responsibility, and deliberate choices about the infrastructures that connect research to the wider world. For university presses, this is not a departure from their mission, but a continuation of it, with partnerships playing an essential role.

FAQ

How do platforms and partnerships affect research integrity?
Discovery platforms, aggregators, and technology partners influence which research is surfaced, how it’s presented, and who can access it. Choices around metadata, licensing, and AI summarization directly impact understanding and trust.

Why are university press partnerships important?
Partnerships allow presses to maintain attribution, context, and control over their content in discovery systems, ensuring that research remains trustworthy and properly interpreted.

How does Zendy support presses and researchers?
Zendy works with publishers to surface research with clear attribution, structured metadata, and rights-respecting access, preserving integrity while extending reach to underserved regions.
For partnership inquiries, please contact: Sara Crowley Vigneau, Partnership Relations Manager. Email: s.crowleyvigneau@zendy.io

Dec 18, 2025 | 6 Mins Read | Discover

Beyond Publication. Access as a Research Integrity Issue

If research integrity now extends beyond publication to include how scholarship is discovered and used, then access is not a secondary concern. It is foundational. In practice, this broader understanding of integrity quickly runs into a hard constraint: access. A significant percentage of academic publishing is still behind paywalls, and traditional library sales models fail to serve institutions with limited budgets or uneven digital infrastructure. Even where university libraries exist, access is often delayed or restricted to narrow segments of the scholarly record. The consequences are structural rather than incidental. When researchers and practitioners cannot access the peer-reviewed scholarship they need, it drops out of local research agendas, teaching materials, and policy conversations. Decisions are then shaped by whatever information is most easily available, not necessarily by what is most rigorous or relevant. Over time, this weakens citation pathways, limits regional participation in scholarly debate, and reinforces global inequity in how knowledge is visible, trusted, and amplified. The ongoing success of shadow libraries highlights this misalignment: Sci-Hub reportedly served over 14 million monthly users in 2025, indicating sustained and widespread demand for academic research that existing access models continue to leave unmet. This is less about individual behaviour than about a system that consistently fails to deliver essential knowledge where it is needed most. The picture looks different when access barriers are reduced: usage data from open and reduced-barrier initiatives consistently show strong engagement across Asia and Africa, particularly in fields linked to health, education, social policy, and development. These patterns highlight how emerging economies rely on high-quality publishing in contexts where it directly impacts professional practice and public decision-making. From a research integrity perspective, this is important.
When authoritative sources are inaccessible, alternative materials step in to fill the gap. The risk is not only exclusion, but distortion. Inconsistent, outdated, or unverified sources become more influential precisely because they are easier to obtain. Misinformation takes hold most easily where trusted knowledge is hardest to reach. Addressing access is about more than widening readership or improving visibility; it is about ensuring that high-quality scholarship can continue to shape understanding and decisions in the contexts it seeks to serve. For university presses committed to the public good, this challenge sits across discovery systems, licensing structures, technology platforms, and the partnerships that increasingly determine how research is distributed, interpreted, and reused. If research integrity now extends across the full lifecycle of scholarship, then sustaining it requires collective responsibility and shared frameworks. How presses engage with partners, infrastructures, and governance mechanisms becomes central to protecting both trust and impact.

FAQ

What challenges exist in current access models?
Many academic works remain behind paywalls, libraries face budget and infrastructure constraints, and access delays or restrictions can prevent researchers from using peer-reviewed scholarship effectively.

What happens when research is inaccessible?
When trusted sources are hard to reach, alternative, inconsistent, or outdated materials often fill the gap, increasing the risk of misinformation and weakening citation pathways.

How does Zendy help address access challenges?
Zendy provides affordable and streamlined access to high-quality research, helping scholars, practitioners, and institutions discover and use knowledge without traditional barriers.

Dec 18, 2025 | 5 Mins Read | Discover

Beyond Peer Review. Research Integrity in University Press Publishing

University presses play a distinctive role in advancing research integrity and societal impact. Their publishing programmes are closely aligned with public-interest research in the humanities, social sciences, global health, education, and environmental studies, disciplines that directly inform policy and progress toward the UN Sustainable Development Goals. This work typically prioritises depth, context, and long-term understanding, often drawing on regional expertise and interdisciplinary approaches rather than metrics-driven outputs. Research integrity is traditionally discussed in terms of editorial rigour, peer review, and ethical standards in the production of scholarship. These remain essential. But in an era shaped by digital platforms and AI-led discovery, they are no longer sufficient on their own. Integrity now also depends on what happens after publication: how research is surfaced, interpreted, reduced, and reused. For university presses, this shift is particularly significant. Long-form scholarship, a core strength of press programmes, is increasingly encountered through abstracts, summaries, extracts, and automated recommendations rather than sustained reading. As AI tools mediate more first encounters with research, meaning can be subtly altered through selection, compression, or loss of context. These processes are rarely neutral. They encode assumptions about relevance, authority, and value. This raises new integrity questions. Who decides which parts of a work are highlighted or omitted? How are disciplinary nuance and authorial intent preserved when scholarship is summarised? What signals remain to help readers understand scope, limitations, or evidentiary weight? This isn’t to say that AI-driven discovery is inherently harmful, but it does require careful oversight. 
If university press scholarship is to continue informing research, policy, and public debate in meaningful ways, it needs to remain identifiable, properly attributed, and grounded in its original framing as it moves through increasingly automated discovery systems. In this context, research integrity extends beyond how scholarship is produced to include how it is processed, surfaced and understood. For presses with a public-interest mission, research integrity now extends across the full journey of a work, from how it is published to how it is discovered, interpreted and used.

FAQ

Can Zendy help with AI-mediated research discovery?
Yes. Zendy’s tools help surface, summarise, and interpret research accurately, preserving context and authorial intent even when AI recommendations are used.

Does AI discovery harm research, or can it be beneficial?
AI discovery isn’t inherently harmful; it can increase visibility and accessibility. However, responsible use is essential to prevent misinterpretation or loss of nuance, ensuring research continues to inform policy and public debate accurately.

How does Zendy make research more accessible?
Researchers can explore work from multiple disciplines, including humanities, social sciences, global health, and environmental studies, all in one platform with easy search and AI-powered insights.
