
The answer is no, at least not yet.
AI tools can automate literature screening, data extraction and evidence synthesis, but they remain just that: tools. They cannot replace the experts who use them. The difference matters when the output feeds a regulatory dossier, an HTA committee presentation or a public health policy decision.
Language models write fluently and can produce text that reads as authoritative even when the underlying reasoning is wrong or the cited evidence does not exist. In a systematic review, where every conclusion must trace back to verifiable studies, this is not a minor issue.
A PICO question does not exist in isolation. The relevance of a given study depends on the therapeutic area, the regulatory context, the existing evidence base and, sometimes, the track record of a particular health authority. An LLM (large language model) cannot weigh those factors, at least not yet. A senior epidemiologist who has worked on EMA submissions or NICE appraisals can. Experienced reviewers spot immediately that randomisation was inadequate, that the comparator was poorly chosen, or that follow-up was too short to capture the outcome of interest. That is not pattern recognition. It is years of clinical and methodological experience applied to a specific scientific question.
This is particularly true in pharmacoepidemiology and real-world evidence reviews, where study designs vary widely and confounding is a constant challenge.
AI tools do add genuine value for searching the literature, screening titles and abstracts, and extracting data. Tasks that once took weeks now take days, a substantial efficiency gain that reduces costs for clients. Expert judgment, however, cannot be skipped for risk-of-bias assessment, evidence synthesis and interpretation.
AI and human expertise are not in competition; they need to work together. The key is knowing which decisions require a person, and which do not.
© epiSphera, 2026. Licensed under CC BY 4.0.
