Three Indian American researchers at the University at Buffalo are developing a tool to detect AI-generated radiology reports, addressing concerns over falsified medical documentation and fraudulent insurance claims.
In an effort to combat the rising threat of falsified medical documentation and bogus insurance claims, a team of researchers from the University at Buffalo (UB) is developing a tool to identify AI-generated radiology reports. The initiative responds to the dangers posed by AI-generated medical reports, which can impersonate doctors or describe fabricated injuries in X-ray images, creating significant problems for the medical and insurance sectors.
The UB team, led by Nalini Ratha, PhD, a SUNY Empire Innovation Professor in the Department of Computer Science and Engineering, believes they have created the first AI system specifically designed to differentiate between radiology reports authored by humans and those generated by artificial intelligence. “With generative AI becoming more capable of producing remarkably convincing radiology reports, there’s a greater risk of fabricated reports being used to falsify medical histories and support fraudulent claims,” Ratha explained.
Ratha emphasized the unique challenges posed by radiology reports, which possess a highly specialized structure, vocabulary, and stylistic norms that make general-purpose detection systems unreliable. “Therefore, our goal was to build a detection framework designed specifically for radiology that can distinguish clinician-written medical documentation from synthetic text before it reaches clinical or insurance workflows,” she added.
The research team, which includes PhD students Arjun Ramesh Kaushik and Tanvi Ranga, presented their findings in a study titled “Detecting Synthetic Radiology Reports Using Style Disentanglement” at the 2025 GenAI4Health workshop during the Conference on Neural Information Processing Systems held in San Diego in December.
As part of their research, the team compiled a dataset comprising 14,000 pairs of radiologist-authored and AI-generated chest X-ray reports. They employed two distinct methods to create the synthetic reports: paraphrasing actual radiologist reports using advanced large language models (LLMs) and generating complete reports directly from chest radiographs using medical vision-language models (VLMs).
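The two generation pipelines described above could be organized roughly as follows. This is only a sketch of the paired-dataset layout implied by the article; `paraphrase_with_llm` and `report_from_radiograph` are hypothetical placeholders for real LLM and vision-language-model calls, not the team's actual code.

```python
# Sketch of a paired human/synthetic report dataset, assuming two
# generation routes: LLM paraphrasing and VLM report generation.

def paraphrase_with_llm(report_text: str) -> str:
    # Hypothetical stand-in: a real pipeline would prompt an LLM
    # to rewrite the radiologist's findings section.
    return "[LLM paraphrase of] " + report_text

def report_from_radiograph(image_path: str) -> str:
    # Hypothetical stand-in: a real pipeline would run a medical
    # vision-language model directly on the chest radiograph.
    return f"[VLM report generated from] {image_path}"

def build_pairs(records):
    """Pair each radiologist-written findings section with synthetic
    counterparts, labeled by how the synthetic text was produced."""
    pairs = []
    for rec in records:
        pairs.append({
            "human": rec["findings"],
            "synthetic": paraphrase_with_llm(rec["findings"]),
            "method": "text-to-text",
        })
        pairs.append({
            "human": rec["findings"],
            "synthetic": report_from_radiograph(rec["image"]),
            "method": "image-to-text",
        })
    return pairs

dataset = build_pairs([{"findings": "Lungs are clear.",
                        "image": "cxr_001.png"}])
```

Keeping a `method` label on every pair is what lets a benchmark report detection accuracy separately for the text-to-text and image-to-text categories.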
This dataset is notable for being the first to integrate both text-based and image-based synthetic radiology reports, marking a significant advancement for trustworthy AI research in healthcare. The samples focused specifically on the findings section of the reports, which captures the radiologist’s detailed analysis and includes extensive domain-specific terminology and descriptive language.
“The findings section is both central to authorship attribution and the one most susceptible to exploitation,” Ratha noted.
The subsequent phase of their study involved developing an authorship-detection framework tailored to operate on this dataset. Although LLMs can replicate clinical terminology, they often struggle to mimic the stylistic characteristics inherent in human-authored radiology reports.
Recognizing this gap, the UB researchers devised a detection model built on a BERT–Mamba architecture, designed to separate each report's stylistic features from its underlying clinical content. The model demonstrated high accuracy and consistency, achieving Matthews correlation coefficient (MCC) scores ranging from 92% to 100% across both the text-to-text and image-to-text categories. The framework also proved effective in cross-LLM tests, accurately identifying AI-generated reports from models it had not previously encountered.
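The MCC metric cited above is a stricter yardstick than plain accuracy: it folds all four cells of the confusion matrix into a single score between −1 and +1 (often quoted as a percentage), so a detector cannot score well by favoring one class. A minimal stdlib implementation for a binary human-vs-synthetic classifier might look like this (an illustration of the standard formula, not the team's evaluation code):

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = synthetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal is empty (undefined case).
    return (tp * tn - fp * fn) / denom if denom else 0.0

# A perfect detector scores 1.0 (i.e., 100%); chance-level guessing ~0.
print(mcc([1, 1, 0, 0], [1, 1, 0, 0]))  # → 1.0
```

By this convention, the reported 92%–100% range corresponds to near-perfect agreement between the model's verdicts and the true authorship labels.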
“What we found is that LLMs tend to write in polished, expansive language, while clinicians prefer concise, direct terms. For instance, radiologists use straightforward terms like ‘heart’ or ‘lung,’ whereas LLMs often opt for more elaborate phrases like ‘pulmonary vasculature.’ This distinction became a clear stylistic signal that our model learned to recognize,” Ranga explained.
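The stylistic gap Ranga describes can be made concrete with even a crude surface feature such as average word length. The sketch below is purely illustrative, using two invented example sentences; the team's actual BERT–Mamba model learns far richer stylistic representations than this:

```python
def avg_word_length(text: str) -> float:
    """Mean length of tokens after stripping punctuation -
    one crude proxy for 'polished, expansive' phrasing."""
    words = [w.strip(".,") for w in text.lower().split()]
    words = [w for w in words if w]
    return sum(len(w) for w in words) / len(words)

# Invented examples in the two styles the article contrasts.
clinician = "Heart size normal. Lungs clear."
llm_style = ("The cardiac silhouette is within normal limits and the "
             "pulmonary vasculature appears unremarkable.")

print(avg_word_length(clinician))   # → 5.0
print(round(avg_word_length(llm_style), 2))  # → 6.54
```

Even this single feature separates the two registers; a trained style encoder combines many such signals while discarding the shared clinical content.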
Despite the promising results, the research team plans to continue refining both the dataset and the benchmark detection model in preparation for public release. They also envision that as AI systems become increasingly sophisticated and tailored to specific fields like radiology, these tools could significantly alleviate the workload for radiologists.
While the focus of their research is on radiology, Ratha believes the implications extend beyond healthcare. The style-based detection approach developed by the team could also be beneficial in safeguarding industries that are increasingly vulnerable to AI-generated forgeries, fabricated records, and synthetic narratives, including insurance, finance, journalism, education, and the legal profession.
According to The American Bazaar, this innovative research highlights the critical need for reliable detection methods as AI technology continues to evolve and integrate into various sectors.

