Healthcare Digital December 2025 | Page 102

GENERATIVE AI
“The pervasiveness of generative AI has made it easier to generate vast amounts of content quickly,” she says, “outpacing traditional measures to identify and manage source material.”
The risk of synthetic data model collapse
Perhaps the most significant concern facing synthetic data is “model collapse”: a phenomenon where AI models trained on synthetic data from previous generations gradually degrade until they produce nonsensical outputs.
The problem stems from a fundamental characteristic of AI systems: they excel at reproducing patterns they’ve seen but struggle with true creativity. When an AI model generates synthetic data, it tends to emphasise common patterns whilst losing subtle details and rare occurrences: what statisticians call the ‘tails’ of data distributions.
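The loss of rare occurrences can be illustrated with a toy simulation. This is not the setup of any published study. It is a minimal sketch that assumes the “model” is simply a table of token frequencies. Each generation, the sketch draws 200 synthetic samples from the previous model and refits the table to those samples. The function names `retrain_on_samples` and `simulate`, the 100-token vocabulary and the Zipf-like starting distribution are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def retrain_on_samples(probs: dict[str, float], n_samples: int,
                       rng: random.Random) -> dict[str, float]:
    """One hypothetical 'generation': sample synthetic data from the current
    model, then refit the model as the relative frequency of each token."""
    tokens = list(probs)
    draws = rng.choices(tokens, weights=[probs[t] for t in tokens], k=n_samples)
    counts = Counter(draws)
    # A token that is never drawn gets probability 0 and can never be drawn again.
    return {t: counts[t] / n_samples for t in tokens}


def simulate(generations: int = 9, vocab_size: int = 100,
             n_samples: int = 200, seed: int = 0) -> list[int]:
    """Return how many tokens still have non-zero probability after each generation."""
    rng = random.Random(seed)
    # Assumed Zipf-like starting point: a few frequent tokens and a long tail of rare ones.
    raw = {f"tok{i}": 1.0 / (i + 1) for i in range(vocab_size)}
    total = sum(raw.values())
    probs = {t: w / total for t, w in raw.items()}
    support = [sum(p > 0 for p in probs.values())]
    for _ in range(generations):
        probs = retrain_on_samples(probs, n_samples, rng)
        support.append(sum(p > 0 for p in probs.values()))
    return support


if __name__ == "__main__":
    for gen, size in enumerate(simulate()):
        print(f"generation {gen}: {size} tokens still possible")
```

In this toy model, a rare token that happens to receive zero samples is assigned probability zero and can never be sampled again. The number of possible tokens can therefore only stay level or fall from one generation to the next. The sketch makes the one-way loss of the distribution’s tails concrete, although real language models lose rare patterns through far more complex mechanisms.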
Research published in Nature demonstrated this effect dramatically. Researchers fine-tuned language models on Wikipedia articles, then used those models to generate synthetic text, which was fed back into training new models. After nine iterations of this process, the AI was producing pure gibberish instead of coherent text.
The study found that “indiscriminate use of model-generated content in training causes irreversible defects in the resulting models”. As models train on their predecessors’ outputs, they