Healthcare Digital December 2025 | Page 104

progressively lose information about uncommon but important patterns, eventually converging on a narrow, unrealistic representation of the original data.
This has profound implications for the future of AI development. As AI-generated content proliferates across the internet – some experts predict it could comprise the majority of online content within years – future AI models will inevitably encounter increasing amounts of synthetic information in their training data, potentially triggering widespread model collapse.
However, newer research suggests the problem may be less severe than initially feared. Julia Kempe from NYU’s Center for Data Science and her collaborators, writing in a Medium article about their research, found that whilst “models trained on synthetic data eventually hit a performance plateau”, the issue can be mitigated through careful data management.
Can governance frameworks keep up?
As synthetic data adoption accelerates, governance frameworks are struggling