Study Finds Chinese State Media Content Is Embedded in AI Training Data

New research suggests that content from Chinese state media is deeply embedded in the datasets used to train major artificial intelligence (AI) systems and may be subtly shaping how some models respond to politically sensitive questions.
A study published in the scientific journal Nature on May 13 found that large volumes of material from Chinese state outlets—including Xinhua News Agency and People’s Daily—appear in the training datasets of large language models.
According to the research, when prompted in Chinese on topics related to China’s political system or sensitive domestic issues, several leading AI systems, including ChatGPT, Claude, and Gemini, were more likely to generate responses that closely aligned with the Chinese regime’s official framing. English-language responses to the same questions, the study found, often differed in tone or emphasis….
