AI translations threaten Wikipedia’s vulnerable language editions

Machine translations have flooded Wikipedia’s smaller language editions with error-riddled content, creating a dangerous feedback loop as AI models train on these flawed pages. The problem is particularly acute for vulnerable languages with few native speakers: in some editions, volunteers estimate that up to 60% of articles are uncorrected machine translations, which could accelerate language extinction rather than help preserve these languages.

The scale of the problem: Machine-translated content has overwhelmed Wikipedia editions in hundreds of lesser-known languages, leaving many of them riddled with errors.

  • Volunteers working on four African languages estimate that between 40% and 60% of articles in their Wikipedia editions are uncorrected machine translations.
  • More than two-thirds of longer pages in the Inuktitut Wikipedia contain portions created through machine translation.
  • The Greenlandic Wikipedia became so corrupted that its manager deleted almost everything and is now requesting the edition be shut down entirely.

Why AI struggles with vulnerable languages: Machine translation systems perform poorly on languages with limited online text and unique linguistic structures.

  • Google’s research found that translation systems for lower-resourced languages were generally of lower quality, often mistranslating basic nouns including animal names and colors.
  • Greenlandic and most Native American languages use agglutinative structures where single words can express entire sentences, making them poorly suited to most machine translation systems.
  • AI translators produce absurd errors like claiming Canada has only 41 inhabitants or suggesting the Fulfulde word for “harvest” means “fever.”

In plain English: Agglutinative languages work like linguistic building blocks—speakers attach prefixes and suffixes to root words to create complex meanings. For example, a single Greenlandic word might express what English needs an entire sentence to convey, like “the one who is repeatedly going to hunt seals.” This structure confuses AI systems designed for languages like English that rely more on word order and separate words.

The feedback loop effect: Wikipedia serves as a primary training source for AI language models, meaning errors get amplified across the entire AI ecosystem.

  • Wikipedia was estimated to make up more than half the training data for AI models translating some African languages including Malagasy, Yoruba, and Shona.
  • For 27 under-resourced languages, Wikipedia was the sole easily accessible source of online linguistic data available for AI training.
  • This creates a “garbage in, garbage out” cycle where poorly translated Wikipedia pages poison the data wells that future AI models draw from.

Real-world consequences: The proliferation of AI-generated errors is already harming language learning and preservation efforts.

  • Error-strewn AI-generated books for learning languages like Inuktitut, Cree, and Manx are now appearing for sale on Amazon.
  • Abdulkadir Abdulkadir, an agricultural planner, warns that machine-translated farming information in Fulfulde could “easily harm” farmers who rely on accurate seasonal guidance.
  • Noah Ha’alilio Solomon, a Hawaiian language professor at the University of Hawai’i, reports that 35% of words on some Hawaiian Wikipedia pages are incomprehensible.

What language advocates are saying: Community leaders describe the situation as culturally devastating and potentially accelerating language extinction.

  • “It is painful, because it reminds us of all the times that our culture and language has been appropriated,” says Solomon about poor Hawaiian content on Wikipedia.
  • Abdulkadir predicts a bleak future for Fulfulde: “It is going to be terrible, honestly. Totally, completely no future.”
  • Kenneth Wehr, who managed Greenlandic Wikipedia, concluded: “There is nobody in Greenland who is interested in this, or who wants to contribute. There is completely no point in it.”

The exception that proves the rule: Inari Saami demonstrates how careful community management can make Wikipedia work for endangered languages.

  • This Saami language, spoken in Finland, went from four child speakers to several hundred speakers over four decades.
  • The community created 6,400 Wikipedia articles, each copy-edited by fluent speakers, with quality prioritized over quantity.
  • Wikipedia has been integrated into Inari Saami school curricula and helps introduce new vocabulary for modern concepts.

Platform responsibility questions: The Wikimedia Foundation, which operates Wikipedia, maintains that individual language communities bear responsibility for content quality.

  • “Ultimately, the responsibility really lies with the community to see that there is no vandalism or unwanted activity, whether through machine translation or other means,” explains senior director Runa Bhattacharjee.
  • But many vulnerable language editions lack active communities to monitor and correct problematic content.
  • The foundation’s approach is to maintain platforms “in case someone comes along to revive” dormant editions.

The race against time: Linguists suggest that creating high-quality content quickly might be the only way to break the negative feedback loop.

  • According to UNESCO, a language becomes extinct every two weeks.
  • “ChatGPT only needs a lot of words,” notes Fabrizio Brecciaroli from the Inari Saami Language Association. “If we keep putting good material in, then sooner or later, we will get something out.”
  • However, the damage may already be embedded in major AI systems—neither Google Translate nor ChatGPT can correctly count to 10 in Greenlandic.
How AI and Wikipedia have sent vulnerable languages into a doom spiral
