Teaching AI to Speak Luxembourgish: Why It's Harder Than You Think

There are roughly 7,000 languages spoken on Earth. The major AI models have been trained on maybe a few dozen of them well enough to generate fluent, accurate text. Luxembourgish is not one of them.

I know this because I tried to build a language learning app that teaches it.

The benchmark reality

Before building Mynago's Luxembourgish course, I needed to know: how well do current AI models actually handle Lëtzebuergesch?

I ran a series of C2-level Luxembourgish proficiency tests across OpenAI's models. The results:

For context, C2 is the highest proficiency level on the CEFR scale. A human passing C2 would score above 80%. These models are effectively failing the test.

And this isn't a niche complaint. GPT-4o is one of the most capable language models ever built. It handles Japanese, Arabic, and Hindi with reasonable fluency. But ask it to generate a Luxembourgish dialogue about ordering at a brasserie in Cloche d'Or, and it starts mixing in German words, inventing grammar rules, and producing sentences no Luxembourger would ever say.

GPT-4o-mini is worse. At 40% accuracy, it's essentially guessing. You'd get comparable results by feeding German into a find-and-replace with some Luxembourgish word swaps.

Why Luxembourgish breaks AI

The root cause is training data. Large language models learn from the internet, and Luxembourgish has a tiny digital footprint. Wikipedia has roughly 70,000 articles in Luxembourgish, compared to 6.8 million in English and 2.8 million in German. Reddit, Twitter, news sites, books, forums: the volume of Luxembourgish text available for training is orders of magnitude smaller than any language these models handle well.

This creates a cascade of problems:

The model defaults to German. Luxembourgish is a West Germanic language. When the model encounters a prompt in Luxembourgish, it often "falls back" to German patterns. The output looks plausible to someone who doesn't speak Luxembourgish, but it's wrong. "Ich habe" instead of "Ech hunn". "Bitte" instead of "Wann ech gelift". The grammar is German grammar wearing a Luxembourgish coat.

Spelling is inconsistent. The current Luxembourgish orthography was only fixed in 1999, as a revision of the 1975 official standard. The model has seen multiple spelling conventions in its training data and doesn't reliably pick the current one. It might output "Lëtzebuergesch" in one sentence and "Letzebuergesch" in the next.

Idioms are invented. When the model doesn't have enough examples of real Luxembourgish expressions, it generates plausible-sounding phrases that no native speaker would use. This is the most dangerous failure mode for a language learning app, because the student has no way to know the phrase is fake.

The TTS problem

Generating text is only half the challenge. A language learning app needs audio. Students need to hear the language spoken correctly.

Here's the state of Luxembourgish text-to-speech as of early 2026:

Zero support, across every major TTS provider. The fallback most developers would try is German TTS, which either spells Luxembourgish words out letter by letter or reads them with German phonology. It sounds nothing like actual Luxembourgish.

We solved this by building a custom pipeline on ElevenLabs' multilingual v2 model. It's not specifically trained on Luxembourgish either, but its multilingual architecture handles the phonology far better than any German TTS voice. We added a preprocessing layer that expands Luxembourgish abbreviations (w.e.g. to "wann ech gelift", z.B. to "zum Bäispill") before synthesis, and the results are surprisingly natural.

It's not perfect. But it's the first time any app has had usable Luxembourgish audio at all.

What we had to build

Getting Mynago's Luxembourgish course to produce reliable lessons required custom engineering at every layer:

Prompt engineering with guardrails. The AI prompt includes explicit instructions to never fall back to German. It specifies Luxembourgish grammar rules (verb-second word order, "hunn" and "sinn" as auxiliaries, the n-rule that drops word-final -n before most consonants). Every generated lesson is validated against a set of known German-to-Luxembourgish false friends.
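To make that concrete, here is a minimal sketch of how such a guardrail prompt might be assembled. The rule wording, the `GRAMMAR_RULES` list, and the `build_lesson_prompt` function are illustrative assumptions, not Mynago's actual prompt.

```python
# Hypothetical prompt builder. The rule texts and names below are
# illustrative assumptions, not the production prompt.
GRAMMAR_RULES = [
    "Use verb-second word order in main clauses.",
    'Form the perfect tense with the auxiliaries "hunn" and "sinn".',
    "Apply the n-rule consistently.",
]


def build_lesson_prompt(topic: str, level: str = "A1") -> str:
    """Assemble a system prompt that pins the model to Luxembourgish."""
    rules = "\n".join(f"- {rule}" for rule in GRAMMAR_RULES)
    return (
        "You write Luxembourgish (Lëtzebuergesch) lessons.\n"
        "Never fall back to German vocabulary, spelling, or grammar.\n"
        f"Grammar rules to follow:\n{rules}\n"
        f"Lesson topic: {topic}\n"
        f"Learner level: {level}\n"
    )


if __name__ == "__main__":
    print(build_lesson_prompt("ordering at a brasserie"))
```

Keeping the rules in a plain list means new constraints can be added without touching the template itself.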

Validation pipeline. After the AI generates a lesson, we run it through checks: Are there German words that shouldn't be there? Is the spelling consistent with the 1999 standard? Do the verb conjugations follow Luxembourgish patterns? Lessons that fail validation get regenerated.
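A rough sketch of that generate, check, regenerate loop might look like the following. The word lists are tiny samples taken from the examples in this post, and `find_issues` and `generate_validated` are hypothetical names. A real validator would need far larger lists and proper conjugation checks, since many words are legitimately shared between German and Luxembourgish.

```python
import re
from typing import Callable, List

# Sample entries only (assumption): German forms and the Luxembourgish
# form expected in their place.
GERMAN_FALLBACKS = {
    "ich": "ech",
    "habe": "hunn",
    "bitte": "wann ech gelift",
}

# Sample entry only: non-standard spelling mapped to the standard one.
SPELLING_VARIANTS = {"Letzebuergesch": "Lëtzebuergesch"}


def find_issues(text: str) -> List[str]:
    """Return human-readable validation problems found in a lesson."""
    issues = []
    tokens = re.findall(r"\w+", text)          # \w is Unicode-aware for str
    lowered = {token.lower() for token in tokens}
    for german, expected in GERMAN_FALLBACKS.items():
        if german in lowered:
            issues.append(f'German "{german}" found; expected "{expected}"')
    for variant, standard in SPELLING_VARIANTS.items():
        if variant in tokens:
            issues.append(f'Spelling "{variant}" found; expected "{standard}"')
    return issues


def generate_validated(generate: Callable[[], str], max_attempts: int = 3) -> str:
    """Call generate() until a lesson passes validation or attempts run out."""
    for _ in range(max_attempts):
        lesson = generate()
        if not find_issues(lesson):
            return lesson
    raise ValueError(f"no lesson passed validation in {max_attempts} attempts")
```

Here `generate` stands in for whatever function calls the model. Capping the attempts avoids an endless regeneration loop when the model keeps producing the same mistake.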

Abbreviation preprocessing for TTS. Luxembourgish has common abbreviations that TTS models don't understand. Our preprocessor expands them before sending text to the speech engine.
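A preprocessor along these lines can be sketched in a few lines. The two table entries are the ones mentioned earlier in this post; the `ABBREVIATIONS` table and `expand_abbreviations` function are illustrative names, and a production table would be much longer.

```python
import re

# The two abbreviations named in this post; a real table would hold more.
ABBREVIATIONS = {
    "w.e.g.": "wann ech gelift",
    "z.B.": "zum Bäispill",
}

# Longest abbreviations first, so a longer entry is never shadowed by a
# shorter one that shares its prefix. The lookbehind stops matches that
# would begin in the middle of a word.
_PATTERN = re.compile(
    r"(?<!\w)("
    + "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")"
)


def expand_abbreviations(text: str) -> str:
    """Replace known abbreviations with their spoken form before TTS."""
    return _PATTERN.sub(lambda match: ABBREVIATIONS[match.group(1)], text)
```

This sketch is case-sensitive and swallows the abbreviation's final period, so a sentence that ends in an abbreviation loses its full stop. Whether that matters depends on how the speech engine treats punctuation.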

Cultural context injection. Luxembourg's trilingual reality (Luxembourgish at home, French in business, German in media) means the AI needs to understand code-switching patterns. A lesson set in a government office might naturally include some French administrative terms. A lesson set at a family dinner should be pure Luxembourgish.
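One simple way to express this is a lookup from lesson setting to the language mix the prompt should allow. The sketch below assumes a hypothetical `SceneContext` structure and two scene keys based on the examples above; it is not the actual implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SceneContext:
    """Hypothetical description of a lesson setting and its language mix."""
    setting: str
    allowed_languages: Tuple[str, ...]
    note: str


# Illustrative scenes based on the two examples in the text.
SCENES = {
    "government_office": SceneContext(
        setting="a government office",
        allowed_languages=("Luxembourgish", "French"),
        note="French administrative terms may appear where natural.",
    ),
    "family_dinner": SceneContext(
        setting="a family dinner",
        allowed_languages=("Luxembourgish",),
        note="Keep the dialogue entirely in Luxembourgish.",
    ),
}


def context_instructions(scene_key: str) -> str:
    """Build the code-switching instruction to append to the lesson prompt."""
    scene = SCENES[scene_key]
    languages = ", ".join(scene.allowed_languages)
    return f"Setting: {scene.setting}. Allowed languages: {languages}. {scene.note}"
```

The returned string would be appended to the lesson prompt, so the model is told up front which code-switching is acceptable in that scene.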

A country of 682,000, infinite complexity

Luxembourgish is often dismissed as "a German dialect" by people who don't speak it. This is both linguistically wrong and culturally insulting. It's a language with its own grammar, its own orthography, its own literary tradition, and its own national identity.

But from an engineering perspective, it's an underserved language. The tools that exist for Spanish, French, or even Welsh simply don't exist for Luxembourgish. Every piece of infrastructure had to be built or adapted from scratch.

The result is that Mynago is, as far as we can tell, the only AI-powered language learning app that actually teaches Luxembourgish with real pronunciation, validated grammar, and lessons built for life in Luxembourg.

For a country of 682,000 residents (47% foreign nationals) and the thousands of expats trying to integrate, that matters.

If you're interested in learning Luxembourgish, check out our Luxembourgish course. And if you want to read about the personal side of learning a language nobody expects you to know, I wrote about that experience here.