
Officially Supported Languages

Orama currently supports 30 languages out of the box, written in 8 different scripts.
For every language, Orama provides a default tokenizer, stop-words, and stemmer.
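To make the three components concrete, here is a minimal sketch of how a tokenizer, a stop-word list, and a stemmer typically cooperate when text is analyzed. It is a toy illustration and not Orama's implementation: `tokenize`, `naiveStemmer`, `analyze`, and the tiny English stop-word list are all hypothetical names and data made up for this example.

```typescript
// Toy analysis pipeline. NOT Orama's implementation, only an illustration.
type Stemmer = (token: string) => string;

interface AnalyzerOptions {
  stopWords: Set<string>; // hypothetical, tiny stop-word list
  stemmer?: Stemmer;      // optional: some languages ship without one
}

// Tokenizer: lowercase the text and split on anything that is not
// an ASCII letter or digit (ASCII-only to keep the sketch short).
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Deliberately naive stemmer: strips a trailing "ing" or "s".
const naiveStemmer: Stemmer = (token) => token.replace(/(ing|s)$/, "");

// Pipeline: tokenize, drop stop-words, then stem if a stemmer is available.
function analyze(text: string, options: AnalyzerOptions): string[] {
  return tokenize(text)
    .filter((token) => !options.stopWords.has(token))
    .map((token) => (options.stemmer ? options.stemmer(token) : token));
}

const stopWords = new Set(["the", "is", "a", "of"]);
const sentence = "The indexing of documents is fast";

const withStemmer = analyze(sentence, { stopWords, stemmer: naiveStemmer });
const withoutStemmer = analyze(sentence, { stopWords });

console.log(withStemmer);    // ["index", "document", "fast"]
console.log(withoutStemmer); // ["indexing", "documents", "fast"]
```

Without a stemmer, "indexing" and "index" are treated as unrelated terms, which is why a missing stemmer affects how well searches match different forms of a word.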

🇨🇳🇯🇵 A note on Chinese and Japanese

At the time of writing, Chinese (Mandarin) and Japanese are the only exceptions: Orama provides everything by default except the stemmer.

Since Chinese and Japanese logograms follow different rules than alphabetic scripts, you will need to import a dedicated tokenizer for them.

Read more about Chinese here and about Japanese here.
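The short sketch below illustrates why a generic tokenizer is not enough for these languages. It assumes a simple whitespace-based splitter (`splitOnWhitespace`, a hypothetical helper, not an Orama function) and shows that it cannot find word boundaries in Chinese text, which is written without spaces. The per-character fallback at the end is only a naive assumption for illustration, not how Orama's dedicated tokenizers work.

```typescript
// Illustration only: why whitespace-based splitting breaks down for Chinese text.
const english = "search engines are useful";
const mandarin = "搜索引擎很有用"; // roughly "search engines are useful", written without spaces

// Hypothetical helper: split on runs of whitespace and drop empty tokens.
const splitOnWhitespace = (text: string): string[] =>
  text.split(/\s+/).filter((token) => token.length > 0);

const englishTokens = splitOnWhitespace(english);   // four separate words
const mandarinTokens = splitOnWhitespace(mandarin); // one token: the whole sentence

// Naive fallback (an assumption, not Orama's approach): one token per character.
const mandarinCharacters = Array.from(mandarin);

console.log(englishTokens.length);      // 4
console.log(mandarinTokens.length);     // 1
console.log(mandarinCharacters.length); // 7
```

Neither result is a useful word segmentation for the Chinese sentence: the first treats the whole sentence as one term, and the second splits multi-character words apart. A language-aware tokenizer is needed to find the actual word boundaries.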

Latin Alphabet

Language | Tokenizer | Stop-words | Stemmer
Danish
Dutch
English
Finnish
French
German
Hungarian
Indonesian
Irish
Italian
Norwegian
Portuguese
Romanian (*)
Serbian (**)
Slovenian
Spanish
Swedish
Turkish

(*) = also uses a few additional diacritic marks
(**) = uses both Cyrillic and Latin scripts

Cyrillic Alphabet

Language | Tokenizer | Stop-words | Stemmer
Bulgarian
Russian
Serbian (*)
Ukrainian

(*) = uses both Cyrillic and Latin scripts

Greek Alphabet

Language | Tokenizer | Stop-words | Stemmer
Greek

Devanagari Script

Language | Tokenizer | Stop-words | Stemmer
Hindi
Nepali
Sanskrit

Arabic Script

Language | Tokenizer | Stop-words | Stemmer
Arabic

Armenian Alphabet

Language | Tokenizer | Stop-words | Stemmer
Armenian

Tamil Script

Language | Tokenizer | Stop-words | Stemmer
Tamil

Chinese Characters (Logographic Script)

Language | Tokenizer | Stop-words | Stemmer
Chinese (Mandarin)
Japanese