Stop-words
Stop words are the words that are generally filtered out before processing natural language text. They are the most common words in a language (articles, prepositions, pronouns, conjunctions, etc.) and do not add much information to the text. Examples of stop words in English are “the”, “a”, “an”, “so”, “what”.
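To make the idea concrete, here is a minimal sketch of stop-word filtering. It is not Orama's implementation: the `STOP_WORDS` set and the `removeStopWords` helper are hypothetical names, and the list only contains the five example words mentioned above.

```typescript
// Illustrative only: a tiny hand-picked list, not Orama's actual stop-word list.
const STOP_WORDS: Set<string> = new Set(['the', 'a', 'an', 'so', 'what'])

// Lowercase the text, split it on anything that is not a-z, and drop stop-words.
function removeStopWords(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter(token => token.length > 0 && !STOP_WORDS.has(token))
}

const tokens = removeStopWords('So what is the name of an author?')
console.log(tokens) // ["is", "name", "of", "author"]
```

The remaining tokens carry most of the meaning of the sentence, which is why removing stop-words usually shrinks an index without hurting search quality much.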
Orama automatically removes common stop-words for you, depending on the language parameter used when creating a new instance.
As of now, Orama supports stop-words removal for 12 languages:
- English (default)
- Italian
- French
- Spanish
- Portuguese
- Dutch
- Swedish
- Russian
- Norwegian
- German
- Danish
- Finnish
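As a rough illustration of how a language setting can select a stop-word list, the sketch below uses a hypothetical lookup table. The names `STOP_WORDS_BY_LANGUAGE` and `stopWordsFor` are assumptions made for this example, the lists are shortened to three words each, and throwing on an unknown language is a choice of this sketch rather than documented Orama behavior.

```typescript
// Hypothetical, heavily shortened per-language lists; real lists are much longer.
const STOP_WORDS_BY_LANGUAGE: Record<string, string[]> = {
  english: ['the', 'a', 'an'],
  italian: ['il', 'la', 'un'],
  french: ['le', 'la', 'un'],
}

// English is used when no language is given, mirroring the default listed above.
// Languages missing from the table throw an error (a choice made for this sketch).
function stopWordsFor(language: string = 'english'): string[] {
  const list = STOP_WORDS_BY_LANGUAGE[language]
  if (list === undefined) {
    throw new Error(`No stop-words available for language "${language}"`)
  }
  return list
}

console.log(stopWordsFor())          // ["the", "a", "an"]
console.log(stopWordsFor('italian')) // ["il", "la", "un"]
```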
Disabling stop-words removal
By default, stopWords is true, but you can disable stop-words removal by setting stopWords: false when creating a new Orama instance:
import { create } from '@orama/orama'

const db = await create({
  schema: {
    author: 'string',
    quote: 'string',
  },
  components: {
    tokenizer: {
      stopWords: false,
    }
  }
})
Customizing stop-words
You can customize the default Orama stop-words by using the built-in stopWords property when creating a new Orama instance:
import { create } from '@orama/orama'

const db = await create({
  schema: {
    author: 'string',
    quote: 'string',
  },
  components: {
    tokenizer: {
      // You can provide an array of stop-words or a function returning an array
      stopWords: defaultStopWords => [...defaultStopWords, 'foo', 'bar'],
    }
  }
})
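The three forms the stopWords option can take on this page (a boolean, an array, or a function receiving the defaults) can be modeled with a small resolver. This is a sketch of the idea only, not Orama's internal code: `StopWordsOption` and `resolveStopWords` are hypothetical names introduced for illustration.

```typescript
// Hypothetical type covering the option shapes shown on this page.
type StopWordsOption = boolean | string[] | ((defaults: string[]) => string[])

// Turn the option into the final list of stop-words to remove.
function resolveStopWords(option: StopWordsOption, defaults: string[]): string[] {
  if (option === false) return []                            // removal disabled
  if (option === true) return defaults                       // keep the defaults
  if (typeof option === 'function') return option(defaults)  // derive from the defaults
  return option                                              // explicit array replaces the defaults
}

const defaults = ['the', 'a', 'an']

console.log(resolveStopWords(false, defaults))                      // []
console.log(resolveStopWords(['foo'], defaults))                    // ["foo"]
console.log(resolveStopWords(d => [...d, 'foo', 'bar'], defaults))  // ["the", "a", "an", "foo", "bar"]
```

Passing a function is the convenient form when you want to extend or filter the defaults rather than replace them outright.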