Stop-words

Stop words are words that are generally filtered out before natural-language text is processed. They are the most common words in a language (articles, prepositions, pronouns, conjunctions, etc.) and do not add much information to the text. Examples of English stop words are “the”, “a”, “an”, “so”, and “what”.
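To make the idea concrete, here is a minimal sketch of stop-word filtering in plain JavaScript. It is not Orama's implementation: the STOP_WORDS list and the removeStopWords helper are hypothetical names used only for illustration, and the tokenization is deliberately naive.

```javascript
// Hypothetical, tiny stop-word list (illustration only, not Orama's list).
const STOP_WORDS = new Set(['the', 'a', 'an', 'so', 'what'])

// Hypothetical helper: lowercase the text, split it on anything that is not
// an ASCII letter, then drop empty tokens and stop words.
function removeStopWords(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter(token => token.length > 0 && !STOP_WORDS.has(token))
}

// Logs an array containing the two remaining tokens: "is" and "answer".
console.log(removeStopWords('So what is the answer?'))
```

Only the tokens that carry information are kept, which is roughly what happens to your documents and search terms when stop-words removal is enabled.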

Orama automatically removes common stop-words for you, depending on the language parameter used during new instance creation.

Currently, Orama supports stop-words removal for 12 languages:

  • English (default)
  • Italian
  • French
  • Spanish
  • Portuguese
  • Dutch
  • Swedish
  • Russian
  • Norwegian
  • German
  • Danish
  • Finnish

Disabling stop-words removal

Stop-words removal is enabled by default. You can disable it by setting stopWords: false when creating a new Orama instance:

import { create } from '@orama/orama'
 
const db = await create({
  schema: {
    author: 'string',
    quote: 'string',
  },
  components: {
    tokenizer: {
      stopWords: false,
    }
  }
})

Customizing stop-words

You can customize the default Orama stop-words by using the stopWords property when creating a new Orama instance. It accepts either an array of stop-words or a function that receives the default list and returns a new array:

import { create } from '@orama/orama'
 
const db = await create({
  schema: {
    author: 'string',
    quote: 'string',
  },
  components: {
    tokenizer: {
      // You can provide an array of stop-words or a function returning an array
      stopWords: defaultStopWords => [...defaultStopWords, 'foo', 'bar'],
    }
  }
})
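
The forms that stopWords can take (false, an array, or a function of the defaults) can be sketched as a small resolver. This is only an illustration under stated assumptions, not Orama's internal code: resolveStopWords and DEFAULT_STOP_WORDS are hypothetical names, and the sketch assumes that an explicit array replaces the default list rather than extending it.

```javascript
// Hypothetical stand-in for a language's built-in stop-word list.
const DEFAULT_STOP_WORDS = ['the', 'a', 'an']

// Hypothetical helper: turn the stopWords option into the list used for filtering.
function resolveStopWords(option, defaults = DEFAULT_STOP_WORDS) {
  // false disables removal, so nothing is filtered.
  if (option === false) return []
  // A function receives the defaults and returns the list to use.
  if (typeof option === 'function') return option(defaults)
  // Assumption: an explicit array is used as-is, replacing the defaults.
  if (Array.isArray(option)) return option
  // Option omitted: fall back to the defaults.
  return defaults
}

// Extends the defaults with two custom words.
console.log(resolveStopWords(defaultStopWords => [...defaultStopWords, 'foo', 'bar']))
```

Using a function rather than an array is what lets you keep the defaults and add to them, as in the example above.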