BM25 Algorithm

Orama uses the BM25 algorithm to calculate the relevance of a document when searching.

The BM25 algorithm is a ranking function used in search engines to score and rank documents by their relevance to a given query. It improves on the older TF-IDF scheme, which weights terms by how often they occur in a document and how rare they are across the corpus, but does not take into account the length of the document or the average length of documents in the corpus. BM25 keeps the inverse document frequency component and adds two refinements: term frequency saturation, so that repeated occurrences of a term yield diminishing returns, and document length normalization, so that long documents are not favored simply for containing more words. Both are controlled by adjustable parameters that can be tuned for a given dataset. In practice, this tends to produce a more accurate ranking of the documents that match a query.
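For reference, the scoring function is commonly written as shown below. This is the textbook Okapi BM25 formulation, not necessarily the exact expression Orama evaluates; in particular, it has no counterpart to the `d` parameter. Here $f(q_i, D)$ is the frequency of query term $q_i$ in document $D$, $|D|$ is the document length, $\text{avgdl}$ is the average document length in the corpus, $N$ is the number of documents, and $n(q_i)$ is the number of documents containing $q_i$.

```latex
\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}

\text{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
```

In this form, $k_1$ sets how quickly the term frequency contribution saturates, and $b$ (between 0 and 1) sets how strongly the score is normalized by document length.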

BM25 has become a popular algorithm for search engine ranking due to its flexibility and effectiveness. It can be adapted to different types of search tasks, from ad-hoc search to recommendation systems, and can be tuned to perform well on specific domains or languages. Additionally, it is computationally efficient and easy to implement, which makes it a practical choice for large-scale search systems.

You can edit the BM25 parameters by using the relevance property in the search configuration object.

const searchResult = await search(movieDB, {
  term: 'Chris',
  properties: ['director'],
  relevance: {
    // Term frequency saturation parameter.
    // Default value: 1.2
    // Recommended value: between 1.2 and 2
    k: 1.2,
    // Length normalization parameter.
    // Default value: 0.75
    // Recommended value: > 0.75
    b: 0.75,
    // Frequency normalization lower bound.
    // Default value: 0.5
    // Recommended value: between 0.5 and 1
    d: 0.5,
  },
})

You can learn more about the BM25 algorithm on the Okapi BM25 Wikipedia page.
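As a rough illustration of what the `k` and `b` parameters do, the sketch below implements the textbook per-term BM25 score in plain JavaScript. It is not Orama's internal code: the function name `bm25TermScore`, its argument names, and the corpus numbers are all hypothetical, and the `d` parameter is left out because it is not part of the textbook formula.

```javascript
// Textbook BM25 score for one query term in one document.
// Illustrative sketch only; all names and numbers here are hypothetical.
function bm25TermScore({
  tf,            // occurrences of the term in the document
  docLength,     // length of the document (in tokens)
  avgDocLength,  // average document length in the corpus
  docCount,      // total number of documents
  matchingDocs,  // number of documents containing the term
  k = 1.2,       // term frequency saturation
  b = 0.75,      // length normalization strength (0 disables it)
}) {
  // Rare terms get a higher inverse document frequency.
  const idf = Math.log(1 + (docCount - matchingDocs + 0.5) / (matchingDocs + 0.5));
  // Documents longer than average get a larger normalization factor.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  return (idf * tf * (k + 1)) / (tf + k * lengthNorm);
}

const corpus = { docCount: 1000, matchingDocs: 50, avgDocLength: 100 };

// Same term frequency, different document lengths.
const shortDoc = bm25TermScore({ ...corpus, tf: 3, docLength: 50 });
const longDoc = bm25TermScore({ ...corpus, tf: 3, docLength: 400 });

// With b = 0, document length no longer affects the score.
const shortDocNoNorm = bm25TermScore({ ...corpus, tf: 3, docLength: 50, b: 0 });
const longDocNoNorm = bm25TermScore({ ...corpus, tf: 3, docLength: 400, b: 0 });

// Saturation: each extra occurrence adds less than the previous one.
const scoreAt = (tf) => bm25TermScore({ ...corpus, tf, docLength: 100 });
const earlyGain = scoreAt(2) - scoreAt(1);
const lateGain = scoreAt(20) - scoreAt(10);

console.log({ shortDoc, longDoc, shortDocNoNorm, longDocNoNorm, earlyGain, lateGain });
```

With the default `b`, the shorter document scores higher than the longer one for the same term frequency, while setting `b` to 0 makes the two scores identical. Raising `k` delays saturation, so additional occurrences of a term keep contributing for longer.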