Boosting Search Results

Almost every customer who adds search to an application eventually wants some results to appear higher than others to highlight their importance. Today, we will look at when and why such a promotion, or boost, may be needed and how to implement it.

Solution

The first question to ask is when you need to promote or boost search results. There are many cases when a customer may want to push particular items to the top of the results shown to a client: advertisements, promotions, user-preferred items, and so on. The crucial thing is not to overuse it. If you boost too much irrelevant information, the client may stop using the application altogether.

The second question to ask is what you want to boost. You may boost the title of the item, keywords, unique attributes, and so on. It is always a good idea to let an administrator pick which parts she wants to boost. This way, the administrator can tune the search engine behavior to get better results. Another good recommendation is to collect user statistics and preferences, use this information to predict what a user may need, and boost those items.
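
As a rough illustration of administrator-selected boosts, the sketch below turns a list of weighted fields into an Elasticsearch multi_match request body. Elasticsearch is the engine used in the implementation part of this lesson, and the ^ suffix is its per-field boost syntax. The BOOST_FIELDS setting, the field names title and description, and the weights are hypothetical placeholders, not part of the lesson's index.

```shell
# Hypothetical admin-editable setting: space-separated "field^weight" entries.
# Field names and weights are placeholders for illustration only.
BOOST_FIELDS="${BOOST_FIELDS:-title^3 keywords^2 description}"

# Print a multi_match request body that searches every configured field
# for the text passed as the first argument.
# Note: the search text is inserted as-is and is not JSON-escaped.
build_boosted_query() {
  fields=""
  for f in $BOOST_FIELDS; do
    # Append the quoted field, adding a comma separator after the first entry.
    fields="${fields:+$fields, }\"$f\""
  done
  printf '{ "query": { "multi_match": { "query": "%s", "fields": [ %s ] } } }\n' "$1" "$fields"
}
```

Calling build_boosted_query apple prints a body that can be sent to an index's _search endpoint with curl -d. Unlike the must/should query built later in this lesson, multi_match returns a document when any of the listed fields matches, so it is a different technique with a similar goal.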

The third question to ask is when not to use the boost. It is not as easy as you might think. A clear and straightforward search relevance strategy is hard to build. Many customers simply try to return as many results as they can generate, which is a mistake. The best practice here is to change the results slightly based on user preferences or application requirements, check the outcome, collect feedback, and only then roll out the change. The closer the search results are to customer expectations, the better.

Overall, boosting is a good tool that can improve user experience and solve many problems. Think ahead, build the search relevance strategy, test it, show the client what she wants, and get a loyal and happy customer.

Implementation

Now let us try to boost some specific values using Elasticsearch queries.

We will create an index with two fields: data, which we will use for the primary full-text search, and keywords, which will be used to boost results. The data field will contain the main content of an article: title, description, etc. The keywords field will hold essential words that matter for search and should therefore be boosted additionally.

Here are the requests that create such an index with an appropriate mapping for these two fields:

curl -X PUT "localhost:9200/boosting-search-results"
curl -X PUT "localhost:9200/boosting-search-results/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "data": {
      "type": "text",
      "analyzer": "english"
    },
    "keywords": {
      "type": "text",
      "analyzer": "english"
    }
  }
}
'
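
As an alternative to the two requests above, recent Elasticsearch versions also accept the mapping in the index-creation call itself. The following is a minimal sketch of that variant; the ES_URL and INDEX variables and the two function names are conveniences introduced here, not part of the lesson.

```shell
ES_URL="${ES_URL:-localhost:9200}"      # assumption: the same local node as above
INDEX="boosting-search-results"

# Print the index-creation body: the same two english-analyzed text fields,
# wrapped in a top-level "mappings" object.
index_body() {
  cat <<'EOF'
{
  "mappings": {
    "properties": {
      "data":     { "type": "text", "analyzer": "english" },
      "keywords": { "type": "text", "analyzer": "english" }
    }
  }
}
EOF
}

# Create the index and its mapping in a single call.
create_index() {
  curl -X PUT "$ES_URL/$INDEX" -H 'Content-Type: application/json' -d "$(index_body)"
}
```

Run create_index instead of the two requests above, not in addition to them: Elasticsearch rejects the call if the index already exists.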

Then we will fill it with some test data:

curl -X PUT "localhost:9200/boosting-search-results/_doc/1" -H 'Content-Type: application/json' -d'
{
  "data": "Apple An apple is an edible sweet fruit produced by an apple tree (Malus domestica). Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus.",
  "keywords": "fruit sweet red yellow"
}
'
curl -X PUT "localhost:9200/boosting-search-results/_doc/2" -H 'Content-Type: application/json' -d'
{
  "data": "Orange The orange is the fruit of various citrus species in the family Rutaceae (see list of plants known as orange); it primarily refers to Citrus × sinensis,[1] which is also called sweet orange, to distinguish it from the related Citrus × aurantium, referred to as bitter orange.",
  "keywords": "fruit sour orange"
}
'
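
Newly indexed documents become searchable only after a refresh, which normally happens automatically within about a second. A script that indexes and searches back to back may want to force a refresh and confirm that both documents are visible. The helpers below are a small sketch of that; the variable and function names are introduced here for illustration.

```shell
ES_URL="${ES_URL:-localhost:9200}"      # assumption: the same local node as above
INDEX="boosting-search-results"

# Build the URL of an index-level endpoint such as _refresh or _count.
index_url() {
  printf '%s/%s/%s' "$ES_URL" "$INDEX" "$1"
}

# Make the documents indexed so far visible to search.
refresh_index() {
  curl -s -X POST "$(index_url _refresh)"
}

# Report the number of searchable documents; two are expected here.
count_docs() {
  curl -s -X GET "$(index_url '_count?pretty')"
}
```

Calling refresh_index and then count_docs should report a count of 2 before the search queries are run.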

Now let us build the query. We will use a bool query with two sections: must, which holds the primary query against the data field, and should, which holds an optional query against the keywords field and is used to boost specific documents. In other words, data is used to find documents, and keywords is used to rank some of those documents higher. Here is the query that implements that:

curl -X POST "localhost:9200/boosting-search-results/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool" : {
      "must": [
        { "match" : { "data" : "sweet" } }
      ],
      "should" : [
        { "match" : { "keywords" : "sweet" } }
      ]
    }
  },
  "_source": false
}
'

Here are the results. Note that the first document has a score of 0.8548409, while the second scores only 0.16753875. The difference comes from the should clause: it found the word “sweet” only in the keywords of the first document, and a bool query adds the score of every matching should clause to the score from must.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.8548409,
    "hits" : [
      {
        "_index" : "boosting-search-results",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.8548409
      },
      {
        "_index" : "boosting-search-results",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.16753875
      }
    ]
  }
}
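
To see where each part of a score comes from, the search API accepts an explain flag in the request body. The sketch below repeats the same bool query with that flag switched on; the helper names and the ES_URL and INDEX variables are introduced here for illustration.

```shell
ES_URL="${ES_URL:-localhost:9200}"      # assumption: the same local node as above
INDEX="boosting-search-results"

# Print the same bool query with "explain" switched on; $1 is the search term.
# Note: the term is inserted as-is and is not JSON-escaped.
explain_body() {
  cat <<EOF
{
  "query": {
    "bool": {
      "must":   [ { "match": { "data":     "$1" } } ],
      "should": [ { "match": { "keywords": "$1" } } ]
    }
  },
  "explain": true,
  "_source": false
}
EOF
}

# Run the search; each hit then carries an "_explanation" tree.
explain_search() {
  curl -s -X POST "$ES_URL/$INDEX/_search?pretty" -H 'Content-Type: application/json' -d "$(explain_body "$1")"
}
```

Running explain_search sweet should show two scored branches for document 1 (one for data, one for keywords) and only the data branch for document 2.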

Another good trick is to set a boosting factor (multiplier), so that a boost by one field can weigh more than a boost by another field. Here is an example:

curl -X POST "localhost:9200/boosting-search-results/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool" : {
      "must": [
        { "match" : { "data" : "sweet" } }
      ],
      "should" : [
        { "match" : { "keywords" : { "query": "sweet" , "boost": 10 } } }
      ]
    }
  },
  "_source": false
}
'

Here we have set a boost factor of 10 on the match query in the should section, so it weighs significantly more than the primary query in the must section. Consequently, the results show a much higher score for the document with the matching keyword: 6.7487183 with the multiplier versus 0.8548409 without it. The second document does not match the boosted query, so its score stays the same.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 6.7487183,
    "hits" : [
      {
        "_index" : "boosting-search-results",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 6.7487183
      },
      {
        "_index" : "boosting-search-results",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.16753875
      }
    ]
  }
}
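
The two scores of document 1 are enough to separate the contributions of the two sections. Assuming the bool query sums the must and should scores and the boost multiplies the should score linearly, the unboosted run gives must + should = 0.8548409 and the boosted run gives must + 10 × should = 6.7487183. A small awk sketch solves these two equations; the variable and function names are introduced here for illustration.

```shell
# Scores of document 1 reported in the two result sets above.
PLAIN=0.8548409      # must + should
BOOSTED=6.7487183    # must + 10 * should

# Solve the two equations for the should and must contributions.
split_score() {
  awk -v p="$PLAIN" -v b="$BOOSTED" 'BEGIN {
    should = (b - p) / 9      # subtracting the equations leaves 9 * should
    must = p - should
    printf "should=%.4f must=%.4f\n", should, must
  }'
}

split_score    # prints: should=0.6549 must=0.2000
```

Under those assumptions, the keyword match already accounted for roughly three quarters of the unboosted score, and the factor of 10 scales only that part.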
