Tracking User Behavior

Search engine users often need to boost certain documents, for example to promote specific items or to improve relevance. There is one more common use case: boosting documents based on user behavior. This article shows how to do it.

Solution

There are many user behavior metrics that an application can track: page popularity, how long the user stayed on a page, whether they clicked certain buttons or links, whether they added the page to favorites or shared it on social media, and so on. We will not concentrate on specific cases here. Instead, I will give you some general advice on using this information in the search engine.

The typical approach is to boost results based on global page popularity. The reasoning is that a better page tends to be more popular, so it is likely to be helpful for other users too. Popularity does not only mean page views. Depending on the application type, it may be the number of times a page has been shared on social media, how many times a product has been bought in an e-commerce store, how many users are watching the changes in a ticket, etc.

If you want to boost some results based on the global popularity, it is better to start from some threshold. For example, you can boost only articles with at least 1000 views. You can also use formulas to affect the score as a part of the relevance optimization strategy. Another good trick is to boost the popular page and give a minor boost to related articles as they may be interesting for your audience.
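As a rough illustration of combining a threshold with a formula, here is a minimal sketch that builds an Elasticsearch function_score request body with a field_value_factor function. It is only one possible approach. The field names `title` and `views`, the 1000-view threshold, the factor of 0.01, and the helper names are assumptions made for the example.

```python
import json
import math


def popularity_query(text_query, min_views=1000, factor=0.01):
    """Build a function_score body that scales the score by view count.

    Hypothetical helper: the field names and default numbers are
    illustrative assumptions, not values from a real deployment.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text_query}},
                "functions": [
                    {
                        # Apply the formula only at or above the threshold.
                        "filter": {"range": {"views": {"gte": min_views}}},
                        "field_value_factor": {
                            "field": "views",
                            "factor": factor,
                            "modifier": "log1p",
                            "missing": 0,
                        },
                    }
                ],
                # Multiply the text relevance score by the function result.
                "boost_mode": "multiply",
            }
        }
    }


def log1p_factor(views, factor=0.01):
    """Local copy of the formula: log10(1 + factor * views)."""
    return math.log10(1 + factor * views)


if __name__ == "__main__":
    print(json.dumps(popularity_query("apple"), indent=2))
    print(round(log1p_factor(1000), 3))
```

The logarithm keeps very popular pages from dominating the ranking: with these assumed numbers, going from 1,000 to 100,000 views changes the multiplier from about 1.04 to about 3.0 rather than by a factor of 100.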

Finally, you can do the same trick per user. In other words, you check a specific user's action history and predict which pages they use most often, what their interests and preferences are (e.g., based on tags), and what else they may want to look at. You can put all this information into the search index and then boost specific results for each logged-in user. In some cases, you may even allow users to customize their preferences.
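To make the per-user idea more concrete, here is a deliberately simple sketch that derives a list of user IDs to recommend a document to, based on tag overlap with each user's viewing history. A real application would likely use a richer model. The data layout, the tag names, and both helper functions are hypothetical.

```python
from collections import Counter


def top_tags(history, limit=2):
    """Return the tags that occur most often in the pages a user viewed."""
    counts = Counter(tag for page in history for tag in page["tags"])
    return [tag for tag, _ in counts.most_common(limit)]


def recommended_users(doc_tags, histories, limit=2):
    """List the IDs of users whose top tags overlap with the document's tags."""
    return sorted(
        user_id
        for user_id, history in histories.items()
        if set(top_tags(history, limit)) & set(doc_tags)
    )


# Hypothetical viewing histories, keyed by user ID.
HISTORIES = {
    1: [{"tags": ["fruit", "recipes"]}, {"tags": ["fruit"]}],
    2: [{"tags": ["cars"]}, {"tags": ["cars", "travel"]}],
}

if __name__ == "__main__":
    # A document tagged "fruit" would be recommended to user 1 only.
    print(recommended_users(["fruit"], HISTORIES, limit=1))
```

The resulting list of user IDs is the kind of value that can be stored in the index alongside each document and used for boosting at query time.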

Implementation

Now we are going to implement some of the features described above. We will implement two things:

boost by global popularity;
boost of recommendations per user.

As usual, we start by creating a simple index with an appropriate mapping. The mapping contains three fields: the title, the global number of views, and who this document is recommended for. The first and the second fields should be clear, while the third contains an array of user IDs. If the current user is on this list, the document gets a boost. Finally, we add some test data to demonstrate how this logic works.

curl -X PUT "localhost:9200/tracking-user-behavior"
curl -X PUT "localhost:9200/tracking-user-behavior/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "english"
    },
    "views": {
      "type": "integer"
    },
    "recommended_for": {
      "type": "integer"
    }
  }
}
'
curl -X PUT "localhost:9200/tracking-user-behavior/_doc/1" -H 'Content-Type: application/json' -d'
{
  "title": "Apple",
  "views": 25,
  "recommended_for": [1, 2, 3]
}
'
curl -X PUT "localhost:9200/tracking-user-behavior/_doc/2" -H 'Content-Type: application/json' -d'
{
  "title": "Orange",
  "views": 220,
  "recommended_for": [3, 4, 5]
}
'
curl -X PUT "localhost:9200/tracking-user-behavior/_doc/3" -H 'Content-Type: application/json' -d'
{
  "title": "Banana",
  "views": 120,
  "recommended_for": [5, 6, 7]
}
'
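
With more than a handful of documents, separate PUT requests become tedious. As an optional alternative, the sketch below prepares the same three test documents as a newline-delimited payload for the Elasticsearch _bulk endpoint. It only builds the request body and does not send anything; the helper name `bulk_payload` is an assumption for the example.

```python
import json

# The same test documents as in the curl commands above, keyed by _id.
DOCS = {
    "1": {"title": "Apple", "views": 25, "recommended_for": [1, 2, 3]},
    "2": {"title": "Orange", "views": 220, "recommended_for": [3, 4, 5]},
    "3": {"title": "Banana", "views": 120, "recommended_for": [5, 6, 7]},
}


def bulk_payload(docs):
    """Return an NDJSON body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_payload(DOCS), end="")
```

The output can be saved to a file and posted to "localhost:9200/tracking-user-behavior/_bulk" with the Content-Type header set to application/x-ndjson.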

Now let us build a query that applies both boosts: by global views and by personal recommendation. I will use the match_all query just as a placeholder to keep the picture clear. You can replace it with the match query or any other full-text query. Then you see two additional boosts in the should part of the bool query. The first one applies a boost with a factor of 5.0 to all documents with at least 200 views. The second one applies a boost with a factor of 10.0 to all documents recommended for the current user with ID=7.

curl -X GET "localhost:9200/tracking-user-behavior/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool" : {
      "must" : {
        "match_all": {}
      },
      "should" : [
        { "range" : { "views" : { "gte" : 200, "boost" : 5.0 } } },
        { "term" : { "recommended_for" : { "value" : 7, "boost" : 10.0 } } }
      ]
    }
  },
  "_source": false
}
'
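
In an application, the user ID and the threshold would normally come from the current session and configuration rather than being hardcoded. Here is a minimal sketch that produces the same request body for any user; the function name and its parameter defaults are assumptions that simply mirror the values used in the query above.

```python
import json


def behavior_boost_query(user_id, min_views=200, views_boost=5.0, user_boost=10.0):
    """Build the bool query with a popularity boost and a per-user boost."""
    return {
        "query": {
            "bool": {
                # Placeholder; swap in a match query for real full-text search.
                "must": {"match_all": {}},
                "should": [
                    {"range": {"views": {"gte": min_views, "boost": views_boost}}},
                    {"term": {"recommended_for": {"value": user_id, "boost": user_boost}}},
                ],
            }
        },
        "_source": False,
    }


if __name__ == "__main__":
    # Prints a body equivalent to the curl request for the user with ID=7.
    print(json.dumps(behavior_boost_query(7), indent=2))
```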

Here are the results. As you can see, the recommended document (_id=3) has the highest score of 11.0: the base score of 1.0 plus the recommendation boost of 10.0. The document with at least 200 views (_id=2) has a score of 6.0, which is 1.0 plus the popularity boost of 5.0. The last document (_id=1) does not match any boost clause, so it keeps the default score of 1.0.

{
  "took" : 37,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 11.0,
    "hits" : [
      {
        "_index" : "tracking-user-behavior",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 11.0
      },
      {
        "_index" : "tracking-user-behavior",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 6.0
      },
      {
        "_index" : "tracking-user-behavior",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0
      }
    ]
  }
}
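
To see where these numbers come from, the scoring can be reproduced by hand. The sketch below is a simplified local model of this particular query and is not how Elasticsearch computes scores in general: it assumes that match_all contributes 1.0 and that each matching should clause adds its boost value. The helper name `expected_score` is hypothetical.

```python
# The three test documents, keyed by _id.
DOCS = {
    "1": {"views": 25, "recommended_for": [1, 2, 3]},
    "2": {"views": 220, "recommended_for": [3, 4, 5]},
    "3": {"views": 120, "recommended_for": [5, 6, 7]},
}


def expected_score(doc, user_id, min_views=200, views_boost=5.0, user_boost=10.0):
    """Approximate the score of one document for the query above."""
    score = 1.0  # assumed contribution of match_all
    if doc["views"] >= min_views:
        score += views_boost  # popularity clause matched
    if user_id in doc["recommended_for"]:
        score += user_boost  # recommendation clause matched
    return score


def ranking(user_id):
    """Return (doc_id, score) pairs sorted by descending score."""
    scored = [(doc_id, expected_score(doc, user_id)) for doc_id, doc in DOCS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(ranking(7))  # [('3', 11.0), ('2', 6.0), ('1', 1.0)]
```

Under the same assumptions, the user with ID=5 would see document 2 first with a score of 16.0, because both boosts apply to it, followed by document 3 with 11.0.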