Default Search Behavior 

Many customers who want search functionality in their application do not know how it should work. This is a very typical situation. Let us look at how to work with such a customer and what to offer.

Solution

Sphere

There are two approaches you may follow here: minimalistic and advanced.

The minimalistic approach is relatively simple to explain. You put all the information that a potential user may want to search for into the search index and run the search against it. How you put the data into the index is not that important, because the data will be plain text anyway. The most interesting part is how you run the search.

The recommended way is to enable full-text search with the standard behavior for the primary language of the application. Most search engines offer a standard setup for this case. Such a setup usually includes:

  • lowercase converter: converts all words to lowercase (Data > data, ORM > orm);

  • stop-word filter: removes very common words that carry little meaning from the index (a, of, and, in, etc.);

  • stemming: reduces words to their stem by removing endings (lighter > light, building > build).
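
To make the three steps concrete, here is a minimal Python sketch of such an analysis chain. It is a toy model, not how any particular search engine implements its analyzer: the stop-word list and the suffix rules in `toy_stem` are assumptions made up for this illustration, and real stemmers are considerably more careful.

```python
import re
from typing import List

# Assumption: a tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "of", "the"}


def toy_stem(word: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    # countries -> countri
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "i"
    # country -> countri (final "y" after a consonant)
    if word.endswith("y") and len(word) > 2 and word[-2] not in "aeiou":
        return word[:-1] + "i"
    # lighter -> light, building -> build, bananas -> banana
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> List[str]:
    """Lowercase, split into words, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [toy_stem(t) for t in tokens]


print(analyze("Data and ORM"))          # ['data', 'orm']
print(analyze("The lighter building"))  # ['light', 'build']
print(analyze("Country"))               # ['countri']
print(analyze("In some countries"))     # ['some', 'countri']
```

In this sketch the same chain is applied to the indexed text and to the search phrase, so Country and countries end up as the same term and can match each other.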

Such a simple setup already produces good results, provided that you passed relevant data to the index. This approach is an excellent place to start: in the first iteration it lets you demonstrate the basic features to your customer and gives you a foundation for future improvements.

The advanced approach requires a business analysis of the typical use cases of the application's users. Based on it, you offer a list of features that make sense for those users. Here are some features that make sense for most customers:

  • result boosting: puts some results higher than others;

  • autocomplete: shows results immediately as the user types the phrase;

  • autocorrection or error tolerance: finds results even when the phrase contains mistakes;

  • full-text fine-tuning: reduces the list of results and keeps only the relevant ones;

  • synonyms: finds results using different words with the same meaning.
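
As a rough illustration of three of these features (synonyms, error tolerance, and result boosting), here is a small Python sketch. Everything in it is hypothetical: the synonym table, the list of indexed terms, and the boost factors are invented for the example, and a real search engine provides its own mechanisms for each feature.

```python
import difflib
from typing import Dict, List

# Hypothetical synonym table; a real one comes out of the business analysis.
SYNONYMS: Dict[str, List[str]] = {"citrus": ["orange", "lemon"]}

# Hypothetical vocabulary of terms that are already present in the index.
INDEX_TERMS = ["apple", "banana", "orange", "lemon", "fruit"]

# Hypothetical boost factors: a match in the title counts double.
FIELD_BOOSTS = {"title": 2.0, "content": 1.0}


def expand_synonyms(term: str) -> List[str]:
    """Synonyms: search for the term and for every configured synonym."""
    return [term] + SYNONYMS.get(term, [])


def correct_term(term: str) -> str:
    """Error tolerance: replace a misspelled term with the closest indexed one."""
    matches = difflib.get_close_matches(term, INDEX_TERMS, n=1, cutoff=0.7)
    return matches[0] if matches else term


def boosted_score(base_score: float, matched_field: str) -> float:
    """Result boosting: scale the score by the field the match came from."""
    return base_score * FIELD_BOOSTS.get(matched_field, 1.0)


print(expand_synonyms("citrus"))    # ['citrus', 'orange', 'lemon']
print(correct_term("bananna"))      # banana
print(boosted_score(1.5, "title"))  # 3.0
```

Which of these features to build, and with what settings, is exactly what the business analysis should decide.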

There are many more features, depending on the application's business area, marketing segments, and other factors. You have to show the customer the possible options, pick the ones that ensure the best behavior, and then start working on the implementation.

Implementation

Code

Let us create a simple index, fill it with test data, and find something there.

First, we need to create an index. Let us assume that we have an application that manages articles, and we have to search in the title, content, and keywords of an article. All these fields will be concatenated into one index field called data.

Here are two queries: the first one creates an index, and the second one sets the index structure, called a mapping. Pay attention to the configuration of the data field: it uses the text type with the built-in language analyzer for the English language. This analyzer includes a lowercase converter, a stop-word filter, and basic stemming.

curl -X PUT "localhost:9200/default-search"

curl -X PUT "localhost:9200/default-search/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "data": {
      "type": "text",
      "analyzer": "english"
    }
  }
}
'

The next step is passing data to the index. Let us index the opening sentences of three Wikipedia pages, each followed by a few keywords. We are going to index articles about apple (id=1), orange (id=2), and banana (id=3).

curl -X PUT "localhost:9200/default-search/_doc/1" -H 'Content-Type: application/json' -d'
{
  "data": "Apple An apple is an edible fruit produced by an apple tree (Malus domestica). Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus. fruit sweet red yellow"
}
'

curl -X PUT "localhost:9200/default-search/_doc/2" -H 'Content-Type: application/json' -d'
{
  "data": "Orange The orange is the fruit of various citrus species in the family Rutaceae (see list of plants known as orange); it primarily refers to Citrus × sinensis,[1] which is also called sweet orange, to distinguish it from the related Citrus × aurantium, referred to as bitter orange. fruit sour orange"
}
'

curl -X PUT "localhost:9200/default-search/_doc/3" -H 'Content-Type: application/json' -d'
{
  "data": "Banana A banana is an elongated, edible fruit – botanically a berry[1][2] – produced by several kinds of large herbaceous flowering plants in the genus Musa.[3] In some countries, bananas used for cooking may be called \"plantains\", distinguishing them from dessert bananas. fruit sweet yellow"
}
'
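
Each data value in these requests is the article title, content, and keywords joined into one string. In an application this concatenation would normally happen in code before the request is sent. A minimal Python sketch of that step follows; the function name `build_document` and its parameters are assumptions made for this example, and the article text is shortened.

```python
import json
from typing import Dict, List


def build_document(title: str, content: str, keywords: List[str]) -> Dict[str, str]:
    """Join the searchable fields into the single `data` field of the index."""
    parts = [title, content, " ".join(keywords)]
    # Skip empty parts so that no double spaces end up in the field.
    return {"data": " ".join(p.strip() for p in parts if p.strip())}


doc = build_document(
    title="Apple",
    content="An apple is an edible fruit produced by an apple tree.",
    keywords=["fruit", "sweet", "red", "yellow"],
)

# The JSON text that would be sent as the request body.
body = json.dumps(doc, ensure_ascii=False)
print(body)
```

Serializing with `json.dumps` also takes care of escaping quotes inside the text, which had to be done by hand in the banana request.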

Finally, we can search through the indexed data and find an appropriate article. Let us search for the word Country. Note that the engine has found the document with id=3, which does not contain this exact word but contains the word countries instead. Here you can see how the search index applies the lowercase converter and stemming.

curl -X GET "localhost:9200/default-search/_search?q=data:Country&pretty"

And here is the result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9272294,
    "hits" : [
      {
        "_index" : "default-search",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.9272294,
        "_source" : {
          "data" : "Banana A banana is an elongated, edible fruit – botanically a berry[1][2] – produced by several kinds of large herbaceous flowering plants in the genus Musa.[3] In some countries, bananas used for cooking may be called \"plantains\", distinguishing them from dessert bananas. fruit sweet yellow"
        }
      }
    ]
  }
}
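
An application usually needs only a few values from such a response: the number of matches and the id and score of each hit. The Python sketch below shows one way to extract them. It works on a trimmed copy of the response above (the _source field is left out to keep it short), and the helper name `summarize_hits` is an assumption made for this example.

```python
import json
from typing import List, Tuple

# Trimmed copy of the search response shown above.
response_text = """
{
  "hits": {
    "total": {"value": 1, "relation": "eq"},
    "max_score": 0.9272294,
    "hits": [
      {"_index": "default-search", "_id": "3", "_score": 0.9272294}
    ]
  }
}
"""


def summarize_hits(response: dict) -> List[Tuple[str, float]]:
    """Return (document id, score) pairs in the order the engine ranked them."""
    return [(hit["_id"], hit["_score"]) for hit in response["hits"]["hits"]]


response = json.loads(response_text)
print(response["hits"]["total"]["value"])  # 1
print(summarize_hits(response))            # [('3', 0.9272294)]
```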
 
