
feat(search): Use minimum_should_match to filter hits #1333

Merged (1 commit into master from minimum_should_match) Jul 17, 2019

Conversation

orangejulius (Member)

This change utilizes the [minimum_should_match](https://www.elastic.co/guide/en/elasticsearch/reference/5.6/query-dsl-minimum-should-match.html) Elasticsearch query parameter to reduce the number of hits for search queries. For now, autocomplete is unchanged; we can take a look at that later.

Previously, only one token in an input had to match, regardless of the input size. This allows for queries to potentially match a huge number of documents, especially as the number of tokens grows.

Now, most `ngrams` queries have the `minimum_should_match` parameter set to `1<-1 3<-25%`.
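
As a concrete illustration, the parameter is set directly on a query clause. This is a hypothetical sketch, not a query taken from this PR; the field name `name.ngrams` and the query text are placeholders:

```json
{
  "match": {
    "name.ngrams": {
      "query": "30 west 26th street",
      "minimum_should_match": "1<-1 3<-25%"
    }
  }
}
```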

Breaking this down, it means that the number of optional tokens follows this pattern:

| token count | optional tokens |
| --- | --- |
| 1   | 0 (obviously) |
| 2   | 1   |
| 3   | 1   |
| 4   | 1   |
| 5   | 1   |
| 6+  | at least 75% of tokens must match |
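
To make the conditional syntax concrete, here is a small Python sketch that reproduces the table above. It is an assumption about the semantics, not code from this PR or from Elasticsearch itself: each `N<V` condition applies when the token count exceeds `N`, the condition with the largest applicable `N` wins (conditions are assumed to be listed in ascending order), a leading `-` expresses the value as a count or percentage of *optional* clauses, and fractional percentages round down. Only the negative (optional-count) form used here is handled.

```python
def optional_tokens(token_count, spec="1<-1 3<-25%"):
    """Return how many tokens may go unmatched for a given token count,
    under a simplified reading of Elasticsearch's minimum_should_match
    conditional syntax (negative/optional form only)."""
    optional = 0  # if no condition applies, every token is required
    for cond in spec.split():
        threshold, value = cond.split("<")
        if token_count > int(threshold):
            if value.endswith("%"):
                # percentage of the total clauses, rounded down
                optional = abs(int(value.rstrip("%"))) * token_count // 100
            else:
                optional = abs(int(value))
    return optional
```

Under this reading, six or more tokens allow 25% of them (rounded down) to go unmatched, i.e. at least 75% must match, consistent with the last row of the table.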

This should help ensure quality results are returned where possible, even if long inputs contain a few extraneous bits of information, while reducing the chance of extremely expensive queries.

@orangejulius orangejulius merged commit 6c5db8d into master Jul 17, 2019
@orangejulius orangejulius deleted the minimum_should_match branch July 17, 2019 13:19
@orangejulius (Member, Author)

We saw some very nice reductions in request latency and number of Elasticsearch hits due to this change. It looks like it will really help cut down on slow queries.

The effect of rolling this out was pretty dramatic:
(screenshot: Screenshot_2019-07-16_11-12-42)

orangejulius added a commit to pelias/acceptance-tests that referenced this pull request Aug 4, 2019