Typo in query leads to "no results" #1038

Closed
tadjik1 opened this issue Oct 19, 2017 · 6 comments

@tadjik1
Contributor

tadjik1 commented Oct 19, 2017

Hi there,
We've run into a problem: pelias-api returns an empty list of results even when the query contains only a small typo. It's easy to reproduce:

  1. open https://mapzen.com/products/search
  2. type "Reichstag"
  3. you can see several results
  4. change "Reichstag" to "Reichstog"
  5. there are no results

It seems that such small typos shouldn't affect the result list.

@missinglink
Member

missinglink commented Oct 19, 2017

hi @tadjik1, we currently don't support spelling error detection; it's a complex thing to get right while also ensuring that query performance is not affected.

other services return messages such as 'Showing results for Reichstag. No results found for Reichstog.'

we haven't done any research into this area and likely won't have a solution in the short-term.

off the top of my head there are two approaches for handling spelling mistakes:

index-time permutation generation

in this scenario, the original name is put through an algorithm which generates logical error cases within a certain threshold, based on mental errors (such as vowel substitution) and typing errors (such as pressing an adjacent key).

this is a well-studied domain and many existing algorithms are available to produce these tokens.

the issue with this approach is that the total index can expand to 20x its size, meaning that a planet-wide index could expand from 1 billion to 20 billion entries, resulting in a severe decrease in search-time performance and an increase in disk / RAM requirements.

additionally, there may be a negative effect on search quality and some care would need to be taken for things like Freiberg vs Freiburg (for example).
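To illustrate the index-time idea, here is a minimal sketch, not an existing pelias feature. The function name `generateVariants`, the vowel set and the partial QWERTY adjacency map are all assumptions made up for this example; a real implementation would use one of the established algorithms mentioned above.

```javascript
// Hypothetical sketch: generate single-character substitution variants of a name.
// VOWELS models "mental" errors (vowel substitution); ADJACENT_KEYS is a partial,
// illustrative QWERTY neighbour map that models typing errors.
const VOWELS = "aeiou";
const ADJACENT_KEYS = {
  a: "qwsz", e: "wsdr", i: "ujko", o: "iklp", u: "yhji",
  r: "edft", t: "rfgy", g: "ftyhbv", s: "awedxz", c: "xdfv", h: "gyujnb",
};

function generateVariants(name) {
  const lower = name.toLowerCase();
  const variants = new Set();
  for (let i = 0; i < lower.length; i++) {
    const ch = lower[i];
    const substitutes = new Set();
    // vowel substitution: any other vowel may replace a vowel
    if (VOWELS.includes(ch)) {
      for (const v of VOWELS) substitutes.add(v);
    }
    // adjacent-key substitution: any neighbouring key may replace the character
    for (const k of ADJACENT_KEYS[ch] || "") substitutes.add(k);
    substitutes.delete(ch); // never emit the original spelling as a variant
    for (const sub of substitutes) {
      variants.add(lower.slice(0, i) + sub + lower.slice(i + 1));
    }
  }
  return variants;
}

// Each indexed name turns into many extra tokens, which is where the index growth comes from.
console.log(generateVariants("Reichstag").size);
console.log(generateVariants("Reichstag").has("reichstog")); // true
// The search-quality concern: a variant of one real place is another real place.
console.log(generateVariants("Freiberg").has("freiburg")); // true
```

The last line shows why care is needed: indexing the variants of Freiberg would make it match queries for Freiburg as well.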

search-time permutation generation

in this scenario, we take the search term and check if it exists in the index.

If the term fails to match then we could run the same algorithm to generate a list of 'fallback terms'.

it would then be possible to iterate through those fallback terms until a match was found.

this approach would not increase the index size but would result in a slower response for queries with spelling errors. This is arguably better because only those queries containing a spelling error would have a decreased response time.

The result would be similar to the fallback message I posted above 'Showing results for Reichstag. No results found for Reichstog.'.
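A rough sketch of that search-time flow follows, under stated assumptions: the index is modelled as a plain in-memory `Set` of lowercase names, and `fallbackTerms` and `searchWithFallback` are hypothetical names that only try vowel substitutions. It is meant to show the control flow, not how pelias/api would actually query the index.

```javascript
// Hypothetical sketch of search-time fallback. The "index" is a Set of
// lowercase names; a real service would query its search backend instead.
const FALLBACK_VOWELS = "aeiou";

// Lazily yield candidate spellings by swapping each vowel for another vowel.
function* fallbackTerms(term) {
  for (let i = 0; i < term.length; i++) {
    if (!FALLBACK_VOWELS.includes(term[i])) continue;
    for (const v of FALLBACK_VOWELS) {
      if (v !== term[i]) yield term.slice(0, i) + v + term.slice(i + 1);
    }
  }
}

function searchWithFallback(index, query) {
  const term = query.toLowerCase();
  // Fast path: correctly spelled queries never generate fallback terms.
  if (index.has(term)) return { matched: term, corrected: false };
  // Slow path: iterate through fallback terms until one matches.
  for (const candidate of fallbackTerms(term)) {
    if (index.has(candidate)) return { matched: candidate, corrected: true };
  }
  return { matched: null, corrected: false };
}

const index = new Set(["reichstag", "freiberg", "freiburg"]);
const result = searchWithFallback(index, "Reichstog");
if (result.corrected) {
  console.log(`Showing results for ${result.matched}. No results found for reichstog.`);
}
```

Because the generator is lazy, the loop stops producing candidates as soon as one matches, and a query that matches on the first lookup pays no extra cost.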


I doubt the core team will get a chance to look at this any time soon; there are a lot of edge cases to consider at planet scale and across multiple languages.

The good news is that the 'search-time permutation generation' option can be handled by a client library (even in the browser) or added to the pelias/api codebase and handled there.

If you are interested in doing some work in this area I would be happy to have a discussion with you about how it might work and how we could get a PR merged into master.

@tadjik1
Contributor Author

tadjik1 commented Oct 19, 2017

@missinglink yes, sure. I would like to help with this feature.
Do you know of any nodejs libraries that can handle the first part of search-time permutation generation, i.e. generating the logical error cases for a request?

@missinglink
Member

Also worth having a look at the elasticsearch docs, I'm guessing there must be something in place to handle spelling mistakes and it might be easy enough to enable it?

@tadjik1
Contributor Author

tadjik1 commented Oct 26, 2017

@missinglink it seems that the fuzziness option is what we are looking for: https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzzy-match-query.html

I will prepare a PR for pelias/query
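For reference, a minimal sketch of what a fuzzy match query body could look like. `buildFuzzyMatchQuery` is a hypothetical helper and the field name `name.default` is an assumption for illustration; this is not the code from pelias/query. `fuzziness` and `prefix_length` are options of the Elasticsearch `match` query.

```javascript
// Hypothetical helper: builds an Elasticsearch request body that uses a
// `match` query with the `fuzziness` option. The default field name is an
// assumption made for this example.
function buildFuzzyMatchQuery(text, field = "name.default") {
  return {
    query: {
      match: {
        [field]: {
          query: text,
          // "AUTO" picks the allowed edit distance based on the term length
          fuzziness: "AUTO",
          // require the first character to match exactly, which limits
          // how many terms have to be examined
          prefix_length: 1,
        },
      },
    },
  };
}

console.log(JSON.stringify(buildFuzzyMatchQuery("Reichstog"), null, 2));
```

With a body like this, "Reichstog" would be allowed to match an indexed "Reichstag", since the two differ by a single character.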

@orangejulius
Member

Closing this one as a duplicate of pelias/pelias#785
