Skip to content

Commit

Permalink
Wildcard field use only 3-gram to index (opensearch-project#17349)
Browse files Browse the repository at this point in the history
* support 3gram wildcard

Signed-off-by: gesong.samuel <[email protected]>

* add changelog-3

Signed-off-by: gesong.samuel <[email protected]>

* add rolling upgrade test for wildcard field

Signed-off-by: gesong.samuel <[email protected]>

* remove test case added in opensearch-project#16827

Signed-off-by: gesong.samuel <[email protected]>

---------

Signed-off-by: gesong.samuel <[email protected]>
Co-authored-by: gesong.samuel <[email protected]>
  • Loading branch information
HUSTERGS and gesong.samuel authored Feb 18, 2025
1 parent 91a93da commit e62bf1a
Show file tree
Hide file tree
Showing 7 changed files with 715 additions and 130 deletions.
1 change: 1 addition & 0 deletions CHANGELOG-3.0.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Stop minimizing automata used for case-insensitive matches ([#17268](https://github.com/opensearch-project/OpenSearch/pull/17268))
- Refactor the `:server` module `org.opensearch.client` to `org.opensearch.transport.client` to eliminate top level split packages for JPMS support ([#17272](https://github.com/opensearch-project/OpenSearch/pull/17272))
- Use Lucene `BM25Similarity` as default since the `LegacyBM25Similarity` is marked as deprecated ([#17306](https://github.com/opensearch-project/OpenSearch/pull/17306))
- Wildcard field index only 3gram of the input data [#17349](https://github.com/opensearch-project/OpenSearch/pull/17349)

### Deprecated

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,200 @@
# refactored from rest-api-spec/src/main/resources/rest-api-spec/test/search/270_wildcard_fieldtype_queries.yml
---
"search on mixed state":
# "term query matches exact value"
- do:
search:
index: test
body:
query:
term:
my_field: "AbCd"
- match: { hits.total.value: 1 }
- match: { hits.hits.0._id: "5" }

- do:
search:
index: test
body:
query:
term:
my_field.doc_values: "AbCd"
- match: { hits.total.value: 1 }
- match: { hits.hits.0._id: "5" }

# term query matches lowercase-normalized value
- do:
search:
index: test
body:
query:
term:
my_field.lower: "abcd"
- match: { hits.total.value: 2 }
- match: { hits.hits.0._id: "5" }
- match: { hits.hits.1._id: "7" }

- do:
search:
index: test
body:
query:
term:
my_field.lower: "ABCD"
- match: { hits.total.value: 2 }
- match: { hits.hits.0._id: "5" }
- match: { hits.hits.1._id: "7" }

- do:
search:
index: test
body:
query:
term:
my_field: "abcd"
- match: { hits.total.value: 0 }

# wildcard query matches
- do:
search:
index: test
body:
query:
wildcard:
my_field:
value: "*Node*Exception*"
- match: { hits.total.value: 1 }
- match: { hits.hits.0._id: "1" }

# wildcard query matches lowercase-normalized field
- do:
search:
index: test
body:
query:
wildcard:
my_field.lower:
value: "*node*exception*"
- match: { hits.total.value: 1 }
- match: { hits.hits.0._id: "1" }

- do:
search:
index: test
body:
query:
wildcard:
my_field.lower:
value: "*NODE*EXCEPTION*"
- match: { hits.total.value: 1 }
- match: { hits.hits.0._id: "1" }

- do:
search:
index: test
body:
query:
wildcard:
my_field:
value: "*node*exception*"
- match: { hits.total.value: 0 }

# prefix query matches
- do:
search:
index: test
body:
query:
prefix:
my_field:
value: "[2024-06-08T"
- match: { hits.total.value: 3 }

# regexp query matches
- do:
search:
index: test
body:
query:
regexp:
my_field:
value: ".*06-08.*cluster-manager node.*"
- match: { hits.total.value: 2 }

# regexp query matches lowercase-normalized field
- do:
search:
index: test
body:
query:
regexp:
my_field.lower:
value: ".*06-08.*Cluster-Manager Node.*"
- match: { hits.total.value: 2 }

- do:
search:
index: test
body:
query:
regexp:
my_field:
value: ".*06-08.*Cluster-Manager Node.*"
- match: { hits.total.value: 0 }

# wildcard match-all works
- do:
search:
index: test
body:
query:
wildcard:
my_field:
value: "*"
- match: { hits.total.value: 6 }

# regexp match-all works
- do:
search:
index: test
body:
query:
regexp:
my_field:
value: ".*"
- match: { hits.total.value: 6 }

# terms query on wildcard field matches
- do:
search:
index: test
body:
query:
terms: { my_field: [ "AbCd" ] }
- match: { hits.total.value: 1 }
- match: { hits.hits.0._id: "5" }

# case insensitive query on wildcard field
- do:
search:
index: test
body:
query:
wildcard:
my_field:
value: "AbCd"
- match: { hits.total.value: 1 }
- match: { hits.hits.0._id: "5" }

- do:
search:
index: test
body:
query:
wildcard:
my_field:
value: "AbCd"
case_insensitive: true
- match: { hits.total.value: 2 }
- match: { hits.hits.0._id: "5" }
- match: { hits.hits.1._id: "7" }
Loading

0 comments on commit e62bf1a

Please sign in to comment.