From 338a3d4e726875a96d5ad86778c036cb8d77f1b3 Mon Sep 17 00:00:00 2001
From: David Turner
Date: Mon, 23 Oct 2017 16:56:14 +0100
Subject: [PATCH] Update numbers to reflect 4-byte UTF-8-encoded characters

You need 4 bytes for characters outside the BMP, which includes many emoji
and a bunch of less-common writing characters too.
---
 docs/reference/mapping/params/ignore-above.asciidoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/reference/mapping/params/ignore-above.asciidoc b/docs/reference/mapping/params/ignore-above.asciidoc
index 6a24ca626d981..2db12a33368a2 100644
--- a/docs/reference/mapping/params/ignore-above.asciidoc
+++ b/docs/reference/mapping/params/ignore-above.asciidoc
@@ -56,5 +56,5 @@ limit of `32766`.
 
 NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
 bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
-set the limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most
-3 bytes.
+set the limit to `32766 / 4 = 8191` since UTF-8 characters may occupy at most
+4 bytes.
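
The arithmetic behind this change can be checked with a short sketch. It is not part of the patch. The constant and function names (`MAX_TERM_BYTES`, `utf8_len`, `safe_ignore_above`) are hypothetical and exist only for illustration, and the `32766` figure is taken from the NOTE as quoted in the diff. The division is rounded down, which is how `32766 / 4` comes out as `8191`.

```python
# Byte limit quoted in the NOTE being patched (assumed here, not read from Lucene).
MAX_TERM_BYTES = 32766


def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def safe_ignore_above(max_bytes_per_char: int = 4) -> int:
    """Largest character count whose worst-case UTF-8 size fits in MAX_TERM_BYTES."""
    return MAX_TERM_BYTES // max_bytes_per_char  # integer division rounds down


# One sample per UTF-8 width: ASCII, Latin-1 supplement, other BMP, outside the BMP.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
widths = [utf8_len(ch) for ch in samples]

if __name__ == "__main__":
    for ch, width in zip(samples, widths):
        print(f"U+{ord(ch):04X} takes {width} byte(s)")
    print("limit assuming 3 bytes per character:", safe_ignore_above(3))
    print("limit assuming 4 bytes per character:", safe_ignore_above(4))
```

With the old value of `10922`, a string made up entirely of characters outside the BMP could reach `10922 * 4 = 43688` bytes, well over the byte limit, which is the case the patch addresses.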