Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase max robj embedded key + value to 128 bytes #1726

Draft
wants to merge 1 commit into
base: unstable
Choose a base branch
from

Conversation

zuiderkwast
Copy link
Contributor

This change permits string values to be stored embedded in the refcounted object (robj) if the size of robj including embedded key and value fits in 128 bytes. The benefit is less memory usage on average, better cache locality and fewer allocations.

The jemalloc size classes in the range 64-128 bytes are in intervals of 16 bytes. The size classes below 64 come in intervals of 8. Above 128, the size classes come in intervals of 32 bytes. The interesting range for this PR is 64-128 bytes.

  • When the size classes come in intervals of 8, a string of a random size in this interval has 0-7 bytes of wasted space, 3.5 on average. With two such allocations (one for robj and key and one for the value) we have 7 bytes wasted on average (3.5 + 3.5).

  • If the key is ~17 bytes and the value is ~80 bytes, the allocation holding the robj with the key has 0-7 bytes wasted and the value allocation has 0-15 bytes wasted. This is 3.5 + 7.5 = 11 bytes on average.

  • If we embed key and value in one allocation and it fits within 64-128 bytes, it has 0-15 bytes wasted, which is 7.5 bytes on average, and fewer allocations may imply less overhead in the allocator itself, and better locality which may imply faster access.

This needs to be benchmarked though, for memory and CPU.

The total size of key + value to be affected by this change needs to be 41-105 bytes. For example key size 17 and value size 85.

@SoftlyRaining Do you want to test this and compare to unstable?

If we finish your PR (that removes the robj->ptr for EMBSTR) we save another 8 bytes for EMBSTR.

Copy link

codecov bot commented Feb 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.14%. Comparing base (41a4bc2) to head (169c5a6).
Report is 5 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1726      +/-   ##
============================================
+ Coverage     71.10%   71.14%   +0.03%     
============================================
  Files           123      123              
  Lines         65536    65536              
============================================
+ Hits          46599    46623      +24     
+ Misses        18937    18913      -24     
Files with missing lines Coverage Δ
src/object.c 82.19% <100.00%> (ø)

... and 10 files with indirect coverage changes

@SoftlyRaining
Copy link
Contributor

With benchmarking I found that I had to also update OBJ_ENCODING_EMBSTR_SIZE_LIMIT to see an actual difference. I haven't investigated exactly why that was needed, but ideally we'd have the embstr threshold defined in a single location.

@zuiderkwast
Copy link
Contributor Author

Oh, ... , I see this check prevents embedding value if it doesn't already come as an EMBSTR:

robj *objectSetKeyAndExpire(robj *val, sds key, long long expire) {
    if (val->type == OBJ_STRING && val->encoding == OBJ_ENCODING_EMBSTR) {
        robj *new = createStringObjectWithKeyAndExpire(val->ptr, sdslen(val->ptr), key, expire);

That's why OBJ_ENCODING_EMBSTR_SIZE_LIMIT matters.

We change probably increase OBJ_ENCODING_EMBSTR_SIZE_LIMIT += 64 if you don't see any bad results with that.

The alternative is to change the code quoted above to not require val to be an EMBSTR.

@SoftlyRaining
Copy link
Contributor

SoftlyRaining commented Feb 13, 2025

I've done benchmarking to compare this change with unstable at intervals of at least every 4 bytes. Not what I expected! 🤷
image
The zigzag pattern of course is from jemalloc. Key size is 18 bytes. I measured used_memory before and after adding 5 million items of the specified size, restarting the server between each datapoint measured.

@SoftlyRaining
Copy link
Contributor

I tested 85B values and 18B keys with 9 threads and no pipelining. Performance improved 2-6%.

	unstable	emb-128		percent change
get	1405086		1486670		6%
set	1236424		1255064		2%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants