Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed fix for exponential performance slowdown due to dependencies count #270

Closed
code-ape opened this issue Oct 24, 2021 · 9 comments · Fixed by #272
Closed

Proposed fix for exponential performance slowdown due to dependencies count #270

code-ape opened this issue Oct 24, 2021 · 9 comments · Fixed by #272

Comments

@code-ape
Copy link

code-ape commented Oct 24, 2021

Background

Hello! I'm a big fan of awilix and recently did a bit of benchmarking to understand the consequences of transitioning from a SINGLETON to a SCOPED setup for our use case. We currently have couple hundreds of dependencies but out of curiosity I set the matrix to run up to 2,000 dependencies. To my astonishment, awilix slowed down EXPONENTIALLY as the number of dependencies increased. I chased down the cause of this and think I found a solution, but didn't know the internals well enough to confidently provide a PR without opening an issue first to get feedback.

Issue Identification

This exponential slowdown was confirmed by a best fit line for exponential slowdown fitting with a near 0.9 R score.

Number of dependencies Throughput (resolutions / second)
100 71,127
200 30,420
1,000 2,098
2,000 547
Click to view graph of data with exponential fitting ![Screen Shot 2021-10-23 at 8 06 30 PM](https://user-images.githubusercontent.com/3579192/138574992-e4ac4117-0970-451a-99c4-07cb125c9fe2.png)

This seemed very bizarre and so I tried running the same test matrix but changed to logic to pull from cache if it was populated -- which is should be every time except the first for the SINGLETON lifetime case.
Fascinatingly, performance improved DRAMATICALLY and became roughly the same regardless of the number of dependencies.

Number of dependencies Throughput with cache (resolutions / second) Performance improvement (%)
100 126,064 ~77%
200 171,940 ~465%
1,000 149,079 ~7,006%
2,000 140,005 ~25,495%

This seemed very off and so I dug into the resolve() function and tracked the issue down to rollUpRegistrations:

awilix/src/container.ts

Lines 351 to 363 in 0c4ee7d

function rollUpRegistrations(bustCache: boolean = false): RegistrationHash {
if (computedRegistrations && !bustCache) {
return computedRegistrations
}
computedRegistrations = {
...(parentContainer &&
(parentContainer as any)[ROLL_UP_REGISTRATIONS](bustCache)),
...registrations,
}
return computedRegistrations!
}

This is called every time a resolution was fired from my test harness because root level resolves busts the registration cache.
A quick look at the compiled JS code revealed the culprit:

computedRegistrations = Object.assign(Object.assign({}, (parentContainer &&
  parentContainer[ROLL_UP_REGISTRATIONS](bustCache))), registrations);

This causes a full copy of registrations with each call which in turn causes the exponential slowdown.

Proposed Solution

Since computed registrations appears to only be a lookup cache, then it appears that for scenarios without parent containers it should just be a reference to registrations. Example:

function rollUpRegistrations(bustCache: boolean = false): RegistrationHash {
  if (computedRegistrations && !bustCache) {
    return computedRegistrations
  }
  if (parentContainer) {
    computedRegistrations = Object.assign({}, parentContainer[ROLL_UP_REGISTRATIONS](bustCache), registrations)
  } else {
    computedRegistrations = registrations
  }
  return computedRegistrations
}

A quick benchmark of this reveals that it achieves the same throughput as calling the cache directly! Meaning this does fix the issue and make awilix screaming fast! I believe that this fully fixes it as rollUpRegistrations was being invoked from the root node so I believe that parentContainer should always be null and thus avoid the full object copy.

Open Questions

The only time this still performance poorly is with container.createScope() which causes a full copy of the parent. I would assume that it's rare that scopes become extremely nested (presumably it's usually a depth of 1 as created for a web request or something of the like). If that's the case, performance may be better with a recursive resolution up the parent list rather than doing a full memory copy of the parent list to the child.

@jeffijoe
Copy link
Owner

jeffijoe commented Oct 24, 2021

Thank you so much! Do you have the test code so I can play around with it sometime?

I tried running the same test matrix but changed to logic to pull from cache if it was populated -- which is should be every time except the first for the SINGLETON lifetime case.

I'm not sure what you mean by this, but if you could show me the test code before and after your changes that would help a lot!

That's definitely a very low-hanging fruit for a perf gain when using the root container! However, as you noted, most people using this with web requests would be using a scope, which means the registrations rollup does have to happen.

I can't seem to remember why the registration cache needs to be busted when resolving the root. 🤔

Would be very interesting to see if a recursive resolution is faster! I imagine it would be, especially with just a single parent. One thing to be careful of here though is where the resolved value is cached when SCOPED. See the release notes for V4 for info on that.

@code-ape
Copy link
Author

code-ape commented Oct 24, 2021

Thanks for the quick reply @jeffijoe! To answer your questions:

Code I was referring to

I was just manually pulling from the cache if it was populated. Example:

let cacheCheck = container.cache.get('myService')
if (cacheCheck) {
  dep = cacheCheck.value
} else {
  dep = container.resolve('mySerivce')
}

Why to bust when resolving the root?

From my scanning through the code it seems that busting on resolving root is the only way that computedRegistrations is set right now. This seems to combine initialization with cache invalidation. Which need to be split apart. If you were to do recursive resolution I think computedRegistrations goes away because the container knows its scoped dependencies and would check those first before checking parents.

Recursive resolution

I have no doubt it will be! And I'm fairly sure there's a straight forward way to do it while honoring the scope cache. There will probably need to be a bit of "immutable data structure" style work done to prevent the below scenario. Though it may be a bit overkill and a rare edge condition for the project's concern (I deal a lot with concurrency stuff and have thus become paranoid to just race scenarios lol). Let me know what you think!

Scenario: You create a container with dependency "D1" as value "V1" and then create a scope, the scope runs a "long lived" action which resolves "D1" at some point, half way through the long lived action you mutate the "D1" to have value "V2". The scope needs to have a "snapshot" style where "D1" will always resolve to "V1" over its lifetime else there could be a scenario where "V1" is used in part of it and "V2" in a different part.

@jeffijoe
Copy link
Owner

If you could post the benchmarking code that would be very helpful 🙏

@code-ape
Copy link
Author

code-ape commented Oct 24, 2021

@jeffijoe of course! Just uploaded it with a copy of my results: https://github.com/urbdyn/awilix-benchmark

EDIT: Sorry for missing that in your original message! Been working all day, probably have a ton of typos in here. Anyways, if you're making changes and re-running the benchmark, you can narrow it down to the following matrix which should run in ~20 seconds:

const testMatrix = {
  // Note each service creates two dependencies.
  numberServices: [1_000],
  maxInFlight: [1_000],
  lifetimeType: ['TRANSIENT', 'SCOPED', 'SINGLETON'],
  cacheBypass: [true, false],
}

@code-ape code-ape changed the title Proposed fix for exponentially performance slowdown due to dependencies count Proposed fix for exponential performance slowdown due to dependencies count Oct 24, 2021
@jeffijoe
Copy link
Owner

jeffijoe commented Oct 25, 2021

@code-ape thanks! What numbers in the results should I be looking for improvement in? Basically I would like to be able to run a benchmark, record some key numbers, make a change, run again, and compare.

Is it resolvedResults I need to look at? Higher is better?

I also think computedRegistrations needs to remain as the registrations getter which is public API does use it. However, the resolution path can definitely benefit from optimization.

@code-ape
Copy link
Author

code-ape commented Oct 25, 2021

@jeffijoe the best measurement would be requestNumber as each request triggers a resolve (and a scope creation for SCOPED test scenarios). If you want to get average time to resolve for a given scenario, do requestNumber / totalSecondsResolving.

Let me know if I can be of any more help!

@jeffijoe
Copy link
Owner

@code-ape here are the results from using a recursive getRegistration rather than rolling up.

[
  [
    {
      "numberServices": 1000,
      "maxInFlight": 1000,
      "lifetimeType": "TRANSIENT",
      "cacheBypass": true,
      "numberSeconds": 3
    },
    {
      "maxInFlightHit": true,
      "requestNumber": 1504140,
      "resolveCacheHits": 0,
      "numberLogs": 3008280,
      "resolvedResults": 1504139,
      "totalSeconds": 3.001,
      "totalSecondsResolving": 1.032,
      "totalDiInitializations": 6016560
    }
  ],
  [
    {
      "numberServices": 1000,
      "maxInFlight": 1000,
      "lifetimeType": "TRANSIENT",
      "cacheBypass": false,
      "numberSeconds": 3
    },
    {
      "maxInFlightHit": true,
      "requestNumber": 1416079,
      "resolveCacheHits": 0,
      "numberLogs": 2832158,
      "resolvedResults": 1416078,
      "totalSeconds": 3.001,
      "totalSecondsResolving": 1.139,
      "totalDiInitializations": 5664316
    }
  ],
  [
    {
      "numberServices": 1000,
      "maxInFlight": 1000,
      "lifetimeType": "SCOPED",
      "cacheBypass": true,
      "numberSeconds": 3
    },
    {
      "maxInFlightHit": true,
      "requestNumber": 467109,
      "resolveCacheHits": 0,
      "numberLogs": 934218,
      "resolvedResults": 467108,
      "totalSeconds": 3.001,
      "totalSecondsResolving": 0.9,
      "totalDiInitializations": 1401327
    }
  ],
  [
    {
      "numberServices": 1000,
      "maxInFlight": 1000,
      "lifetimeType": "SCOPED",
      "cacheBypass": false,
      "numberSeconds": 3
    },
    {
      "maxInFlightHit": true,
      "requestNumber": 460484,
      "resolveCacheHits": 0,
      "numberLogs": 920968,
      "resolvedResults": 460483,
      "totalSeconds": 3.001,
      "totalSecondsResolving": 0.92,
      "totalDiInitializations": 1381452
    }
  ],
  [
    {
      "numberServices": 1000,
      "maxInFlight": 1000,
      "lifetimeType": "SINGLETON",
      "cacheBypass": true,
      "numberSeconds": 3
    },
    {
      "maxInFlightHit": true,
      "requestNumber": 1921261,
      "resolveCacheHits": 1920261,
      "numberLogs": 3842522,
      "resolvedResults": 1921260,
      "totalSeconds": 3.001,
      "totalSecondsResolving": 0.459,
      "totalDiInitializations": 2001
    }
  ],
  [
    {
      "numberServices": 1000,
      "maxInFlight": 1000,
      "lifetimeType": "SINGLETON",
      "cacheBypass": false,
      "numberSeconds": 3
    },
    {
      "maxInFlightHit": true,
      "requestNumber": 1738407,
      "resolveCacheHits": 0,
      "numberLogs": 3476814,
      "resolvedResults": 1738406,
      "totalSeconds": 3.001,
      "totalSecondsResolving": 0.586,
      "totalDiInitializations": 2001
    }
  ]
]

@code-ape
Copy link
Author

@jeffijoe NICEEEEE!!! That's now a half million resolutions per compute-second (460484 / 0.92) for scoped requests with 2000 dependencies. Which is a 1000x improvement 😄 and singleton is doing a crazy three million resolutions per compute-seconds. Thanks so much for working on this!

@jeffijoe
Copy link
Owner

Sick 😎

This was a very nice find! I'm still working on some other changes for v6 but it should be released soon.

jeffijoe added a commit that referenced this issue Oct 30, 2021
* Uses a recursive lookup to get the registration
 being resolved rather than doing a full copy of
 the registrations. This improves resolve
 performance by up to 1,000x in some cases
* Add `getRegistration` to container

Closes #270
jeffijoe added a commit that referenced this issue Oct 30, 2021
* Uses a recursive lookup to get the registration
 being resolved rather than doing a full copy of
 the registrations. This improves resolve
 performance by up to 1,000x in some cases
* Add `getRegistration` to container

Closes #270
@jeffijoe jeffijoe mentioned this issue Oct 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants