Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load kernel BTF spec once when creating a new collection #1690

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

patrickpichler
Copy link
Contributor

When creating a new collection without specifying the KernelTypes in the ProgramOptions, the kernel BTF is implicitly loaded. For this the btf.LoadKernelSpec() helper is used. Even though the spec is only loaded once, this operation cause a lot of memory churn, as the kernel spec is being copied each time the function get called. This becomes an issue when a collection with lots of programs is loaded in a resource limited environment.

This commit initializes an missing KernelTypes field with the BTF kernel spec, getting rid of the subsequent calls of BTF.Copy().

This greatly reduces the number of allocations/op when running the BenchmarkNewCollectionManyProgs benchmark. Here are the results:

Previous:
BenchmarkNewCollectionManyProgs-4 12 98443414 ns/op 116779863 B/op 403964 allocs/op

With optimization:
BenchmarkNewCollectionManyProgs-4 184 5807742 ns/op 4134444 B/op 17325 allocs/op

When creating a new collection without specifying the `KernelTypes` in
the `ProgramOptions`, the kernel BTF is implicitly loaded. For this the
`btf.LoadKernelSpec()` helper is used. Even though the spec is only
loaded once, this operation cause a lot of memory churn, as the kernel
spec is being copied each time the function get called. This becomes an
issue when a collection with lots of programs is loaded in a resource
limited environment.

This commit initializes an missing `KernelTypes` field with the BTF
kernel spec, getting rid of the subsequent calls of `BTF.Copy()`.

This greatly reduces the number of allocations/op when running the
`BenchmarkNewCollectionManyProgs` benchmark. Here are the results:

Previous:
BenchmarkNewCollectionManyProgs-4  12 98443414 ns/op 116779863 B/op 403964 allocs/op

With optimization:
BenchmarkNewCollectionManyProgs-4 184  5807742 ns/op   4134444 B/op  17325 allocs/op

Signed-off-by: Patrick Pichler <[email protected]>
@patrickpichler patrickpichler force-pushed the reduce-number-of-allocations-when-creating-new-collection branch from 345d0ed to fed5496 Compare February 17, 2025 18:55
@patrickpichler
Copy link
Contributor Author

@lmb @dylandreimerink I know the test for kernels without BTF support are failing. I guess this can be fixed by lazy loading the kernel spec only once. Before I start refactoring the code though, is this approach something you would merge?

A different approach might be to refactor the btf.Spec type and somehow introduce a immutable version of it. I guess this is not really possible without introducing breaking changes though.

@lmb
Copy link
Collaborator

lmb commented Feb 24, 2025

For this the btf.LoadKernelSpec() helper is used. Even though the spec is only loaded once, this operation cause a lot of memory churn, as the kernel spec is being copied each time the function get called.

IIRC we don't really copy the full spec. It's a shallow, lazy copy. Is that still too expensive?

This commit initializes an missing KernelTypes field with the BTF kernel spec, getting rid of the subsequent calls of BTF.Copy().

The problem with this is that it always causes a parse of kernel BTF, even in cases where the load wouldn't have needed one.

@patrickpichler
Copy link
Contributor Author

IIRC we don't really copy the full spec. It's a shallow, lazy copy. Is that still too expensive?

Ok after looking at the flame graphs again, I think the issue is not directly the copying of the kernel BTF spec, but since each time relocations get applied, they call spec.TypeByID here, which if the BTF spec is shared will short circuit, as the type has already been copied, but for new BTF specs, it will always end up copying the same type over and over again. I misunderstood the copy logic the first time I look at this, causing me to suspect it is at fault, but since the kernels BTF spec always returns a copy, it should never have to deep copy copied types (there should be no copied types).

An alternative could be, to use the type directly from the imm part of the spec. This greatly reduces allocations, but I guess could be dangerous, as changes to types found in the instruction metadata would ripple through. Though I am not sure if there is a valid use case for this (just because I cannot think of it, of course doesn't mean there isn't 😅). I prototyped this here.

What do you think about this solution?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants