-
Notifications
You must be signed in to change notification settings - Fork 721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prog: avoid verifier loop of death #1693
base: main
Are you sure you want to change the base?
Conversation
This commit special-cases LogDisabled into a clause without a retry loop since it doesn't need a buffer allocated. It also simplifies the logic somewhat. A follow-up commit will add another separate clause for a new LogSizeOverride CollectionOption to factor out even more complexity related to buffer sizing. The main purpose of this commit is to extract the decision-making logic around retries. Instead of directly manipulating loop control flow, a function is introduced that issues a retry verdict, and it can be tested indendently of the program load action. Signed-off-by: Timo Beckers <[email protected]>
Previously, loading a program would loop forever if the log didn't fit in the kernel's max log size. Signed-off-by: Timo Beckers <[email protected]>
To limit the damage of any future regressions, limit the attempts we'll take to load a program. Signed-off-by: Timo Beckers <[email protected]>
|
||
// ENOSPC means we've loaded the program before and the log buffer was too | ||
// small. | ||
if attr.LogSize == maxVerifierLogSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure you will ever reach this condition, because I don't think the kernel will return ENOSPC
when using the max verifier log size. That said, it doesn't hurt to be defensive here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested and did not receive the same infinite loop we were seeing before.
That tells me that the current approach just doesn't work very well and we should try to do better. I have a firm belief that making log size at all configurable was a mistake from the get go. So: how large is "often quite large"? |
break | ||
} | ||
attempts := 1 | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
for { | |
for attempts := 0; attemps < maxVerifierAttempts ; attempts++ { |
Putting the increment into the for loop has the benefit that it does the right thing when continuing, etc.
|
||
// Ensure the size doesn't overflow. | ||
const factor = 2 | ||
if attr.LogSize >= maxVerifierLogSize/factor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this whole factor thing not very intuitive. This is to avoid overflow?
// ENOSPC means the log was enabled on the previous iteration, so we only | ||
// need to grow the buffer. | ||
if errors.Is(err, unix.ENOSPC) { | ||
if attr.LogTrueSize != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't preserve the behaviour that we don't retry if LogSize >= LogTrueSize. I'm also not sure why this only happens on ENOSPC? LogTrueSize should always be populated?
|
||
if opts.LogDisabled { | ||
break | ||
return &Program{"", fd, spec.Name, "", spec.Type}, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can return here, all of the other code only deals with the log being present. Then the else can go and we can unindent by one leve.
Supersedes #1679.
@lmb I eventually decided against deprecating LogSizeStart because it has utility beyond just break-glass cases, for example to reduce the amount of attempts the Cilium verifier tests take to obtain the full buffer, which is often quite large. We don't want to hardcode a buffer size there because the size varies significantly across programs and the environment is somewhat resource-constrained. It also ended up being more code; as of this proposal it's quite concise and testable, and I'm happy with it.